Robustness analysis of a multi-component phishing detection model against explainable AI-guided adversarial attacks
Files
Denis_30902000_Meurant_51742000_2025.pdf
Open access - Adobe PDF
- 6.48 MB
Details
- Abstract
- Phishing remains one of the most persistent threats in cybersecurity. Numerous detection techniques have been developed, and machine learning-based approaches now dominate over earlier rule-based and heuristic methods. In this work, we investigate the robustness of a machine learning (ML) model trained on multiple components of phishing emails when exposed to hand-crafted, model-targeted adversarial attacks guided by explainable AI (XAI). To conduct this analysis, we reproduced and adapted an existing multi-component ML model that processes the structure, text, and URLs of emails separately and combines the results for final classification. The final system achieves an Area Under the Precision-Recall Curve (AUPRC) of 0.9856. Based on model analysis driven by XAI techniques, we crafted targeted attacks. These achieved a 34% success rate against the component handling the email structure, and success rates of up to 100% against the components handling the email text and embedded URLs. The experiments were conducted on a dataset composed mainly of publicly available phishing emails, supplemented with some private data. The results highlight that high-performing models remain vulnerable to adversarial attacks, underscoring the need to balance interpretability with security and to maintain user awareness campaigns as a key layer of defense.
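The quantities named in the abstract (a combined score from three components, an AUPRC, and an attack success rate) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how the component outputs are combined, so an unweighted mean is used as a stand-in, and the function names, the 0.5 decision threshold, and the step-wise average-precision estimate of AUPRC are hypothetical choices rather than the authors' method.

```python
from typing import Sequence


def fuse_scores(structure: float, text: float, url: float) -> float:
    """Hypothetical late fusion: unweighted mean of the three component scores.

    Assumption: the abstract does not specify the real combination rule.
    """
    return (structure + text + url) / 3.0


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve.

    labels: 1 = phishing, 0 = legitimate. Ties in score keep input order.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank emails from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at this recall step
    return ap / total_pos


def attack_success_rate(
    before: Sequence[float], after: Sequence[float], threshold: float = 0.5
) -> float:
    """Fraction of initially detected emails that fall below the threshold
    after an adversarial perturbation (threshold is an assumed value)."""
    detected = [(b, a) for b, a in zip(before, after) if b >= threshold]
    if not detected:
        return 0.0
    evaded = sum(1 for _, a in detected if a < threshold)
    return evaded / len(detected)
```

Under these assumptions, a success rate is measured only over emails that were flagged before the attack, so an email the model already missed does not count as a successful evasion.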