Robustness analysis of a multi-component phishing detection model against explainable AI-guided adversarial attacks

Denis, Liza; Meurant, Charline

Files

Denis_30902000_Meurant_51742000_2025.pdf

Open access
Adobe PDF
6.48 MB

Details

Supervisors: Riviere, Etienne ; Bertrand Van Ouytsel, Charles-Henry ; Deprez, Emilie
Faculty: Ecole polytechnique de Louvain
Degree label: Master [120] : ingénieur civil en informatique, à finalité spécialisée
Abstract: Phishing remains one of the most persistent threats in cybersecurity. Numerous detection techniques have been developed, and machine learning-based approaches now dominate over earlier rule-based and heuristic methods. In this work, we investigate the robustness of a machine learning (ML) model trained on multiple components of phishing emails when it is exposed to hand-crafted, model-targeted adversarial attacks guided by explainable AI (XAI). To conduct this analysis, we reproduced and adapted an existing multi-component ML model that processes the structure, text, and URLs of emails separately and combines the results for the final classification. The final system achieves a high Area Under the Precision-Recall Curve (AUPRC) of 0.9856. Based on a model analysis driven by XAI techniques, we crafted targeted attacks. These achieved a 34% success rate against the component handling the email structure, and success rates of up to 100% against the components handling the email text and embedded URLs. The experiments were conducted on a dataset composed mainly of publicly available phishing emails, supplemented with some private data. The results show that high-performing models remain vulnerable to adversarial attacks, underscoring the need to balance interpretability with security and to maintain user awareness campaigns as a key layer of defense.
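The following is a minimal illustrative sketch, not the authors' implementation, of two ideas mentioned in the abstract: combining per-component scores into one decision, and measuring an attack success rate. The abstract does not state how the component outputs are combined or how success is counted, so the averaging rule, the 0.5 threshold, the definition of success (an email detected before the attack is no longer detected after it), and all function names are assumptions made for illustration.

```python
from statistics import mean


def fuse_scores(structure, text, url):
    """Combine three per-component phishing probabilities.

    Assumption: plain averaging. The thesis may use a different
    combination rule (for example a trained meta-classifier).
    """
    return mean([structure, text, url])


def is_phishing(score, threshold=0.5):
    """Flag an email as phishing when its score reaches the threshold (assumed 0.5)."""
    return score >= threshold


def attack_success_rate(scores_before, scores_after, threshold=0.5):
    """Fraction of emails detected before the attack that evade detection after it.

    Emails that were already missed before the attack are excluded,
    because the attack cannot take credit for them.
    """
    detected = [
        (before, after)
        for before, after in zip(scores_before, scores_after)
        if is_phishing(before, threshold)
    ]
    if not detected:
        return 0.0
    evaded = sum(1 for _, after in detected if not is_phishing(after, threshold))
    return evaded / len(detected)
```

For example, with hypothetical scores `[0.9, 0.8, 0.3, 0.7]` before an attack and `[0.2, 0.9, 0.1, 0.4]` after it, three emails are detected initially and two of them fall below the threshold afterwards, so the function returns two thirds.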