Dynamic Trade-Offs in Adversarial Training: Exploring Efficiency, Robustness, Forgetting, and Interpretability

Authors
E. Kafali
T. Semertzidis
P. Daras
Year
2025
Venue
Neural Processing Letters
Abstract

Adversarial attacks pose a threat to neural networks, requiring robust methods to mitigate them. Adversarial Training has emerged as a promising approach; however, its practical application in real-world deep learning systems is hindered by the trade-offs between efficiency and robustness, as optimizing for one aspect may come at the cost of the other. This paper presents a comprehensive investigation into the impact of different Adversarial Training approaches and model types on the robustness of adversarially trained models, while considering the dynamic trade-offs involved. Leveraging our previously published method, Delayed Adversarial Training with Non-Sequential Adversarial Epochs (DATNS), we conduct extended empirical analyses through new experiments to effectively balance these trade-offs and navigate the interplay between efficiency and robustness, as well as catastrophic forgetting and interpretability. By providing our insights on the discussed trade-offs, this research aims to enable the development of more efficient, robust, and interpretable models against adversarial attacks.
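The method name hints at its training schedule: adversarial epochs begin only after an initial delay and need not run as one contiguous block. As a hedged illustration only (the abstract does not specify the exact schedule used in DATNS; the function name and parameters below are hypothetical), a minimal sketch of such a delayed, non-sequential schedule might look like:

```python
def is_adversarial_epoch(epoch: int, delay: int = 10, adv_every: int = 3) -> bool:
    """Illustrative schedule (not the authors' exact method):
    skip adversarial training during the first `delay` warm-up epochs,
    then run an adversarial epoch every `adv_every` epochs, interleaved
    with clean epochs rather than running them back-to-back."""
    if epoch < delay:
        return False  # warm-up phase: clean training only
    return (epoch - delay) % adv_every == 0

# Example: with delay=10 and adv_every=3 over 20 epochs,
# adversarial examples are generated only on epochs 10, 13, 16, and 19.
schedule = [e for e in range(20) if is_adversarial_epoch(e)]
```

Interleaving adversarial epochs with clean ones, rather than switching permanently, is one way to trade off the extra cost of adversarial example generation against robustness while limiting forgetting of clean-data performance.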