Concept Basis Extraction for Latent Space Interpretation of Image Classifiers

Authors
A. Doumanoglou
D. Zarpalas
K. Driessens
Year
2024
Venue
Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP

Abstract

Previous research has shown that, to a large extent, deep feature representations of image patches that belong to the same semantic concept lie along the same direction in an image classifier’s feature space. Conventional approaches compute these directions using annotated data, forming an interpretable feature-space basis (also referred to as a concept basis). Unsupervised Interpretable Basis Extraction (UIBE) was recently proposed as a novel method that can suggest an interpretable basis without annotations. In this work, we show that adding a classification loss term to the unsupervised basis search can lead to basis suggestions that align even more closely with interpretable concepts. This loss term forces the basis vectors to point toward directions that maximally influence the classifier’s predictions, exploiting concept knowledge encoded by the network. We evaluate our work by deriving a concept basis for three popular convolutional networks trained on three different datasets. Experiments show that our contributions enhance the interpretability of the learned bases, according to the interpretability metrics, by up to +45.8% relative improvement. As an additional practical contribution, we report hyper-parameters, found by hyper-parameter search in controlled benchmarks, that can serve as a starting point for applying the proposed method in real-world scenarios that lack annotations.
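
To make the core idea concrete, below is a minimal PyTorch sketch of how such a combined objective could look. It is an illustration under stated assumptions, not the authors' implementation: `frozen_head`, `feature_loader`, the top-k sparsification, the orthogonality penalty, and all loss weights are hypothetical choices. The classification term uses the network's own predictions rather than ground-truth labels, keeping the search annotation-free.

```python
# Sketch: learn a concept basis Q over a frozen classifier's feature space
# by combining an unsupervised interpretability term with a classification
# term. All names and weights below are illustrative assumptions.
import torch
import torch.nn.functional as F

d, k = 512, 32                            # feature dim / concepts kept per sample (assumed)
Q = torch.nn.Parameter(torch.eye(d))      # learnable concept basis (rows = directions)
opt = torch.optim.Adam([Q], lr=1e-3)
lam, mu = 1.0, 0.1                        # illustrative loss weights

for feats in feature_loader:              # penultimate-layer features of a frozen network
    c = feats @ Q.T                       # coefficients of each feature in the candidate basis
    # Unsupervised interpretability term: each feature should be explained
    # by few basis directions (L1 sparsity of the coefficients).
    l_interp = c.abs().mean()
    # Keep the basis near-orthonormal so directions stay distinct.
    l_orth = (Q @ Q.T - torch.eye(d)).pow(2).sum()
    # Classification term: reconstruct features from only the k strongest
    # concepts and require the frozen classifier to keep its original
    # prediction, pushing Q toward prediction-relevant directions.
    idx = c.abs().topk(k, dim=1).indices
    mask = torch.zeros_like(c).scatter_(1, idx, 1.0)
    logits = frozen_head((c * mask) @ Q)
    with torch.no_grad():
        target = frozen_head(feats).argmax(dim=1)  # network's own predictions, no annotations
    l_cls = F.cross_entropy(logits, target)
    loss = l_interp + lam * l_cls + mu * l_orth
    opt.zero_grad(); loss.backward(); opt.step()
```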