The Influence Of Image Descriptors’ Dimensions Value Cardinalities To Large-Scale Similarity Search

T. Semertzidis
D. Rafailidis
M. G. Strintzis
P. Daras
Springer International Journal of Multimedia Information Retrieval, pp. 1-18, ISSN: 2192-6611


In this empirical study, we evaluate the impact of the Dimensions' Value Cardinality (DVC) of image descriptors on the performance of large-scale similarity search. DVC is an inherent characteristic of image descriptors, defined for each dimension as the number of distinct values that the descriptors take in that dimension, and it thus expresses the dimension's discriminative power. In our experiments with six publicly available datasets of image descriptors of different dimensionality (64-5,000 dimensions) and size (240K-1M): (a) we show that DVC varies, because the several extraction methods use different quantization and normalization techniques; (b) we show that image descriptor extraction strategies tend to follow the same family of DVC distribution functions, so similarity search strategies can exploit image descriptors' DVCs irrespective of the sizes of the datasets; (c) based on a Canonical Correlation Analysis (CCA), we demonstrate that image descriptors' DVCs have a significant impact on the performance of the baseline LSH method [7] and of three state-of-the-art hashing methods, SKLSH [26], PCA-ITQ [9], and SPH [12], as well as on the performance of the MSIDX method [32], which exploits the DVC information; (d) we experimentally demonstrate the influence of DVCs on both sequential search and the aforementioned similarity search methods, and we discuss the advantages of our findings. We hope that our work will motivate researchers to consider DVC analysis as a tool for the design of similarity search strategies in image databases.
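As an illustration of the DVC definition given in the abstract, the following minimal Python sketch counts the distinct values in each dimension of a small toy set of descriptors. The function name `dimension_value_cardinalities` and the toy data are hypothetical and are not taken from the paper. The final step, which orders dimensions by descending DVC, is only one assumed way in which a search strategy might use this information, and should not be read as the MSIDX algorithm.

```python
def dimension_value_cardinalities(descriptors):
    """Return, for each dimension, the number of distinct values
    observed across all descriptors (rows)."""
    # zip(*descriptors) iterates over dimensions (columns);
    # a set keeps one copy of each distinct value.
    return [len(set(column)) for column in zip(*descriptors)]


# Toy data (hypothetical): 4 descriptors with 3 dimensions each.
descriptors = [
    [0.0, 1.0, 0.25],
    [0.0, 2.0, 0.50],
    [0.0, 1.0, 0.75],
    [1.0, 3.0, 1.00],
]

dvc = dimension_value_cardinalities(descriptors)

# Assumed usage: rank dimensions from highest to lowest DVC,
# i.e. from most to least discriminative under the abstract's definition.
order = sorted(range(len(dvc)), key=lambda j: -dvc[j])

print(dvc)
print(order)
```

In this toy example the first dimension takes only two distinct values while the third takes four, so the third dimension would be ranked first. Exact equality on floating-point values is used here for simplicity; real descriptors may need the comparison applied after the quantization step of the extraction method.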