In this paper, we present a data-driven approach to Challenge 1 of the MediaEval 2013 Social Event Detection Task. The approach consists of three steps: (a) initialization based on the images’ spatio-temporal information; (b) computation of the clusters’ intercorrelations; and (c) generation of the final clusters. In the initialization step, the images that carry both geolocation and time information are clustered on that information, which yields a few “anchored” clusters, while the remaining images, which lack geolocation or time information, are treated as singleton (one-image) clusters. In the second step, all pairwise intercorrelations between the “anchored” and the singleton clusters are calculated using an aggregated similarity measure based on the user, title, description, tag, and visual information of the images. In the final step, the “anchored” and singleton clusters produced by the initialization step are merged according to the intercorrelations calculated in the second step, which generates the final clusters. Our best run achieves scores of 0.5701, 0.8739 and 0.5592 for F1-Measure, NMI and Divergence (F1), respectively.
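The three steps can be illustrated with a minimal Python sketch. It is not the implementation evaluated in the paper: the `Photo` record, the greedy grouping rule, the thresholds (`max_hours`, `max_deg`, `threshold`), the Jaccard-based similarity, and the weights are all assumptions chosen for illustration, and the description and visual components of the aggregated measure are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Photo:
    # Hypothetical record; field names and units are assumptions.
    pid: str
    user: str
    title: str = ""
    tags: frozenset = frozenset()
    time_h: Optional[float] = None  # capture time in hours
    lat: Optional[float] = None
    lon: Optional[float] = None


def has_anchor_info(p):
    return p.time_h is not None and p.lat is not None and p.lon is not None


def initialize(photos, max_hours=12.0, max_deg=0.01):
    """Step (a): greedily group photos with geo+time; the rest are singletons."""
    anchored, singletons = [], []
    for p in photos:
        if not has_anchor_info(p):
            singletons.append([p])
            continue
        for cluster in anchored:
            ref = cluster[0]  # compare against the cluster's first photo
            if (abs(p.time_h - ref.time_h) <= max_hours
                    and abs(p.lat - ref.lat) <= max_deg
                    and abs(p.lon - ref.lon) <= max_deg):
                cluster.append(p)
                break
        else:
            anchored.append([p])  # no close cluster found: start a new one
    return anchored, singletons


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_similarity(c1, c2, weights=(0.4, 0.3, 0.3)):
    """Step (b): weighted sum of user, title-word and tag overlap (assumed form)."""
    users = jaccard({p.user for p in c1}, {p.user for p in c2})
    titles = jaccard({w for p in c1 for w in p.title.lower().split()},
                     {w for p in c2 for w in p.title.lower().split()})
    tags = jaccard(set().union(*(p.tags for p in c1)),
                   set().union(*(p.tags for p in c2)))
    w_user, w_title, w_tag = weights
    return w_user * users + w_title * titles + w_tag * tags


def merge(anchored, singletons, threshold=0.3):
    """Step (c): attach each singleton to its most similar anchored cluster."""
    final = [list(c) for c in anchored]
    for s in singletons:
        scores = [cluster_similarity(s, c) for c in anchored]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            final[best].extend(s)
        else:
            final.append(list(s))  # stays a singleton cluster
    return final
```

Calling `initialize` on a list of photos and passing its two results to `merge` returns the final clusters as lists of `Photo` objects. In this sketch a singleton that matches no anchored cluster above `threshold` remains its own cluster; how such images are handled in the actual system is not stated in the text above.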