Extracting discriminative image features for similarity search in today's large-scale databases has become a problem of paramount importance. To address the so-called task of Approximate Nearest Neighbor (ANN) search in large visual datasets, deep hashing methods (i.e. approaches that make use of the recent deep learning paradigm in computer vision) have recently been introduced. In this paper, a novel approach to deep hashing is proposed, which incorporates local-level information, in the form of image semantic segmentation masks, during the hash code learning step. The proposed framework makes use of pixel-level classification labels, i.e. it follows a point-wise supervised learning methodology. Experimental evaluation in the highly challenging domain of online terrorist propaganda video analysis, a highly diverse and heterogeneous application case, demonstrates the efficiency of the proposed approach.