In this paper, a novel framework for rich-media object retrieval is presented. The searchable items are media representations consisting of multiple modalities, such as 2-D images, 3-D objects and audio files, which share a common semantic concept. The proposed method utilizes the low-level descriptors of each separate modality to construct a new low-dimensional feature space, where all media objects can be mapped irrespective of their constituent modalities. While most existing state-of-the-art approaches support queries of a single modality at a time, the proposed method allows querying with multiple modalities simultaneously, through efficient multimodal query formulation, and retrieves multimodal results of any available type. Finally, a multimedia indexing scheme is adopted to address the problem of large-scale media retrieval. The framework offers significant advances over existing methods and can be easily extended to accommodate as many heterogeneous modalities as required. Experiments performed on two multimodal datasets demonstrate the effectiveness of the proposed method in multimodal search and retrieval.
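To make the core idea concrete, the following is a minimal sketch, not the authors' actual method: it assumes per-modality linear projections (here random, standing in for a learned mapping) that embed each modality's native descriptors into a shared low-dimensional space, where a multimodal query is formed by averaging the embeddings of its parts and results of any modality are ranked by cosine similarity. All names, dimensions, and the combination rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed native descriptor sizes per modality (illustrative only).
D = {"image": 128, "model3d": 64, "audio": 32}
d = 8  # dimensionality of the shared low-dimensional feature space

# Random projections stand in for the learned cross-modal mapping of the paper.
W = {m: rng.standard_normal((dim, d)) / np.sqrt(dim) for m, dim in D.items()}

def embed(modality, descriptor):
    """Map a single-modality descriptor into the common space (unit-normalized)."""
    z = descriptor @ W[modality]
    return z / np.linalg.norm(z)

def embed_query(parts):
    """Multimodal query formulation: average the embeddings of several modalities."""
    z = np.mean([embed(m, x) for m, x in parts], axis=0)
    return z / np.linalg.norm(z)

def retrieve(query, index, k=3):
    """Return the top-k indexed items by cosine similarity, irrespective of modality."""
    sims = [(item_id, float(vec @ query)) for item_id, vec in index]
    return sorted(sims, key=lambda s: -s[1])[:k]

# Build a toy index mixing all three modalities.
index = []
for i in range(5):
    for m, dim in D.items():
        index.append((f"{m}_{i}", embed(m, rng.standard_normal(dim))))

# Query with an image descriptor and an audio descriptor simultaneously.
q = embed_query([("image", rng.standard_normal(D["image"])),
                 ("audio", rng.standard_normal(D["audio"]))])
print(retrieve(q, index))
```

Because every item lives in the same space after embedding, the returned list can freely mix images, 3-D objects, and audio files; a real system would replace the random projections with the mapping learned from the descriptors and add an approximate-nearest-neighbor index for scale.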