A challenging research issue, which has recently attracted a lot of attention, is the incorporation of emotion recognition technology in serious games applications, in order to improve the quality of interaction and enhance the gaming experience. To this end, in this paper, we present an emotion recognition methodology that utilizes information extracted from multimodal fusion analysis to identify the affective state of players during gameplay scenarios. More specifically, two monomodal classifiers have been designed for extracting affective state information based on facial expression and body motion analysis. For the combination of different modalities a deep model is proposed that is able to make a decision about player’s affective state, while also being robust in the absence of one information cue. In order to evaluate the performance of our methodology, a bimodal database was created using Microsoft’s Kinect sensor, containing feature vectors extracted from users' facial expressions and body gestures. The proposed method achieved higher recognition rate in comparison with mono-modal, as well as early-fusion algorithms. Our methodology outperforms all other classifiers, achieving an overall recognition rate of 98.3%.