Recent advances in media capture and processing technologies have enabled new forms of true 3D media content that increase the degree of user immersion. The demand for more engaging forms of entertainment means that content distributors and broadcasters need to fine-tune their delivery mechanisms over the Internet as well as develop new models for quantifying and predicting user experience of these new forms of content. In the work described in this paper, we undertake one of the first studies into the QoE of real-time 3D media content streamed to VR headsets for entertainment purposes, in the context of game spectating. Our focus is on tele-immersive media that embed real users within virtual environments of interactive games. A key feature of engaging and realistic experiences in full 3D media environments, is allowing users unrestricted viewpoints. However, this comes at the cost of increased network bandwidth and the need of limiting network effects in order to transmit a realistic, real-time representation of the participants. The visual quality of 3D media is affected by geometry and texture parameters while the temporal aspects of smooth movement and synchronization are affected by lag introduced by network transmission effects. In this study, we investigate varying network conditions for a set of tele-immersive media sessions produced in a range of visual quality levels. Further, we investigate user navigation issues that inhibit free viewpoint VR spectating of live 3D media. After reporting on a study with multiple users we analyze the results and assess the overall QoE with respect to a range of visual quality and latency parameters. We propose a neural network QoE prediction model for 3D media, constructed from a combination of visual and network parameters.