Real-world CCTV footage poses additional challenges for object tracking because of Pan-Tilt-Zoom operations, low camera quality, and diverse working environments. The most relevant challenges are moving backgrounds, motion blur, and severe scale changes. Convolutional neural networks, which offer state-of-the-art performance in object detection, are increasingly used to build more efficient tracking schemes. In this work, heterogeneous training data and data augmentation are explored as means of improving the detection rate of such networks in challenging CCTV scenes. Moreover, it is proposed to use the objects’ spatial transformation parameters to automatically model and predict the evolution of the intrinsic camera parameters and to tune the detector accordingly for better performance. The proposed approaches are tested on publicly available datasets and real-world CCTV videos.
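
For illustration only, the kind of data augmentation mentioned above could be sketched as follows. This is a minimal Python sketch under stated assumptions, not the procedure used in this work: the function names `motion_blur`, `rescale`, and `augment`, the horizontal box kernel, the nearest-neighbour resizing, and the default parameter ranges are all hypothetical choices made to mimic motion blur and scale changes on a grayscale frame.

```python
import numpy as np

def motion_blur(image, length=9):
    """Blur a grayscale image (H, W) horizontally with a box kernel (assumed blur model)."""
    # Clamp the kernel so the convolution never widens the image.
    length = max(1, min(length, image.shape[1]))
    kernel = np.ones(length) / length
    # Convolve every row; mode="same" keeps the original width.
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, image
    )

def rescale(image, factor):
    """Nearest-neighbour resize of a grayscale image by a scale factor."""
    h, w = image.shape
    new_h = max(1, int(round(h * factor)))
    new_w = max(1, int(round(w * factor)))
    # Map each output pixel back to its source pixel.
    rows = np.minimum((np.arange(new_h) / factor).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / factor).astype(int), w - 1)
    return image[np.ix_(rows, cols)]

def augment(image, rng, scale_range=(0.5, 2.0), max_blur=15):
    """Apply a random scale change followed by a random horizontal blur.

    Returns the augmented image and the scale factor, so that any
    bounding-box annotations can be multiplied by the same factor.
    """
    factor = rng.uniform(*scale_range)
    length = int(rng.integers(1, max_blur + 1))
    scaled = rescale(image.astype(float), factor)
    return motion_blur(scaled, length), factor

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = np.zeros((8, 8))
    frame[:, 4] = 1.0  # a single bright vertical line
    augmented, factor = augment(frame, rng)
    print(augmented.shape, factor)
```

The scale factor is returned alongside the image because ground-truth boxes have to be rescaled consistently with the frame; a real pipeline would likely also vary the blur direction and use a higher-quality interpolation.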