Recent advances in data and computer science have significantly improved people's everyday lives. However, criminals are also adopting these advances to facilitate and expand their illicit activities. The Deep Learning (DL) paradigm has shown significant potential for analysing complex structured data. In the crime detection domain, however, only a limited number of public datasets are available, and these are restricted to specific tasks, which hinders the research and development of accurate and robust DL-assisted tools. The goal of this work is to extend the well-known UCF-crime dataset to the task of video captioning. To the best of our knowledge, this is the first publicly available crime-related video captioning dataset. A newly proposed video captioning approach is compared against a wide range of state-of-the-art methods on this dataset, and the dataset's qualitative and quantitative characteristics are presented.