In TREC Video Retrieval Evaluation, 2021.
In this report we present the submission of the VCL-CERTH team to the TRECVID 2021 Disaster Scene Description and Indexing (DSDI) task. The dataset provided for this task, LADI, offers only image-level labels as ground truth, indicating the presence or absence of each of the 32 features of interest in each image. However, aerial images are often captured from high altitude, so the features that participating systems are asked to detect often appear tiny in an image. We therefore believe that a label merely indicating the presence of a feature is not sufficient ground truth to guide a system towards detecting it. For this reason we opted to approach the task as a panoptic segmentation problem for the vast majority of the 32 features. Since a panoptic segmentation network cannot be trained on image-level labels, we manually created segmentation annotations for a small part of the LADI images and trained a panoptic segmentation network on these annotations.