A Deep Learning Approach to Object Affordance Segmentation


Learning to understand and infer object functionalities is an important step towards robust visual intelligence. Significant research efforts have recently focused on segmenting the object parts that enable specific types of human-object interaction, the so-called “object affordances”. However, most works treat affordance segmentation as a static semantic segmentation problem, focusing solely on object appearance and relying on strong supervision and object detection. In this paper, we propose a novel approach that exploits the spatio-temporal nature of human-object interaction for affordance segmentation. In particular, we design an autoencoder that is trained using ground-truth labels of only the last frame of each sequence and can infer pixel-wise affordance labels in both videos and static images. Our model removes the need for object labels and bounding boxes by using a soft-attention mechanism that implicitly localizes the interaction hotspot. For evaluation purposes, we introduce the SOR3D-AFF corpus, which consists of human-object interaction sequences and provides pixel-wise annotations for 9 affordance types, covering typical manipulations of tool-like objects. We show that our model achieves competitive results compared to strongly supervised methods on SOR3D-AFF, while being able to predict affordances for similar unseen objects in two image-only affordance datasets.
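
As a rough illustration of the soft-attention idea mentioned in the abstract, the sketch below normalizes a grid of raw attention scores with a spatial softmax and uses the resulting weights to re-scale per-pixel features, so that the highest-weighted location acts as an implicit "hotspot". This is not the paper's architecture: the function names (`spatial_softmax`, `apply_attention`, `hotspot`), the 3x3 grid, and the toy scores are all assumptions made for the example, and a real model would learn the scores from video features.

```python
import math

def spatial_softmax(scores):
    """Turn a 2D grid of raw attention scores into weights that sum to 1."""
    flat = [s for row in scores for s in row]
    peak = max(flat)  # subtract the max for numerical stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(e for row in exps for e in row)
    return [[e / total for e in row] for row in exps]

def apply_attention(features, weights):
    """Scale each per-pixel feature vector by its attention weight."""
    return [[[w * f for f in vec] for vec, w in zip(frow, wrow)]
            for frow, wrow in zip(features, weights)]

def hotspot(weights):
    """Return the (row, col) position with the largest attention weight."""
    positions = [(r, c) for r, row in enumerate(weights) for c in range(len(row))]
    return max(positions, key=lambda rc: weights[rc[0]][rc[1]])

# Hypothetical raw scores: the centre pixel is where the interaction happens.
scores = [[0.1, 0.2, 0.1],
          [0.3, 2.5, 0.4],
          [0.1, 0.2, 0.1]]

# Hypothetical 2-channel feature vector at every pixel.
features = [[[1.0, 2.0] for _ in range(3)] for _ in range(3)]

weights = spatial_softmax(scores)
attended = apply_attention(features, weights)
centre = hotspot(weights)
```

Because the weights are a differentiable function of the scores, a mechanism of this kind can be trained end to end from the segmentation loss alone, which is consistent with the abstract's claim that no bounding boxes or object labels are required.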

  • S. Thermos, P. Daras, G. Potamianos, "A Deep Learning Approach to Object Affordance Segmentation", in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2358-2362, Barcelona, Spain, May 4-8, 2020. DOI: 10.1109/ICASSP40776.2020.9054167

  • Full document available here.

Contact Information

Dr. Petros Daras, Research Director
6th km Charilaou – Thermi Rd, 57001, Thessaloniki, Greece
P.O.Box: 60361
Tel.: +30 2310 464160 (ext. 156)
Fax: +30 2310 464164
Email: daras(at)iti(dot)gr