Abstract:
Deep learning-based codecs for lossy image compression have recently surpassed traditional codecs such as JPEG and JPEG 2000 in terms of rate-distortion tradeoff. However, they generally rely on architectures with large numbers of stacked layers, often making inference prohibitively slow for time-sensitive applications. In this work, we assess the suitability of such compression techniques for real-time video streaming and, more specifically, for next-generation interactive tele-presence applications, which impose stringent latency requirements. To that end, we compare a recently published convolutional neural network-based image codec, which achieves a state-of-the-art compression ratio with a relatively lightweight architecture, against CPU and GPU implementations of JPEG, measuring compression ratios and timings. Using these results, we simulate a tele-immersion pipeline under various network conditions and examine the performance of the compared codecs, calculating framerates and latencies for different codec/network combinations.