There are indeed a lot of cool self-supervised tasks that one can devise when dealing with images, such as jigsaw puzzles [6], image colorization, image inpainting, or even unsupervised image synthesis.
But what happens when the time dimension comes into play? How can you approach the video-based tasks that you would like to solve?