Made to Order: Discovering monotonic temporal changes via self-supervised video ordering
Charig Yang
Weidi Xie
Andrew Zisserman
Visual Geometry Group, University of Oxford
arXiv 2024

arXiv | Code+Data | BibTeX



Abstract

Our objective is to discover and localize monotonic temporal changes in a sequence of images. To achieve this, we exploit a simple proxy task of ordering a shuffled image sequence, with 'time' serving as a supervisory signal since only changes that are monotonic with time can give rise to the correct ordering. We also introduce a flexible transformer-based model for general-purpose ordering of image sequences of arbitrary length with built-in attribution maps. After training, the model successfully discovers and localizes monotonic changes while ignoring cyclic and stochastic ones. We demonstrate applications of the model in multiple video settings covering different scene and object types, discovering both object-level and environmental changes in unseen sequences. We also demonstrate that the attention-based attribution maps function as effective prompts for segmenting the changing regions, and that the learned representations can be used for downstream applications. Finally, we show that the model achieves the state of the art on standard benchmarks for ordering a set of images.
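
To make the proxy task concrete, below is a minimal PyTorch sketch of the shuffle-and-order objective described in the abstract. The model class, hyperparameters, and the fixed sequence length are illustrative assumptions, not the paper's exact architecture (the paper's transformer handles sequences of arbitrary length and produces attribution maps); only the overall recipe follows the text: shuffle the frames, predict each frame's original position, and supervise with the permutation itself.

```python
# Minimal sketch of the self-supervised ordering proxy task, in PyTorch.
# OrderingModel and all hyperparameters here are hypothetical placeholders.
import torch
import torch.nn as nn

class OrderingModel(nn.Module):
    """Predicts, for each frame of a shuffled sequence, its true temporal position."""
    def __init__(self, feat_dim=512, seq_len=8, num_layers=4, num_heads=8):
        super().__init__()
        # Stand-in for a visual backbone: assume frames are already embedded
        # into feat_dim-dimensional features by a CNN/ViT encoder.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=feat_dim, nhead=num_heads,
                                       batch_first=True),
            num_layers=num_layers,
        )
        # Classify each frame into one of seq_len temporal positions.
        self.head = nn.Linear(feat_dim, seq_len)

    def forward(self, frame_feats):       # (B, T, feat_dim), in shuffled order
        # No positional encoding is added in this sketch, so the order must be
        # inferred from frame content alone.
        x = self.encoder(frame_feats)
        return self.head(x)               # (B, T, seq_len) position logits

B, T, D = 4, 8, 512
model = OrderingModel(feat_dim=D, seq_len=T)
feats = torch.randn(B, T, D)              # pretend these are frame features

# Shuffle each sequence; the permutation is the free supervisory signal.
perm = torch.stack([torch.randperm(T) for _ in range(B)])            # (B, T)
shuffled = torch.gather(feats, 1, perm[..., None].expand(-1, -1, D))

logits = model(shuffled)                   # (B, T, T)
# The frame now at shuffled position i originally sat at index perm[b, i].
loss = nn.functional.cross_entropy(logits.reshape(B * T, T), perm.reshape(B * T))
loss.backward()
```

Since the sketch gives the model no positional information about the shuffled inputs, the only usable ordering cue is content that varies monotonically with time; cyclic or stochastic changes carry no ordering signal, which is why the trained model learns to ignore them.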


Results

[Qualitative results: figures and videos omitted from this text version.]

Publication

C. Yang, W. Xie, A. Zisserman
Made to Order: Discovering monotonic temporal changes via self-supervised video ordering
arXiv 2024
arXiv | Code | BibTeX





Acknowledgements

We thank Tengda Han, Ragav Sachdeva, and Aleksandar Shtedritski for suggestions and proofreading. This research is supported by the UK EPSRC CDT in AIMS (EP/S024050/1), and the UK EPSRC Programme Grant Visual AI (EP/T028572/1).
This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project; the code can be found here.