It's About Time: Analog Clock Reading in the Wild
Charig Yang
Weidi Xie
Andrew Zisserman
Visual Geometry Group, University of Oxford
CVPR 2022

ArXiv | Code+Data | Bibtex



Abstract

In this paper, we present a framework for reading analog clocks in natural images and videos. Specifically, we make the following contributions: First, we create a scalable pipeline for generating synthetic clocks, significantly reducing the need for labour-intensive manual annotation. Second, we introduce a clock recognition architecture based on spatial transformer networks (STNs), trained end-to-end for clock alignment and recognition; we show that a model trained on the proposed synthetic dataset generalises to real clocks with good accuracy, advocating a Sim2Real training regime. Third, to further reduce the gap between simulation and real data, we leverage a special property of time, namely its uniformity, to generate reliable pseudo-labels on unlabelled real clock videos, and show that training on these videos yields further improvements while still requiring zero manual annotation. Lastly, we introduce three benchmark datasets based on COCO, Open Images, and The Clock movie, totalling 4,472 images with clocks, fully annotated with time, accurate to the minute.
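The uniformity property mentioned above can be made concrete with a small sketch. This is an illustrative toy implementation, not the authors' exact algorithm: given per-frame clock readings from a video sampled at a known interval, it keeps the largest subset of frames whose predictions are mutually consistent with time advancing at a constant rate, and those surviving readings can serve as pseudo-labels. The function names and tolerance parameter are our own assumptions.

```python
def to_minutes(hour, minute):
    """Map a 12-hour clock reading to minutes past 12:00 (range 0..719)."""
    return (hour % 12) * 60 + minute

def uniform_pseudo_labels(preds, frame_gap_min, tol=1):
    """Illustrative sketch of uniformity-based filtering (not the paper's
    exact method). preds is a list of (hour, minute) model outputs, one per
    sampled frame; frame_gap_min is the known number of minutes elapsed
    between consecutive frames; tol is the allowed deviation in minutes.
    Returns indices of the largest set of frames whose readings satisfy
    pred[j] - pred[i] ~= (j - i) * frame_gap_min (mod 720)."""
    times = [to_minutes(h, m) for h, m in preds]
    best = []
    # Anchor on each frame in turn and collect every frame that agrees
    # with a uniform progression starting from that anchor.
    for i, t_i in enumerate(times):
        consistent = [
            j for j, t_j in enumerate(times)
            if min((t_j - t_i - (j - i) * frame_gap_min) % 720,
                   (t_i - t_j + (j - i) * frame_gap_min) % 720) <= tol
        ]
        if len(consistent) > len(best):
            best = consistent
    return best
```

For example, with frames 5 minutes apart and predictions 3:00, 3:05, 3:10, 7:45, 3:20, the 7:45 outlier is rejected while the other four frames form a consistent uniform progression and are kept as pseudo-labels.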


Teaser Video



Results



Publication

C. Yang, W. Xie, A. Zisserman
It's About Time: Analog Clock Reading in the Wild
CVPR 2022
ArXiv | Code | Bibtex





Acknowledgements

We thank Yimeng Long for assistance with data annotation, Joao Carreira for an interesting discussion, and Guanqi Zhan, Ragav Sachdeva, K R Prajwal, and Aleksandar Shtedritski for proofreading. This research is supported by the UK EPSRC CDT in AIMS (EP/S024050/1), a Royal Society Research Professorship, and the UK EPSRC Programme Grant Visual AI (EP/T028572/1).

This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project; the code can be found here.