About Me

In June 2025, I started as a Research Scientist at Isomorphic Labs, an Alphabet spin-off from Google DeepMind focusing on AI for drug discovery. I enjoy exploratory research in machine learning and computer vision, with a focus on multimodal and self-supervised learning and its novel applications.

I completed my PhD at the Visual Geometry Group (VGG), University of Oxford, with Andrew Zisserman and Weidi Xie. My thesis focused on visual understanding through time, and was examined by Christian Rupprecht (Oxford) and William T. Freeman (MIT). I also interned at Meta Reality Labs, working on multimodal AI for smart glasses.

Previously, I did my undergraduate degree in Engineering Science, also at Oxford. During that time, I spent lovely summers at Japan Railways, Metaswitch (acquired by Microsoft), the Oxford-Man Institute, True, and CP Group.

I was born and raised in the suburbs of Bangkok, Thailand.

Experience

🧬 Isomorphic Labs

Research Scientist | Jun 2025 - Present | London, UK

Machine learning research for drug discovery.

🕶️ Reality Labs, Meta

Research Scientist Intern | Jun 2024 - Feb 2025 | Seattle, WA

Multimodal (RGB, eye gaze, inertial sensors) AI for reading detection in smart glasses.

👀 VGG, University of Oxford

PhD Student, Computer Vision | Oct 2020 - Apr 2025 | Oxford, UK

Multimodal visual understanding by leveraging time.

Thesis: Learning from Time

Supervisors: Andrew Zisserman, Weidi Xie

Examiners: Christian Rupprecht (Oxford), William T. Freeman (MIT)

Topics: video understanding, self-supervised learning, multimodal, motion segmentation, applications

With the rise of deep learning, computer vision systems have been highly successful at understanding images. However, understanding the dynamic visual world we live in requires both understanding the appearances of individual image frames, and the temporal relationships between them. This thesis aims to understand videos through the lens of time, by learning from the temporal relationships within image sequences, both instantaneously and over a period of time.

In the first half, we focus on using instantaneous motion – temporal changes between neighbouring video frames – to discover moving objects, based on the intuition that the subject in a video usually moves independently from the background. We propose two methods for this task: the first handles a single object in a self-supervised manner by grouping motion into layers, and the second handles multiple objects over time in a supervised manner using a vision foundation model. We show applications to general videos, as well as to discovering objects with minimal visibility such as camouflages, for which we also present the largest video camouflage dataset to date.

In the second half, we go beyond instantaneous changes and learn from patterns of change over time, from seconds (natural videos) to days (time-lapse videos) to years (longitudinal images). We leverage the properties of time as a direct supervisory signal, and introduce applications previously unachievable in computer vision. We first exploit "uniformity" – that time flows at a constant rate – to read analog clocks in unconstrained scenes. We then relax this constraint to "monotonicity" – that certain changes are consistently unidirectional over a period of time – to discover monotonic changes in a sequence of images. For both cases, we also contribute datasets to foster further research.

Photo: AZ, me, Bill Freeman (who flew over from MIT!), Christian Rupprecht.

Thesis Defense

Research

Reading Recognition in the Wild

Charig Yang, Samiul Alam, Shakhrul Iman Siam, Michael J. Proulx, Lambert Mathias, Kiran Somasundaram, Luis Pesqueira, James Fort, Sheroze Sheriffdeen, Omkar Parkhi, Carl Ren, Mi Zhang, Yuning Chai, Richard Newcombe, Hyo Jin Kim

In submission, 2025 (Internship paper at Meta)

An investigation into what it means to read from different sensors (eye gaze, inertial sensors, RGB).

Discovering Monotonic Temporal Changes via Self-supervised Video Ordering

Charig Yang, Weidi Xie, Andrew Zisserman

ECCV, 2024 (Oral Presentation)

Changes happen all the time, but only some are consistent over time. We present the first solution to this new task by ordering shuffled sequences.

Moving Object Segmentation: All You Need Is SAM (and Flow)

Junyu Xie, Charig Yang, Weidi Xie, Andrew Zisserman

ACCV, 2024 (Oral Presentation)

SAM + Optical Flow = FlowSAM.

It's About Time: Analog Clock Reading in the Wild

Charig Yang, Weidi Xie, Andrew Zisserman

CVPR, 2022

We present the first working solution to the niche but fun problem of reading clocks (which 2025's VLMs still fail at!). We circumvent manual supervision by exploiting the fact that time flows at a constant rate.

Self-supervised Video Object Segmentation by Motion Grouping

Charig Yang, Hala Lamdouar, Erika Lu, Andrew Zisserman, Weidi Xie

ICCV, 2021 (Short: CVPR Workshop, 2021, Best Paper Award)

Motion can be used to discover moving objects in general. We introduce a self-supervised segmentation method by grouping motion into layers using a transformer.

Betrayed by Motion: Camouflaged Object Discovery via Motion Segmentation

Hala Lamdouar, Charig Yang, Weidi Xie, Andrew Zisserman

ACCV, 2020

Camouflaged animals are hard to see until they move. We present a method for discovering camouflages using motion, along with a large-scale video camouflage dataset.

Misc

📚 Teaching

C18 Computer Vision and Robotics, C19 Machine Learning, B14 Information Engineering Systems (incl. practicals), B1 Engineering Computation, A1 Mathematics, A2 Control Theory, P2 Electronics

🏆 Awards

• Best Presentation Award, UK Robotics CDT Conference

• Best Paper Award, CVPR Workshop on Robust Video Scene Understanding

• Best Poster Award, Information Engineering undergraduate thesis

• Edgell Sheppee Prize for second-best overall academic performance in Engineering

• National Olympiads: 4x silver (maths), 1x bronze (chemistry)

🎲 Random

• I was part of the Guinness World Record for the most people dipping bread in egg simultaneously

• I was part of the Oxford Poker team, where we defeated Cambridge in the Varsity match

• This entire website was vibe-coded using Cursor