
TAPNext: Tracking Any Point (TAP) as Next Token Prediction

Artem Zholus, Carl Doersch, Yi Yang, Skanda Koppula, Viorica Patraucean, Xu He, Ignacio Rocco, Mehdi S. M. Sajjadi, Sarath Chandar, Ross Goroshin in ICCV 2025

Direct Motion Models for Assessing Generated Videos

Kelsey Allen, Carl Doersch, Guangyao Zhou, Mohammed Suhail, Danny Driess, Ignacio Rocco, Yulia Rubanova, Thomas Kipf, Mehdi S. M. Sajjadi, Kevin Murphy, Joao Carreira, Sjoerd van Steenkiste in ICML 2025

Motion Prompting: Controlling Video Generation with Motion Trajectories

Daniel Geng, Charles Herrmann, Junhwa Hur, Forrester Cole, Serena Zhang, Tobias Pfaff, Tatiana Lopez-Guevara, Carl Doersch, Yusuf Aytar, Michael Rubinstein, Chen Sun, Oliver Wang, Andrew Owens, Deqing Sun

Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation

Homanga Ballav, Suneel Belkhale, Philipp Krähenbühl, Kanika Madan, Carl Doersch, Igor Mordatch, Deepak Pathak in CoRL 2025

TAPVid-3D: A Benchmark for Tracking Any Point in 3D

Skanda Koppula, Ignacio Rocco, Yi Yang, Joe Heyward, Joao Carreira, Andrew Zisserman, Gabriel Brostow, Carl Doersch in NeurIPS 2024

BootsTAP: Bootstrapped Training for Tracking-Any-Point

Carl Doersch, Pauline Luc, Yi Yang, Dilara Gokay, Skanda Koppula, Ankush Gupta, Joseph Heyward, Ignacio Rocco, Ross Goroshin, João Carreira, Andrew Zisserman in ACCV 2024

RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation

Mel Vecerik, Carl Doersch, Yi Yang, Todor Davchev, Yusuf Aytar, Guangyao Zhou, Raia Hadsell, Lourdes Agapito, Jon Scholz in ICRA 2024

TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement

Carl Doersch, Yi Yang, Mel Vecerik, Dilara Gokay, Ankush Gupta, Yusuf Aytar, João Carreira, Andrew Zisserman in ICCV 2023

The Perception Test

Viorica Patraucean, Lucas Smaira, Ankush Gupta, Adria Recasens Continente, Larisa Markeeva, Dylan Banarse, Skanda Koppula, Joseph Heyward, Mateusz Malinowski, Yi Yang, Carl Doersch, Tatiana Matejovicova, Yury Sulsky, Antoine Miech, Alex Frechette, Hanna Klimczak, Raphael Koster, Junlin Zhang, Stephanie Winkler, Yusuf Aytar, Simon Osindero, Dima Damen, Andrew Zisserman, Joao Carreira in ECCV/ICCV Workshop Series

TAP-Vid: A Benchmark for Tracking Any Point in a Video

Carl Doersch, Ankush Gupta, Larisa Markeeva, Adrià Recasens, Lucas Smaira, Yusuf Aytar, João Carreira, Andrew Zisserman, Yi Yang in NeurIPS Datasets and Benchmarks 2022

Input-level Inductive Biases for 3D Reconstruction

Wang Yifan, Carl Doersch, Relja Arandjelovic, Joao Carreira, Andrew Zisserman in CVPR 2022

Kubric: A Scalable Dataset Generator

Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J. Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam Laradji, Hsueh-Ti (Derek) Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Radwan, Daniel Rebain, Sara Sabour, Mehdi S. M. Sajjadi, Matan Sela, Vincent Sitzmann, Austin Stone, Deqing Sun, Suhani Vora, Ziyu Wang, Tianhao Wu, Kwang Moo Yi, Fangcheng Zhong, Andrea Tagliasacchi in CVPR 2022

Perceiver IO: A General Architecture for Structured Inputs & Outputs

Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Henaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, Joao Carreira in ICLR 2022

Inferring a Continuous Distribution of Atom Coordinates from Cryo-EM Images using VAEs

Dan Rosenbaum, Marta Garnelo, Michal Zielinski, Charlie Beattie, Ellen Clancy, Andrea Huber, Pushmeet Kohli, Andrew W. Senior, John Jumper, Carl Doersch, S. M. Ali Eslami, Olaf Ronneberger, Jonas Adler in NeurIPS 2021 workshop on Machine Learning in Structural Biology

CrossTransformers: spatially-aware few-shot transfer

Carl Doersch, Ankush Gupta, Andrew Zisserman in NeurIPS 2020

Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning

Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko in NeurIPS 2020 (Oral)

Data-Efficient Image Recognition with Contrastive Predictive Coding

Olivier J. Hénaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch, S. M. Ali Eslami, Aaron van den Oord in ICML 2020

Sim2real transfer learning for 3D human pose estimation: motion to the rescue

Carl Doersch, Andrew Zisserman in NeurIPS 2019

Exploiting temporal context for 3D human pose estimation in the wild

Anurag Arnab, Carl Doersch, Andrew Zisserman in CVPR 2019

Video Action Transformer Network

Rohit Girdhar, João Carreira, Carl Doersch, Andrew Zisserman in CVPR 2019

A Better Baseline for AVA

Rohit Girdhar, João Carreira, Carl Doersch, Andrew Zisserman in CVPR 2018 ActivityNet Workshop

Kickstarting Deep Reinforcement Learning

Simon Schmitt, Jony Hudson, Augustin Zidek, Simon Osindero, Carl Doersch, Wojciech Czarnecki, Joel Leibo, Heinrich Kuttler, Andrew Zisserman, Karen Simonyan, Ali Eslami in NIPS 2018 Reinforcement Learning Workshop

Learning Visual Question Answering by Bootstrapping Hard Attention

Mateusz Malinowski, Carl Doersch, Adam Santoro, Peter Battaglia in ECCV 2018

The Visual QA Devil in the Details: The Impact of Early Fusion and Batch Norm on CLEVR

Mateusz Malinowski, Carl Doersch in ECCV 2018 Workshop on Shortcomings in Vision and Language

Multi-task Self-Supervised Visual Learning

Carl Doersch and Andrew Zisserman in ICCV 2017

Supervision Beyond Manual Annotations for Learning Visual Representations

Carl Doersch.
Carnegie Mellon University PhD Thesis

Tutorial on Variational Autoencoders

Carl Doersch.
arXiv Tech Report, June 2016

An Uncertain Future: Forecasting from Static Images using Variational Autoencoders

Jacob Walker, Carl Doersch, Abhinav Gupta, and Martial Hebert.
In ECCV 2016

Data-dependent Initializations of Convolutional Neural Networks

Philipp Krähenbühl, Carl Doersch, Jeff Donahue, and Trevor Darrell.
In ICLR 2016

Unsupervised Visual Representation Learning by Context Prediction

Carl Doersch, Abhinav Gupta, and Alexei A. Efros.
In ICCV 2015 (oral)

Context as Supervisory Signal: Discovering Objects with Predictable Context

Carl Doersch, Abhinav Gupta, and Alexei A. Efros.
In ECCV 2014

Mid-Level Visual Element Discovery as Discriminative Mode Seeking

Carl Doersch, Abhinav Gupta, and Alexei A. Efros.
In NIPS 2013

What Makes Paris Look like Paris?

Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic, and Alexei A. Efros.
In SIGGRAPH 2012 (oral)
Republished on the cover of the CACM magazine Dec. 2015

Bounding the Probability of Error for High Precision Optical Character Recognition

Gary B. Huang, Andrew Kae, Carl Doersch, and Erik Learned-Miller.
In JMLR 2012

Improving state-of-the-art OCR through high-precision document-specific modeling

Andrew Kae, Gary B. Huang, Carl Doersch, and Erik Learned-Miller.
In CVPR 2010