Robert McCarthy

Tweets

Pinned Tweet

Robert McCarthy @rmcc_11

10 Jun 2024

⚡️“Towards Generalist Robot Learning from Internet Video: A Survey” ⚡️ Collecting robot data is hard. In our new survey, we review methods that can leverage large-scale internet video data to help overcome the robot data bottleneck: arxiv.org/abs/2404.19664 See more below👇

28,610

Robert McCarthy @rmcc_11

Mar 6

Reasoning models struggle to control their chains of thought... and that's good for AI safety!

Tomek Korbak

@tomekkorbak

Mar 5

We have a new eval to help keep chains of thought (CoT) monitorable: CoT Controllability. This tests whether LLMs can control their CoT, helping to evade CoT monitors. So far, the results leave us cautiously optimistic: today’s models struggle to obfuscate their reasoning in ways that undermine monitorability.

400

Robert McCarthy @rmcc_11

24 Jul 2025

Happy to share that our survey paper has been accepted to JAIR! Data scarcity continues to be a *major* bottleneck for general-purpose robotics. Check out our updated survey to see how internet-scale video data can help! Original 🧵: x.com/rmcc_11/status/1800154…

J. AI Research-JAIR @JAIR_Editor

22 Jul 2025

New Article: "Towards Generalist Robot Learning from Internet Video: A Survey" by McCarthy, Tan, Schmidt, Acero, Herr, Du, Thuruthel, and Li jair.org/index.php/jair/arti…

150

Robert McCarthy @rmcc_11

10 Jun 2024

We also review LfV datasets and benchmarks. We detail techniques for curating and annotating video datasets. We note a lack of LfV-specific benchmarks and provide recommendations for how future LfV benchmarks should be designed.

275

Robert McCarthy @rmcc_11

10 Jun 2024

Amongst other key takeaways, we recommend:
- Adopting scalable techniques that can leverage as much internet video data as possible.
- Focusing on learning policies and dynamics models directly from video data, in order to best obtain the benefits of LfV.

257

Robert McCarthy @rmcc_11

10 Jun 2024

Thanks to my excellent collaborators and supervisors! @DanielCHTan97, @schmidtdominik_, @fernandoacero_, Nathan Herr, @du_yilun, Thomas Thuruthel, Alex Li
Paper: arxiv.org/abs/2404.19664
If you have any feedback, questions, etc., please do get in touch!

249

Robert McCarthy @rmcc_11

10 Jun 2024

There are several promising directions for learning policies from video data. Here, it is not yet clear whether monolithic or compositional approaches will prevail.

231

Robert McCarthy @rmcc_11

10 Jun 2024

⚡️ LfV-for-Robotics ⚡️ Video data can be used to learn an RL ‘Knowledge Modality’ (KM). Targeting each KM comes with its own pros and cons. We believe targeting the Policy and Dynamics Model to be most promising.

208

Robert McCarthy @rmcc_11

10 Jun 2024

We identify and review several categories of action representations. Latent actions are promising but have yet to be scaled to large-scale, ‘in-the-wild’ internet video.

210

Robert McCarthy @rmcc_11

10 Jun 2024

⚡️ Action Representations ⚡️
Problem: We wish to obtain an action-outputting robot policy. However, internet video does not come with explicit action labels.
Solution: Use action representations to relabel video data with explicit action information.

220

Robert McCarthy @rmcc_11

10 Jun 2024

⚡️ Video Foundation Models ⚡️ Scalable video foundation model techniques are promising for extracting knowledge from large-scale, heterogeneous internet video datasets. We review these in detail in our survey.

256

Robert McCarthy @rmcc_11

10 Jun 2024

⚡️ LfV Challenges ⚡️ However, LfV comes with several challenges, including:
- Missing action labels and low-level information in video
- Distribution shifts between internet video and the robot domain
- High dimensionality, noise, and redundancy in video data

344

Robert McCarthy @rmcc_11

10 Jun 2024

⚡️ The Promise of LfV ⚡️ Internet video comes in massive quantities and contains information highly relevant to a generalist robot. As such, Learning from Videos (LfV) promises benefits such as improved generalization and data efficiency with respect to the available robot data.

434

Robert McCarthy @rmcc_11

18 Jun 2023

Big thanks to @aegeanairlines for the generous scholarship covering my round-trip flights to @M2lSchool! Looking forward to attending the M2L summer school soon!

103