Julia Kim

Tweets

Pinned Tweet

Julia Kim

@_juliakeem

Mar 11

Data can’t just be outsourced 🤯 To iterate fast, robotics teams must own their data infrastructure. Introducing SyncField: turnkey data infrastructure for in-the-wild data collection (Best for UMI-style & Embodied human) #Robotics #UMI #DataCollection

1:40

134

14,589

Julia Kim retweeted

Youngsun Wi @WiYoungsun

May 24

TactAlign was accepted to RSS 2026! Huge thanks to the reviewers for their thoughtful feedback. See you on the other side of the world 🤓🙌

Youngsun Wi @WiYoungsun

Feb 17

Dexterous hands vary widely, and so do tactile modalities. 🖐️🌈
Our vision on tactile human-to-robot transfer:
🔓 Not tied to specific hardware
♻️ Reuse human tactile demos across embodiments
Presenting TactAlign, a cross-sensor tactile alignment method for cross-embodiment policy transfer.

0:11

12,762

Julia Kim retweeted

Jerry Han

@JerryHan_og

May 6

Physical AI needs human data, but human data capture is still way too hard. Not because pressing record is hard. Because the moment you add cameras and sensors, everything gets messy:
Every device has its own clock. Streams can silently fail. Recording health has to be checked. Start / stop has to line up. Synchronization has to be solved after.
SyncField Desktop turns it into one workflow. Auto-discover cameras and sensors. Connect streams with aliases. Drag, arrange, and monitor panels. Record everything in one click. Review synchronized playback. Get frame-aligned data on disk.
No handclaps. No LED flashes. No sync scripts. No file wrangling. Just humans doing real tasks, captured cleanly.
If you're working on human data for Physical AI, reach out: opengraphlabs.com/#contact

1:06

520

Julia Kim retweeted

KEN

@Kensuke_ee_JP

May 5

Small Shenzhen meetup hosted with @rayanboukhanifi is done. Glad that some people showed up even though I know very few people in Shenzhen!

KEN

@Kensuke_ee_JP

May 4

Hello Shenzhen! I'm coming from Tokyo and will throw a small meetup with Rayan (@rayanboukhanifi) tomorrow at this place. If you are a cracked builder building something new, come join us! Link: luma.com/7d5z1829

4,865

Julia Kim

@_juliakeem

Apr 17

Contact-rich manipulation depends on contact information (tactile sensing and force magnitude), and that information matters more as dexterity increases.

Chris Paxton

@chris_j_paxton

Apr 17

When it comes to manipulation tasks, tactile and force data are really important.

1,230

Julia Kim retweeted

Jerry Han

@JerryHan_og

Apr 12

x.com/i/article/204314907383…

6,823

Julia Kim

@_juliakeem

Apr 5

World models aren't just bigger video models. What we truly need:
(1) multimodal environments
(2) structure-based reasoning (geometry, physics, affordances, spatial & symbolic reasoning)
(3) physics-aware interactions
(4) continuous real-world data loops

Fan-Yun Sun

@sunfanyun

Apr 3

Replying to @chrmanning

@chrmanning and I went on @latentspacepod to talk about world models. youtu.be/oBWRHnggscM?si=NndE…

10,943

Julia Kim retweeted

Junfan Zhu 朱俊帆 ✈️ CVPR

@junfanzhu98

Mar 29

📖 Robotics World Model Reading Club #01 Summary
@BostonDynamics, @Stanford, @AGIBOTofficial, @intbotai, @BytedanceTalk, @Google, @moonlake, @Rivian, @Meta, @Samsung, @UCBerkeley, @Cruise, @encord_team, @ManycoreTech, @OpenGraph_Labs, @neuralmotion, @AMD, @nvidia, @oysterecosystem, @Zoom, @FusionFundVC, @BoostVC, @yzilabs...
policy learning→WM
VLA: observation→action
WAM: latent world→future trajectory→controllable action
→Shift=reactive mapping→controllable simulation
@nvidia Gr00t (7B, high mem efficiency on Thor)≈DreamDojo-style WAM. Bottleneck is NOT scale, but missing unified interface across perception–geometry–physics–action.
🧠 Representation
Pixel space is redundant & non-geometric. Trend→Explicit 3D backbone: point cloud/mesh, object sub-object representations, geometry-aware tracking (contact, affordance)
Point-flow pipeline: detect→sample keypoint→track→dynamic graph
Core tradeoff=which points & density (motion saliency/affordance attn)
🌍 4D Reconstruction→Unified Latent
@GoogleDeepMind D4RT encodes video→temporally consistent latent field: geometry, motion, visibility unified
Outputs: point clouds, 3D tracks, full reconstruct (300× faster)
❗Gap: no shared latent across: vision/geometry/semantics/action/physics
⚙️ Physics Gap
Sim2Real Gap=physics, not vision: discontinuous contact, deformable objects (∞ DoF), non-differentiable friction
Engineering fails: brittle collision meshes, unstable contact
Solutions: learned physics proxy, hybrid pipeline, convex decomposition (geometry → collision proxy, ~5× speedup)
🎥 Video Pretrain≠Interaction
Video=strong prior but no counterfactuals
Missing: force, depth, tactile, proprioception
→can't answer: what if act differently
⏱️ Control≠Inference
Real world=high-freq loop: action chunking, latent action, FastWAM (train with rollout, infer without), KV-cache (AutoGaze)
👉control selects feasible trajectory, not full future modeling
Thor is good, but LLM scaling≠robotics scaling
📉 Data
No “robotics internet”: sim/video/teleop/factory logs fragmented; no unified labeling or metrics
Reality: factories use fixed primitives; generalization often unnecessary
Bitter lesson: data flywheel>pipelines (but robotics lacks one)
🦾 Embodiment Gap
manipulation→full-body intelligence: loco-manipulation, gaze coordination
Need cross-embodiment align (space, action, kinematics)
🔁 Sim2Real Pipeline
human data→semantics→geometry→collision proxy→sim→fine-tuning
Unsolved: deformables, contact stability, long horizon
🧩 Paper
VQVAE (discrete latent), VL-JEPA (predictive align), token pruning (efficiency), recursive models (depth reuse), multi-path exploration (GRPO)
⚡ Infra→SLM
Real-time stack (LLM infra too slow) →WM must compress into SLMs
Future=small, domain-specialized, grounded models
🧪 Bottlenecks
no unified representation, no data flywheel, inference–control mismatch, physics fragmented embodiment
Reality can't be scraped like internet. It must be sensed, interacted, simulated.
👉 Goal: jointly optimize representation, simulation, action under physics constraints
💡 minimal sufficient representation? can video DiT become WAM? vertical SLM inevitable? robotics ImageNet moment?

Junfan Zhu 朱俊帆 ✈️ CVPR

@junfanzhu98

Mar 29

x.com/i/article/203815272776…

321

61,464

Julia Kim

@_juliakeem

Mar 25

Data is being collected in regions where low labor costs mean robots won’t be deployed anytime soon, while the environments where deployment is actually viable remain largely inaccessible and require smarter, more strategic approaches to unlock.

Jacob Zietek

@JacobZietek

Mar 25

Robotics has spent decades optimizing for research. Deployment requires a completely different kind of person: operators, industrialists, and outsiders the field typically ignores. There's a wave of people who want to build in robotics. The field doesn't know what to do with them. New essay, Robotics Needs Fewer Roboticists* below 👇

542

Julia Kim retweeted

OpenGraph Labs 🧤@OpenGraph_Labs

Mar 19

Excited to share that @OpenGraph_Labs has been accepted into @NVIDIA’s Inception Program 🚀 Our mission is to build reliable infrastructure for multimodal data capture, powering the next generation of robotics & world models 🌎

1,640

Julia Kim retweeted

Jerry Han

@JerryHan_og

Mar 17

World models can predict the next frame. They can't predict the next touch. That's the gap visuo-tactile world models will close. Is the robot gripping hard enough? Is the surface rigid or soft? When exactly does contact begin and end? Vision doesn't know. Tactile does. We built @OpenGraph_Labs to capture what cameras miss. Egocentric RGB × 5-finger multi-taxel tactile gloves. Frame-synced. Calibrated. In-the-wild. No lab setups. No scripted pick-and-place. Just humans doing real tasks in real stores. Watch the exact moment contact happens. The pressure map lights up in sync. Every touch. Every frame. 👇

1:05

117

12,763

Julia Kim

@_juliakeem

Mar 15

Robotics & world models require real-world multi-sensory data at scale. But collecting vision, tactile, and IMU data simultaneously is much harder than it sounds.
Each sensor runs at its own frequency, with its own latency and clock domain. Integrating them means dealing with hardware quirks, driver inconsistencies, and constant timestamp drift.
This is fundamentally a synchronization problem. And it gets harder as more modalities are added and tasks become longer-horizon, because temporal misalignment compounds: the model loses the causal structure of what happened and when.
We learned this the hard way building our own pipelines. That experience led us to build a unified platform for multimodal capture, one that handles time alignment, hardware abstraction, and data integrity from day one.
@OpenGraph_Labs built "SyncField - Multimodal Data Capture System", which:
▪️ Supports any hardware configuration (multiple cameras, tactile, IMU)
▪️ Synchronizes all modalities automatically
▪️ Produces output that is fully time-aligned and ready to train on
It already powers humanoid robotics teams, data collection companies, and university research labs. If your team is collecting multimodal robotics data, we'd love to talk. (now onboarding teams one by one)

293

17,406
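To make the timestamp-drift point in the post above concrete, here is a minimal sketch of one common way to map a drifting device clock onto a host clock: fit a straight line (rate and offset) through paired timestamps by least squares. This is an illustration only, not SyncField's method; the function names `fit_clock` and `to_host`, and the simulated 100 ppm drift and 0.25 s offset, are all assumptions.

```python
def fit_clock(device_ts, host_ts):
    """Least-squares fit of host_time ~= rate * device_time + offset."""
    n = len(device_ts)
    mean_d = sum(device_ts) / n
    mean_h = sum(host_ts) / n
    var = sum((d - mean_d) ** 2 for d in device_ts)
    cov = sum((d - mean_d) * (h - mean_h) for d, h in zip(device_ts, host_ts))
    rate = cov / var                  # relative clock speed (1.0 means no drift)
    offset = mean_h - rate * mean_d   # constant shift between the two clocks
    return rate, offset


def to_host(t_device, rate, offset):
    """Convert one device timestamp into host-clock time."""
    return rate * t_device + offset


# Hypothetical data: one paired reading per second for a minute, with the
# host clock seeing the device as 100 ppm fast and shifted by 0.25 s.
device_ts = [float(k) for k in range(60)]
host_ts = [1.0001 * d + 0.25 for d in device_ts]

rate, offset = fit_clock(device_ts, host_ts)
```

With real hardware the paired readings are noisy, so the fit would typically be repeated over a sliding window rather than computed once.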

Julia Kim retweeted

Bercan

@bercankilic

Mar 15

x.com/i/article/203308043482…

334

92,750

Julia Kim

@_juliakeem

Mar 11

Visuo-tactile world model: tactile sensing is critical for contact state and contact interaction dynamics.

Bessemer

@BessemerVP

Mar 11

Robotics today looks a lot like NLP in 2005. We hand-code physics simulations the same way linguists hand-coded grammar rules. And it doesn't scale. A new class of models — world models — learns physics from video instead. The early results are striking. The gaps are real. Here's what you need to know. → bvp.com/atlas/can-world-mode… cc: @TaliaGold, @bhavikvnagda, @gracejhma

125

24,434

Julia Kim

@_juliakeem

Mar 11

With SyncField, you control your data collection from the ground up. Own the infrastructure. Track everything from day one. Scale data quantity on top of a unified infrastructure.

973

Julia Kim

@_juliakeem

Mar 11

Inquiry: opengraphlabs.com/

OpenGraph Labs - Scaling Data for Physical AI

The data engine for Physical AI. Building large-scale, high-fidelity datasets that accelerate robot learning.

opengraphlabs.com

809

Julia Kim retweeted

Jerry Han

@JerryHan_og

Feb 27

VLMs see everything. Feel nothing. VLMs annotate what looks like contact. Tactile sensors verify what actually is contact. We ran VLM annotation on real manipulation demos. It labeled a grasp as "approach." Skipped release phases entirely. Hallucinated state transitions that never occurred. 6 out of 36 action phases wrong. 17% of your training data, corrupted. Why? Pixels don't know when a fingertip touches a surface. Pixels don't know when grip pressure hits zero. Tactile sensors do. So we built a pipeline that catches every error automatically. Tactile evidence validates every contact transition. Wrong labels corrected, missing phases inserted. Not faster labeling. Truthful labeling. This is one of the core problems we're solving at @OpenGraph_Labs

0:44

4,453
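The idea in the post above, checking vision-derived phase labels against tactile evidence, can be sketched roughly as follows. This is not OpenGraph's pipeline: the threshold value, the label names, and the functions `contact_intervals` and `label_conflicts` are hypothetical, and a single pressure value per sample stands in for a full taxel map. The last line reproduces the post's arithmetic (6 wrong phases out of 36 is about 17%).

```python
def contact_intervals(pressure, threshold=0.1):
    """Return (start, end) index pairs where pressure stays above threshold."""
    intervals, start = [], None
    for i, p in enumerate(pressure):
        if p > threshold and start is None:
            start = i                      # contact begins
        elif p <= threshold and start is not None:
            intervals.append((start, i))   # contact ended just before index i
            start = None
    if start is not None:                  # still in contact at the end
        intervals.append((start, len(pressure)))
    return intervals


def label_conflicts(labels, pressure, threshold=0.1,
                    contact_labels=("grasp", "hold")):
    """Indices where a label's implied contact state disagrees with pressure."""
    return [
        i for i, (label, p) in enumerate(zip(labels, pressure))
        if (p > threshold) != (label in contact_labels)
    ]


# Hypothetical per-sample pressure and vision-derived labels.
pressure = [0.0, 0.0, 0.5, 0.8, 0.6, 0.0, 0.0]
labels = ["approach", "approach", "approach", "grasp", "grasp",
          "release", "release"]

intervals = contact_intervals(pressure)
conflicts = label_conflicts(labels, pressure)
error_pct = round(100 * 6 / 36)  # the post's figure: 6 of 36 phases wrong
```

In this toy example the third sample is labeled "approach" while the pressure reading already shows contact, which is the kind of mislabeled transition the post describes.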

Julia Kim retweeted

Danfei Xu

@danfei_xu

Feb 26

x.com/i/article/202178505868…

137

39,119

Julia Kim retweeted

Jerry Han

@JerryHan_og

Feb 21

Robots can't learn if their eyes and hands are out of sync. A 30fps camera and a 1kHz tactile sensor don't speak the same language. Multiple cameras, multiple sensors, all on different clocks at different rates. Jitter from USB and OS scheduling. Drift that compounds every second of recording. We built a multi-modal sync pipeline that aligns all of it to ±2.5ms. Automatically. Every frame matched. Zero sensor samples lost. No hardware triggers needed. Sensor-agnostic. Hardware-agnostic. Just plug in and record. Physical AI needs real hand-eye coordination. Not approximate, precise. We're building this at OpenGraph. opengraphlabs.com/

0:25

289

20,715
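As a rough illustration of the matching step the post above describes (a 30 fps camera against a 1 kHz tactile stream), the sketch below pairs each frame with the nearest sensor sample and rejects pairs that fall outside a tolerance. It assumes both streams are already expressed on one shared clock, which is the hard part the post is about; the function name `match_nearest` and the reuse of 2.5 ms as the tolerance are assumptions, not the OpenGraph implementation.

```python
import bisect


def match_nearest(frame_ts, sample_ts, tol_s=0.0025):
    """For each frame time, return the index of the nearest sample time.

    sample_ts must be sorted ascending. A frame with no sample within
    tol_s seconds is matched to None.
    """
    matches = []
    for t in frame_ts:
        i = bisect.bisect_left(sample_ts, t)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(sample_ts)]
        best = min(candidates, key=lambda j: abs(sample_ts[j] - t), default=None)
        if best is not None and abs(sample_ts[best] - t) <= tol_s:
            matches.append(best)
        else:
            matches.append(None)
    return matches


# One simulated second: 30 camera frames and 1000 tactile samples.
frame_ts = [k / 30.0 for k in range(30)]
sample_ts = [k / 1000.0 for k in range(1000)]
matches = match_nearest(frame_ts, sample_ts)
```

Binary search keeps each lookup logarithmic in the number of samples, which matters once recordings run for minutes at kilohertz rates.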

Julia Kim retweeted

OpenGraph Labs 🧤@OpenGraph_Labs

Feb 18

Tactile feedback is critical for safe and reliable real-world robot deployment 🤚🧤🤖 Really impressive work from @WiYoungsun demonstrating how a shared latent space can bridge tactile signals from wearable gloves to robot embodiments

Youngsun Wi @WiYoungsun

Feb 17

0:11

1,062