session went well, was fun to whiteboard with people again after a while. thanks @itsarnavb and @_lagrangepoint for hosting us, and everyone who joined. you’re og
quick summary of what we talked about:
started by sketching a crude version of how we’d integrate force as a modality ourselves: image tokens from a ViT, proprio, and text all encoded into a pretrained VLM, with the latent flowing into an action head in the same policy. also talked about action chunking on the decoder: it helps with inference latency and temporal coherence within a chunk, though long chunks risk drifting from the world because they run open loop until the next re-plan
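rough sketch of the chunking trade-off, not anything from the paper. everything here is a made-up toy: a 1D world with a constant drift (`DRIFT`) that the policy's internal model doesn't know about, a hypothetical `toy_policy` that predicts 8 actions at once, and `execute_k` as the number of actions we run before re-observing.

```python
from typing import Callable, List, Tuple

DRIFT = 0.1  # hypothetical disturbance the policy does not model


def toy_policy(obs: float, horizon: int = 8) -> List[float]:
    """Predict a chunk of actions by rolling out an internal, drift-free model."""
    chunk, pred = [], obs
    for _ in range(horizon):
        action = -0.5 * pred   # move halfway towards the target at 0
        chunk.append(action)
        pred += action         # imagined next state, no drift
    return chunk


def toy_world(obs: float, action: float) -> float:
    """The real world: same dynamics as the policy's model, plus drift."""
    return obs + action + DRIFT


def run_chunked(policy: Callable[[float], List[float]],
                world: Callable[[float, float], float],
                obs: float, total_steps: int, execute_k: int) -> Tuple[float, int]:
    """Call the policy, run the first execute_k actions open loop, then re-plan."""
    calls, t = 0, 0
    while t < total_steps:
        chunk = policy(obs)
        calls += 1
        for action in chunk[:execute_k]:
            if t == total_steps:
                break
            obs = world(obs, action)
            t += 1
    return obs, calls


# Re-plan every step vs. execute the whole 8-step chunk before looking again.
err_short, calls_short = run_chunked(toy_policy, toy_world, 1.0, 16, execute_k=1)
err_long, calls_long = run_chunked(toy_policy, toy_world, 1.0, 16, execute_k=8)
print(calls_short, round(err_short, 3))
print(calls_long, round(err_long, 3))
```

with `execute_k=8` the policy is called 2 times instead of 16, but the final distance from the target is larger, since the drift piles up while the chunk runs blind.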
then started picking apart the paper. the action head is actually a separate, modified transformer that ingests numerical force as input and outputs (delta_p, f), which is fed directly into the robot’s impedance controller. the controller converts it to motor torques and makes the arm respond to contact feedback. different from the usual IK-based position controller logic
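for intuition, here's the textbook Cartesian impedance law as a minimal sketch. the thread doesn't spell out the paper's actual controller, so treat this as an assumption: scalar `stiffness` and `damping` gains, a translational Jacobian only, no gravity compensation or orientation terms, and all numbers invented. the wrench is F = K(x_des - x) - D x_dot + f, and joint torques are tau = J^T F.

```python
from typing import List, Sequence


def impedance_torque(jacobian: Sequence[Sequence[float]],
                     x: Sequence[float], x_dot: Sequence[float],
                     x_des: Sequence[float], f_ff: Sequence[float],
                     stiffness: float, damping: float) -> List[float]:
    """Spring-damper towards x_des plus a feedforward force, mapped through J^T."""
    wrench = [stiffness * (xd - xi) - damping * vi + fi
              for xi, vi, xd, fi in zip(x, x_dot, x_des, f_ff)]
    n_joints = len(jacobian[0])
    # tau_j = sum_i J[i][j] * F[i]
    return [sum(jacobian[i][j] * wrench[i] for i in range(len(wrench)))
            for j in range(n_joints)]


# Hypothetical policy output: move 1 cm along x, push with 5 N along y.
x = [0.0, 0.0]
delta_p = [0.01, 0.0]
f = [0.0, 5.0]
x_des = [xi + dp for xi, dp in zip(x, delta_p)]

J = [[1.0, 0.0], [0.0, 2.0]]  # made-up 2x2 Jacobian for illustration
tau = impedance_torque(J, x, [0.0, 0.0], x_des, f, stiffness=100.0, damping=10.0)
print(tau)
```

the contrast with a position controller is that the arm behaves like a spring around the target: if contact pushes it off `x_des`, the torque changes smoothly instead of the controller fighting to hit the exact pose.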
discussed that the improvement over pi0 might have come from the model being steered to produce a force vector as output, not the MoE in the action head. paper doesn’t show with/without MoE ablations so can’t say
also briefly touched on interrupts for robots. at training time we feed demos that deliberately go wrong and then recover. at inference time we use DAgger: log the human's corrections with a human intervention label and retrain. model has no internal knowledge of whether data came from human intervention or general teleop
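a toy version of that loop, assuming an intervention-style variant of DAgger where only the human takeover steps get logged. everything is hypothetical: the 1D state, the `expert` standing in for the human, the takeover threshold, and the one-parameter linear policy. the point is that `source` lives in the dataset for bookkeeping but gets dropped before fitting, so the model never sees it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    obs: float
    action: float
    source: str  # "teleop" or "intervention"; bookkeeping only


def expert(obs: float) -> float:
    """Stand-in for the human operator: move halfway back to 0."""
    return -0.5 * obs


def rollout(policy: Callable[[float], float], obs: float, steps: int,
            takeover_threshold: float = 0.5) -> List[Sample]:
    """Run the learner; when the state looks bad the human takes over and we log it."""
    logged = []
    for _ in range(steps):
        if abs(obs) > takeover_threshold:
            action = expert(obs)
            logged.append(Sample(obs, action, "intervention"))
        else:
            action = policy(obs)
        obs += action
    return logged


def fit_gain(dataset: List[Sample]) -> float:
    """Least-squares fit of action = gain * obs; the source label is dropped here."""
    pairs = [(s.obs, s.action) for s in dataset]
    num = sum(o * a for o, a in pairs)
    den = sum(o * o for o, _ in pairs)
    return num / den


# Seed with ordinary teleop data, then aggregate interventions and retrain.
dataset = [Sample(o, expert(o), "teleop") for o in (0.2, -0.4, 0.8)]
gain = fit_gain(dataset)
for start in (1.0, -2.0):
    dataset += rollout(lambda o, g=gain: g * o, start, steps=5)
    gain = fit_gain(dataset)

n_interventions = sum(s.source == "intervention" for s in dataset)
print(gain, n_interventions)
```

keeping the label around is still useful outside the model, e.g. for reweighting or auditing how often the human had to step in.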
some questions that branched off from our discussion:
- apart from force, what other modalities of input and feedback would make action policies more aware and accurate?
- how do slow and fast moving policies get baked together at a high operating frequency?
- flow inpainting as an inference time optimisation for cloud-based VLAs? (underexplored)
planning to host a few people for a paper reading session in Indiranagar on 30th April. the discussion will be anchored around contact-rich manipulation. we’ll mostly discuss the recent ForceVLA2 paper, but I’ll curate a few more reading resources around it.
keeping the first one small. looking for 4-5 engineers/researchers comfortable with transformer internals, flow matching/diffusion math, and recent VLA architectures (pi 0.5, 0.7). would be awesome if we can get anyone who also has experience with contact dynamics
lunch is on me :P