AlphaDrive: Trains a reasoning VLM to output multiple discrete action plans (accelerate, turn left) for autonomous driving.
Much better than zero-shot or SFT on MetaAD, a new dataset of 110K 3s clips. In ablations, SFT < RL < SFT RL. Looks like the days of pure SFT are over!