Excited to share DriftQL☄️, a new paradigm for offline RL.
Instead of fitting a behavior prior, DriftQL learns a one-step Q-guided actor whose samples are corrected by a drift field.
Simple. SOTA on OGBench/D4RL. No denoising. No solvers. No auxiliary actor. No distillation.
With my co-authors @mo_danesh, Amin Abyaneh, Scott Fujimoto, Hsiu-Chin Lin, David Meger
🌐 driftql.github.io
🧵