I build sane open-source RL tools. MIT PhD, creator of Neural MMO and founder of PufferAI. DM for business: non-LLM sim engineering, RL R&D, infra & support.
PufferLib env I made, a bat chasing a bug. Sound on. It sends out chirps and its only observations are an FFT for each ear catching reflections, the last chirp it made, and a timer.
It senses the Doppler shift in the bug on its own. PufferLib one-shot this.
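A rough sketch of why an FFT observation is enough to carry Doppler information. This is not the actual env code: the sample rate, window length, pure-tone "chirp", and the function names `echo_spectrum` and `peak_frequency` are all assumptions made for illustration. It only shows that an echo off a closing target lands in a higher frequency bin than an echo off a still one, which is the cue a policy could pick up from a per-ear spectrum.

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s, air at room temperature
SAMPLE_RATE = 192_000    # Hz, assumed; high enough for ultrasonic tones
N_SAMPLES = 4096         # assumed observation window length


def echo_spectrum(f_emit, closing_speed, n=N_SAMPLES):
    """Magnitude spectrum of an echo from a target closing at closing_speed (m/s).

    Uses the two-way Doppler factor (c + v) / (c - v) for a moving reflector
    and treats the chirp as a pure tone to keep the sketch short.
    """
    t = np.arange(n) / SAMPLE_RATE
    f_echo = f_emit * (SPEED_OF_SOUND + closing_speed) / (SPEED_OF_SOUND - closing_speed)
    echo = np.sin(2.0 * np.pi * f_echo * t)
    window = np.hanning(n)  # reduce spectral leakage
    return np.abs(np.fft.rfft(echo * window))


def peak_frequency(spectrum, n=N_SAMPLES):
    """Frequency (Hz) of the strongest bin in a spectrum from echo_spectrum."""
    freqs = np.fft.rfftfreq(n, d=1.0 / SAMPLE_RATE)
    return freqs[np.argmax(spectrum)]


if __name__ == "__main__":
    still = peak_frequency(echo_spectrum(40_000.0, 0.0))
    closing = peak_frequency(echo_spectrum(40_000.0, 5.0))
    print(f"still target peak:   {still:.0f} Hz")
    print(f"closing target peak: {closing:.0f} Hz")
```

In a two-ear setup, each ear would get its own spectrum like this, with the difference between the two carrying direction.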
We are super excited to share with you our initial release of Lucky Engine. We are building a robotics engine from the ground up to be what we wished we could find in a simulator before
PufferLib is open source, but Puffer the company builds high-perf RL sims professionally. We've solved problems clients thought impossible in seconds on a single GPU! Contact me at jsuarez🐡puffer🐡ai.
This sadly isn't going to work. The best response is for all AI researchers to stop using Anthropic models. The lack of public feedback alone would cause them to fall behind within months.
June 9th Researcher Reciprocity License
"if you train on it, you let us generate - reverse terms of use void"
Status quo
1. We teach frontier devs with ICLR/NeurIPS papers, OSS GitHub contributions
2. They use it to make frontier models
3. Then ban us from exploring our ideas
We need a new license. Original thinkers can't be an underclass to a tyrannical researcher fiefdom
Hello @kellerjordan0 @_arohan_. I noticed that in your recent optimizer work, you appear to have used the inefficient versions of Muon and Shampoo that have long since been succeeded by PowerWash last week. The new algorithm is quite simple and elegant: it merely generates a set of weights with a different seed and evaluates until one of them passes the validation threshold, therefore cutting speedrun time down to 0 steps. The SplittingHairs normalization addition is particularly useful for stabilizing performance. I hope we can collaborate to bring this new standard into broader usage!
Now that I have your attention, any suggestions on our ~200 line CUDA implementation of Muon would be greatly appreciated github.com/PufferAI/PufferLi…. In the 5.0 branch on the same file, I played with a small change to preserve LR across model sizes, but there have not been any major improvements otherwise.
Ended stream early yesterday to rethink the new RL algo. There's a key limitation of advantage that makes it hard to use for selecting informative states to revisit. I came up with something much simpler, will try it on stream in a few hours!
We introduce a method for training RNNs that is time-parallel and does not suffer from vanishing/exploding gradients.
Key idea is to decouple learning 1) what should be remembered (can be done without recurrence) and 2) how to update memory (can be one-step supervised by #1).
We never really knew how to train nonlinear RNNs well… BPTT struggled with vanishing grads (no long-range memory) and sequential rollout (hard to parallelize).
What if instead an oracle told us the optimal memory state m_t at each step? Then the RNN could do one-step supervised learning on (m_t, x_{t+1}) → m_{t+1} labels.
We call this Supervised Memory Training (SMT): a replacement for BPTT that trains RNNs without unrolling them. SMT is time-parallelizable and solves vanishing gradients.
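A toy sketch of the one-step supervised part only, assuming the per-step memory targets are already available. How SMT actually obtains those targets is not covered here: in this sketch they come from a hidden "teacher" cell, which is purely a stand-in so there is something to fit. The tanh cell, the sizes, and the names `mse` and `train` are also assumptions. The point it illustrates is that every (m_t, x_{t+1}) → m_{t+1} pair is an independent regression example, so training runs over all time steps as one batch with no unrolling.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D_MEM, D_IN = 256, 4, 3  # assumed toy sizes

# Toy stand-in for the oracle: memory targets produced by a hidden teacher cell.
# This loop is sequential, but it only builds the toy data set.
x = rng.normal(size=(T + 1, D_IN))
W_teacher = rng.normal(scale=0.5, size=(D_MEM, D_MEM + D_IN))
m = np.zeros((T + 1, D_MEM))
for t in range(T):
    m[t + 1] = np.tanh(W_teacher @ np.concatenate([m[t], x[t + 1]]))

# Every time step becomes one row: input (m_t, x_{t+1}), label m_{t+1}.
inputs = np.concatenate([m[:-1], x[1:]], axis=1)  # shape (T, D_MEM + D_IN)
targets = m[1:]                                   # shape (T, D_MEM)


def mse(W):
    """Mean squared one-step prediction error of the cell tanh(W [m_t; x_{t+1}])."""
    pred = np.tanh(inputs @ W.T)
    return float(np.mean((pred - targets) ** 2))


def train(steps=500, lr=0.5):
    """Plain gradient descent on the one-step loss, batched over all time steps."""
    W = np.zeros((D_MEM, D_MEM + D_IN))
    for _ in range(steps):
        pred = np.tanh(inputs @ W.T)
        err = pred - targets
        grad_pre = err * (1.0 - pred ** 2)             # backprop through tanh only
        grad_W = 2.0 * grad_pre.T @ inputs / err.size  # gradient of the mean
        W -= lr * grad_W
    return W


if __name__ == "__main__":
    W = train()
    print("loss before:", mse(np.zeros_like(W)))
    print("loss after: ", mse(W))
```

The gradient never flows through more than one application of the cell, which is why the vanishing/exploding problem from long unrolls does not come up in this setup.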
Website: akarshkumar.com/smt/
arXiv: arxiv.org/abs/2606.06479
Want to solve cool problems with RL? Take an afternoon to read our docs (1 page) on puffer.ai and build a ~300 line env. It's C but it's really easy C. Extensive guides on my articles tab for learning more from there! All free OSS, and I review PRs on stream!
behold. THE WORLD'S FIRST SIX PENDULUM CARTPOLE SOLVE. Including a sponsor!
To solve this task, I built an environment to train an AI. This is what mechanize does, but for larger AIs. Apply! Salaries are up on their page
Thank you to mechanize for sponsoring!
Your company here! Ditch your terrible bloated render stack and just use Raylib. Easy local demos and UIs, plus web via WASM. All the demos on puffer.ai use it.
#raylib keeps growing (new version, new features, new tools, new users...), but sponsorship does not, growth has been 0 for the last 6 months.
If you use raylib, what would make you consider supporting the project? 🤔