1. private project (but very OOD)
2. x.com/_ueaj/status/204309998… (though this was sonnet iirc, but it was the only model in its class that could even half do the task at all, didn't try haiku)
3. some synth data work I had, though that was probably just down to the fact it's the only model with non-dogshit multiturn
new blog!? This one is pretty cool and I'd recommend reading it if you have any interest in mechinterp, architecture search, RL, or hallucination reduction.
This project is a purely symbolic representation of how I think transformers think under the hood. The model is essentially a program format that is capable of representing all the ways a transformer can think and not much more than that.
It gives you, as a human, a visceral sense of what it's like to be a transformer. It lets you *experience* the gaps the transformer architecture has in terms of representational power. I find that similar thought experiments are very helpful for architecture research.
There are other possible uses for this project though, like as an object or target representation for mechinterp research. IMO the current state of mechinterp is only superficially interpretable, as it presents everything to the user as a flattened list. If we want to scale up to the point where a person can be expected to develop a deep model of how a frontier LLM works on the inside, better target representations are needed.
The other very important application is teaching models how to think for themselves. After all, if it can teach a human how a transformer thinks, then it could probably teach a transformer how to think for itself. I think giving a model a visceral experience of how its own inner workings can be constructed might improve downstream performance on hallucinations, among other things.
I don't think this single project is the be-all and end-all, but we should be exploring the degree to which self-awareness can be used to reduce hallucination rates. After all, a transformer and a human work very differently, and thus the generalizations each makes will also be very different. For an LLM to know what kinds of errors a transformer makes, it must learn that for itself, since it's not in the training data.
blog and code in comments!