Identifying the true structure of the latent data-generating process (causal world model) by learning from passive observation alone is impossible without reasonable priors, which transformers don’t have. This is why LLMs are brittle and can’t reliably do counterfactual reasoning.
New paper: How can you tell if a transformer has the right world model?
We trained a transformer to predict directions for NYC taxi rides. The model was good. It could find shortest paths between new points.
But had it built a map of NYC? We reconstructed its map and found this: