"no strict notion of words and their order exists"
- words are directly represented with ordered subword tokens and word-boundary tokens
- order is directly represented with positional embeddings
both notions are explicitly built into the system the weights operate in
no magic
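A minimal sketch of the two bullets above. It makes several illustrative assumptions. It uses a toy greedy subword tokenizer with a SentencePiece-style `▁` word-start marker. Positions come from the sinusoidal encoding of the original Transformer paper. Random vectors stand in for learned embeddings. The vocabulary and the `tokenize`, `detokenize` and `recover` helpers are hypothetical, not how Claude, Gemini or any specific model works, and many models use learned, relative or rotary position schemes instead. The point it shows: once tokens are embedded and positions added, even shuffling the vectors loses nothing. Every token and its position can be read back out, and the text can be rebuilt verbatim.

```python
import numpy as np

# Hypothetical toy vocabulary. "▁" marks the start of a word (SentencePiece-style);
# pieces without it attach to the previous word.
VOCAB = ["▁", "▁the", "▁cat", "▁sat", "▁on", "▁mat"] + list("abcdefghijklmnopqrstuvwxyz")


def tokenize(text, vocab):
    """Greedy longest-match subword tokenization with explicit word-start markers."""
    known = set(vocab)
    tokens = []
    for word in text.split(" "):
        piece = "▁" + word  # the word boundary becomes part of the first token
        while piece:
            for end in range(len(piece), 0, -1):
                if piece[:end] in known:
                    tokens.append(piece[:end])
                    piece = piece[end:]
                    break
            else:
                raise ValueError(f"cannot tokenize {piece!r}")
    return tokens


def detokenize(tokens):
    """Invert tokenize(): every '▁' becomes a space; drop the one before the first word."""
    return "".join(tokens).replace("▁", " ")[1:]


def sinusoidal_positions(n, d):
    """Sinusoidal positional encodings (Vaswani et al., 2017); d must be even."""
    pos = np.arange(n)[:, None]
    i = np.arange(d // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d))
    pe = np.zeros((n, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


def recover(vectors, emb, pe, vocab):
    """Brute force: match each (token embedding + position) vector to its closest
    token/position pair, then place the token at that position."""
    out = [None] * len(vectors)
    for x in vectors:
        # distance to every token/position combination, shape (vocab, positions)
        dist = np.linalg.norm(x - emb[:, None, :] - pe[None, :, :], axis=-1)
        t, p = np.unravel_index(np.argmin(dist), dist.shape)
        out[p] = vocab[t]
    return out


rng = np.random.default_rng(0)
d = 64
text = "the cats sat on the mat"
tokens = tokenize(text, VOCAB)

tok_to_id = {tok: i for i, tok in enumerate(VOCAB)}
emb = rng.normal(size=(len(VOCAB), d))        # random stand-in for learned embeddings
pe = sinusoidal_positions(len(tokens), d)
x = emb[[tok_to_id[t] for t in tokens]] + pe  # what the first layer would receive

shuffled = x[rng.permutation(len(tokens))]    # discard the row order entirely
recovered = recover(shuffled, emb, pe, VOCAB)

print(tokens)
print(detokenize(recovered))
```

In `tokens`, the piece `s` carries no `▁`, so it continues the word `cat`. The word boundary is an explicit feature of the tokens. The brute-force lookup only works because each input vector is an exact sum. Inside a trained model the representations are far more mixed. So this shows that the information is present at the input, not how the model retrieves it.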
The fact that frontier LLMs like Claude or Gemini can take a text thousands of lines long and output (most of the time) the same text verbatim, without even a minor change, is mind-blowing.
That the text is transformed inside the LLM into an internal representation where no strict notion of words and their order exists, and can then be perfectly reconstructed simply by sampling tokens, is an unbelievable feat.