Frontier LLMs train on observational data — text and behaviour recorded while the subject knew they might be watched. Nine decades of behavioural science say that's a signal-distorting condition.
So I wrote a paper about what a different kind of training data would look like.
The proposal: an intent pair — declared intent paired with verified outcome, gathered where the declarant is architecturally unidentifiable. (Iₐ, C, M, O) — declaration, classification, match, lived outcome. The label isn't assigned by an annotator; it's supplied by the user's own subsequent action in the world.
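A rough illustration of what one such record might look like, as a minimal Python sketch. This is not the schema in the repo: the class name, field names, and types are all assumptions made for illustration. The only things taken from the definition above are the four components and the fact that the label comes from the outcome, not from an annotator.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IntentPair:
    """Hypothetical record for (I_a, C, M, O). Field types are assumed."""
    declaration: str         # I_a: the intent as the declarant stated it
    classification: str      # C: category assigned to the declaration
    match: Optional[str]     # M: what the declaration was matched to, if anything
    outcome: Optional[bool]  # O: lived outcome; None until the user has acted

    # Deliberately no user id, device id, or session field: in this sketch
    # the record itself carries nothing that identifies the declarant.

    def is_labelled(self) -> bool:
        # The label exists only once the outcome has been verified.
        return self.outcome is not None


def labelled_pairs(pairs: List[IntentPair]) -> List[IntentPair]:
    """Keep only records whose outcome has been observed."""
    return [p for p in pairs if p.is_labelled()]
```

Omitting identifiers from the record is only a schema-level gesture. It does not by itself make the declarant unidentifiable by construction, which is the harder problem.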
Honesty: this corpus does not exist. Can't be scraped (the web is observational). Can't be simulated (the unobserved declaration IS the signal). Can't be extracted from existing systems. The repo is the schema and the seed. The network that produces it is the work.
Background: this is the third paper in a research arc I've directed at @ProfilaPrivacy over four years. The two earlier works are peer-reviewed:
— "Zero Knowledge Advertising" (UC3M Profila, 2022) — established declared-vs-inferred and the zero-knowledge protocol.
— "A Question Answering Tool for Website Privacy Policy Comprehension" (HSLU Profila, HCII 2023) — established domain-specific retrieval and grounding against hallucination.
Both findings are load-bearing in the LMM design.
On the state of implementation: the declaration, classification, and matching layers exist in production at @ProfilaPrivacy. The architectural unobservability piece (making the declarant unidentifiable by construction, not by promise) is still in research, not yet implemented.
That's the work ahead.
Open and looking for hard critique:
— Where does the observational-ceiling argument break?
— Failure modes of the intent-pair primitive? Gaming, Goodhart, declared-vs-real intent drift?
— What's the state of the art on architectural unobservability that I should be reading?
Paper: doi.org/10.5281/zenodo.20616…
Repo: github.com/Profila/42-True