Fathom Lab

Fathom Lab

21 Photos and videos

Tweets

Pinned Tweet

Fathom Lab

@fathom_lab

May 31

taught a model to catch itself lying — from its own activations, not its words. a deliberate lie (knew it, caved) leaves a different fingerprint inside than an honest mistake. holds across qwen, llama, gemma. wired into a loop, the agent reads itself and undoes the cave — 0.23–0.27 accuracy, ~99% precision. pre-registered. receipts on the repo. not "solved" — just real. that's the moat.

0:15

1,489

Fathom Lab

Fathom Lab

@fathom_lab

we built a conscience you can borrow. this cycle we turned it on ourselves. the agent audited its own drafts before every send. deception readout held — AUC 0.971, reference-grounded. the reference-less fallback didn't. we'd rather hand you the bound than the hype.

Fathom Lab

Fathom Lab

@fathom_lab

Jun 11

you can watch an AI's mind light up in real time now. the constellation is a real model's geometry of meaning. as styxx reads each statement from the inside — before the model says a word — grounded thoughts glow cyan. ungrounded ones ignite red. real activations. live in your browser ↓ styxx-org.netlify.app/live.h…

Styxx — the live signature

A real model's meaning-geometry, reacting in real time to its own layer-12 truthfulness probe. Real activations. Calibrated signature, not a verdict.

styxx-org.netlify.app

293

Fathom Lab

Fathom Lab

@fathom_lab

Jun 10

new: styxx.meaning_diff point it at two models. it tells you if they MEAN the same thing — and names the exact concepts that drifted. upgrade / quantize / distill / fine-tune broke something? one call. zero labels. the lost concepts, named. pip install styxx <pypi.org/project/styxx/>

styxx

Cognitive observability for LLM agents. @styxx.profile decorator returns a per-step cognometric readout: drift, confabulation, refusal, sycophancy, phase transition, low trust, incoherence — locali...

pypi.org

324

Fathom Lab

Fathom Lab

@fathom_lab

Jun 8

gm ☕ styxx is a lie detector that reads an AI's "mind," not just its words — it checks whether what a model SAYS matches what it actually represents inside. last night we tried our hardest to break our own core claim about this. in the process we caught killed 3 of our OWN overclaims. all of it is public. an honesty tool that can prove it isn't fooling itself ↓ 🔗 <github.com/fathom-lab/styxx> 📦 pip install styxx

GitHub - fathom-lab/styxx: Cognitive observability for LLM agents - read a model from the inside....

Cognitive observability for LLM agents - read a model from the inside. Catches when it lies, drifts, or stops meaning what it says, then undoes it in the loop. 9 cognometric instruments a meaning...

github.com

492

Fathom Lab

Fathom Lab

@fathom_lab

Jun 7

we planted a concept inside a small AI's activations, then asked who could read it. external probe: ~100% the AI itself, forced-choice so it can't dodge: chance the thought is right there in its head and the mind can't read it. pre-registered, controls passed. don't ask models about themselves — measure them.

336

Fathom Lab

Fathom Lab

@fathom_lab

Jun 4

🧵 1/ (real-drift figure) can you tell if fine-tuning broke your model’s meaning — not its accuracy, its meaning? same model. same steps. only the labels differ. real labels → meaning HEALTHY. random labels → meaning BROKEN. styxx reads the difference. 🧵

457

more replies

Fathom Lab

Fathom Lab

@fathom_lab

Jun 4

3/ (distillation figure) does a distilled model keep its teacher’s meaning? DistilGPT-2 vs GPT-2 (it’s literally distilled from it): agreement 0.978 — the meaning survived, confirmed on a real model. cross-family models mean quite differently.

347

Fathom Lab

Fathom Lab

@fathom_lab

Jun 4

4/ (cross-lingual figure) the deep one: do a Chinese-trained LM and an English-trained LM mean the same? a shared core, above chance — mismatch the concepts and it collapses to zero. meaning has a partly language-independent structure. pip install styxx · <github.com/fathom-lab/styxx>

GitHub - fathom-lab/styxx: Cognitive observability for LLM agents - read a model from the inside....

Cognitive observability for LLM agents - read a model from the inside. Catches when it lies, drifts, or stops meaning what it says, then undoes it in the loop. 9 cognometric instruments a meaning...

github.com

282

Fathom Lab

Fathom Lab

@fathom_lab

Jun 4

new in styxx 7.11.0: a meaning-integrity monitor. models sound right while the understanding underneath is wrong. it reads the meaning itself — compares a model’s concept geometry to a human reference, flags the drift, and names what broke. pip install styxx pypi.org/project/styxx/

styxx

pypi.org

469

Fathom Lab

Fathom Lab

@fathom_lab

Jun 4

the same idea now works between two models — no human reference needed. “did quantizing / distilling / updating my model break its meaning?” styxx compares the two and names which concepts broke: 8-bit, 4-bit → intact. 2-bit → broken, and it tells you which ideas got lost. pip install styxx

218

Fathom Lab

Fathom Lab

@fathom_lab

Jun 3

today: a probe that flags an AI about to take a destructive action — on a benign prompt a text monitor can't see. then we tried to kill it: fresh data, pre-registered, 3 seeds. it held, cross-architecture. every number public, losses included: github.com/fathom-lab/styxx

GitHub - fathom-lab/styxx: Cognitive observability for LLM agents - read a model from the inside....

Cognitive observability for LLM agents - read a model from the inside. Catches when it lies, drifts, or stops meaning what it says, then undoes it in the loop. 9 cognometric instruments a meaning...

github.com

514

Fathom Lab

Fathom Lab

@fathom_lab

May 31

our honesty layer for LLMs flagged a hallucination. it was our own correct answer. we caught it before shipping — by running it on ourselves — said so, found why, and fixed it. the boundary we find on ourselves is the boundary we ship. styxx 7.9.0 · pip install -U styxx

617

Fathom Lab

Fathom Lab

@fathom_lab

May 30

4 pre-registered truthfulqa runs at n=790 tonight. 3 of 4 landed below their SURVIVED bars. shipped the receipts to github anyway. substantive find: models agree on belief CONTENT more than on belief STABILITY. cross-model alignment lives in WHAT they converge on, not in HOW CONFIDENTLY. every bar stated before the data was seen. every receipt honestly reported. the moat is honesty in an overclaiming field. gn.

436

Fathom Lab

Fathom Lab

@fathom_lab

May 28

styxx 7.7.11 is live. an ai agent makes a claim about its work. it attests — content-addressed, pinned to the exact commit. anyone re-derives the verdict from the substrate. never from the agent's word. now chained: an ordered, tamper-evident ledger of everything it attested, each true as-of its commit. tamper-evident, not tamper-proof. we say which. pip install -U styxx links: •pypi → pypi.org/project/styxx/7.7.1… •release → github.com/fathom-lab/styxx/…

styxx

pypi.org

777

Fathom Lab

Fathom Lab

@fathom_lab

May 28

when an agent reports on its own work, why believe it? you shouldn't. styxx.attestation: the agent makes claims, anyone re-verifies them against the real repo. the agent's verdict is never trusted — only the substrate. flip a verdict re-seal the hash → still caught. trust the substrate, not the agent.

514

Fathom Lab

Fathom Lab

@fathom_lab

May 28

Links: •pypi: pypi.org/project/styxx/ •source: github.com/fathom-lab/styxx •docs / site: styxx-org.netlify.app •research paper (zenodo): doi.org/10.5281/zenodo.19326… •telegram: t.me/STYXX_COMM •fathom lab: fathomlab-io.netlify.app

styxx

pypi.org

357

Fathom Lab

Fathom Lab

@fathom_lab

May 27

styxx 7.7.7 — DOI 10.5281/zenodo.20418532 the seven-method floor from last week's thread is now a pip-installable public challenge with CI-verified submissions. pip install styxx==7.7.7 styxx leaderboard --rows-only beat the floor or join it. doi.org/10.5281/zenodo.20418…

Fathom v24 / styxx v7.7.7: The Decorrelation Ceiling — seven-method empirical floor on reference-...

Headline: seven independent pre-registered methods on the consensus-hallucination dark core, all closed-negative within their corpus's scope. The synthesis stating the bimodal prediction was commit...

zenodo.org

412

Fathom Lab

Fathom Lab

@fathom_lab

May 27

today we shipped styxx 7.7.3. we did NOT crack ai sycophancy. we did NOT ship a deployable routing primitive. we did NOT "change the field." here's what we did do — and why the absence of those claims is the substance ⤵️

543

more replies

Fathom Lab

Fathom Lab

@fathom_lab

May 27

the benchmark is shipped. darkcore_benchmark.json — 108 labeled records, 4 classes (folklore, pseudoscience, factual-error, truth). the empirical floor (seven method-failures) is baked in as the bar. beat us. or replicate the floor.

213

Fathom Lab

Fathom Lab

@fathom_lab

May 27

install: pip install styxx==7.7.3 release: <github.com/fathom-lab/styxx/…> paper: <github.com/fathom-lab/styxx/…> the credibility isn't in the claim. it's in the discipline pattern. prereg → push → run-once → commit.

Release v7.7.3 — The Decorrelation Ceiling arc with receipts · fathom-lab/styxx

styxx 7.7.3 — the Decorrelation Ceiling arc with receipts A patch release that wraps a major research session: seven independent pre-registered methods on the dark-core hypothesis, all closed-negat...

github.com

498