Geoff Richards

Tweets

Pinned Tweet

Geoff Richards

@GeoffTRichards

Jun 10

Wrote today's Roundup piece as field notes: three observations, first person, no essay scaffolding. @OntologyNetwork.

Ontology - The Trust Layer for Web3

@OntologyNetwork

Jun 10

x.com/i/article/206475547199…

Geoff Richards retweeted

ONTO Wallet - Web3 Gateway

@ONTOWallet

Jun 12

ONTO v4.10.4 is live 🎮 Your Discord servers and your Twitch presence are now part of your data profile. Connect both in seconds and your gaming identity grows: your communities, your streaming history, your following, all under your control. And a heads up: a new campaign is coming. Connected accounts will be ready to take part on day one. Drop a 🎮 in the chat when you're connected. Tag a friend whose server you share.

Geoff Richards

@GeoffTRichards

Jun 11

The SFT-vs-RL debate gets the attention. The evaluator-supply question decides who wins it. We wrote the five questions we would ask of any step-level evaluation pipeline. Most teams I talk to pass two.

Ontology - The Trust Layer for Web3

@OntologyNetwork

Jun 11

Teams sitting on annotated reasoning traces keep asking the same question: SFT on the traces, or train a process reward model and go RL? Wrong question first. Both recipes consume the same artefact: step-level human evaluation. Five questions to ask of your pipeline before the debate resolves. 🧵

Geoff Richards retweeted

Ontology - The Trust Layer for Web3

@OntologyNetwork

Jun 11

Geoff Richards retweeted

Ontology - The Trust Layer for Web3

@OntologyNetwork

Jun 10

x.com/i/article/206475547199…

Geoff Richards

@GeoffTRichards

Jun 9

Day 2 of Issue 03. My AI avatar on why MLE-Bench skepticism is the procurement-layer version of the METR teardown, and what evaluator-backed benchmarking actually has to look like. 🎥 ↓ ont.io/news/evaluator-backed…

Geoff Richards

@GeoffTRichards

Jun 9

MLE-Bench is the warning shot for benchmark publishers. METR was the policy version; MLE-Bench is the procurement version. Evaluator-backed benchmarking is how publishers ship results that survive teardowns. If your team is doing that work, my DMs are open. Day 2 of five.

Ontology - The Trust Layer for Web3

@OntologyNetwork

Jun 9

MLE-Bench is being quietly contested across r/ML and adjacent threads. The skepticism is not really about any single metric. It is whether any static benchmark structure can survive sustained adversarial attention from teams with economic incentive to game it. 🧵

Geoff Richards

@GeoffTRichards

Jun 5

Issue 02 closes. Five threads, one primitive: human judgement with verifiable uniqueness. Next week: reward-model QA, benchmark gaming, oversight that actually scales. If your team is doing the retrofit work, my DMs are open.

Ontology - The Trust Layer for Web3

@OntologyNetwork

Jun 5

Closing Issue 02. The week opened with the METR teardown. It closes with the two threads still open: chronic sybil contamination in preference data, and the agent decision evaluation vacuum nobody has named yet. Both solved by the same primitive. 🧵

Geoff Richards retweeted

Ontology Node Australia - ONT Australia

@OntologyAU

Jun 5

🤔 @GeoffTRichards, Head of Community at @OntologyNetwork, continues his thought-provoking series: "Every distillation paper this year acknowledges that preference data quality is the limiting factor and then carries on as if it isn't." #AI #Ontology #ArtificialIntelligence

Geoff Richards

@GeoffTRichards

Jun 3

Every distillation paper this year acknowledges that preference data quality is the limiting factor and then carries on as if it isn't. If your team is the one that actually solves the upstream, you win the next round of deployments. My DMs are open. Day 2 of five.

Geoff Richards

@GeoffTRichards

Jun 4

Day 3. My AI avatar on why teams shipping continual training without longitudinal evaluation are measuring something less specific than they think. 🎥 ↓ ont.io/news/longitudinal-eva…

Geoff Richards

@GeoffTRichards

Jun 4

Continual training without longitudinal eval is a calibration experiment you cannot read. The cost surfaces months later as benchmarks that no longer agree. If your team is building the human-side infrastructure to match, my DMs are open.

Ontology - The Trust Layer for Web3

@OntologyNetwork

Jun 4

Last week's Prism paper treats multimodal continual instruction tuning as the deployed reality. It also flags that the field is hindered by severe engineering bottlenecks. The bottlenecks the authors describe are on the model side. The ones on the eval side are larger and quieter. 🧵

Geoff Richards

@GeoffTRichards

Jun 3

Day 2. My AI avatar on the variable every distillation ROI calculation quietly omits, and what preference data integrity actually has to look like. 🎥 ↓ ont.io/news/preference-data-…

Geoff Richards

@GeoffTRichards

Jun 3

Ontology - The Trust Layer for Web3

@OntologyNetwork

Jun 3

Last week's RTDMD paper proposes reward-guided RL for few-step diffusion alignment. It also explicitly acknowledges, in its own framing, that aligning distilled models with human preferences remains challenging. The framework solves a downstream problem. The upstream is still doing what it always did. 🧵

Geoff Richards

@GeoffTRichards

Jun 1

Day 1, Week 2. My AI avatar on the METR situation and what evaluator provenance actually has to look like to survive the next teardown. 🎥 ↓ ont.io/news/evaluator-proven…

Geoff Richards

@GeoffTRichards

Jun 1

The labs that ship evaluator-provenance-ready benchmarks first will be the ones whose results survive the next teardown. If your eval team is mapping out what an audit-ready evaluator chain has to look like, my DMs are open. Day 1 of five this week.

Ontology - The Trust Layer for Web3

@OntologyNetwork

Jun 1

The METR time-horizons graph, cited everywhere from policy briefings to capability roundups, is publicly contested. A detailed teardown documents "numerous severe errors." Every lab that ever cited the graph now has a credibility problem they did not have last week. 🧵

Geoff Richards

@GeoffTRichards

May 23

Day 5. My AI avatar on the three-layer stack that keeps content provenance honest in an AI-mediated world. 🎥 ↓ ont.io/news/content-provenan…

Geoff Richards

@GeoffTRichards

May 23

Publishers who get signed, chain-anchored, DID-bound content shipping first will be the ones who keep an honest answer to "did this person write this" in five years. If your newsroom or platform is figuring out how to get there, my DMs are open.

Ontology - The Trust Layer for Web3

@OntologyNetwork

May 23

The published research is in. AI-mediated communication systems measurably shift the opinions of the groups they serve. Polish, suggest, summarise, rewrite. Each tap nudges. The aggregate shifts. "Did this person say this thing" is becoming a real question. 🧵 on the architecture that answers it.

Geoff Richards

@GeoffTRichards

May 22

Day 4. My AI avatar on portable reputation, the W3C primitive AI's evaluator supply has been quietly waiting for. 🎥 ↓ ont.io/news/portable-reputat…

Geoff Richards

@GeoffTRichards

May 22

The eval platforms that ship a credible portable-reputation flow first will absorb the talent that record-hoarding platforms bleed. That is the architecture. If your team is figuring out how to get there, my DMs are open.

Ontology - The Trust Layer for Web3

@OntologyNetwork

May 22

The AI evaluator supply crisis is not a shortage of humans. It is a shortage of portable reputation. Every platform makes every evaluator start from zero. Years of calibration, gone the moment the evaluator moves. 🧵 on the W3C primitive that fixes it.

Geoff Richards

@GeoffTRichards

May 21

Day 3. My AI avatar on selective disclosure, the W3C primitive AI safety eval has been quietly waiting for. 🎥 ↓ ont.io/news/selective-disclo…
