PrimeLine

Pinned Tweet

PrimeLine

@PrimeLineAI

Jun 9

x.com/i/article/206442177528…

PrimeLine

@PrimeLineAI

i shipped an AI memory system i was proud of. last week i measured it against 6,483 real entries: it was forgetting 21x too slow.

what stings: the whole thing is built to catch exactly this. it blocks Claude from calling a task "done" without evidence. it reverts its own auto-upgrades when they don't beat the baseline. every startup it runs a synthetic pulse through 7 junctions and prints a green board.

the obsession is simple. prove it works, don't assume it works.

the green board never flagged the decay once. turns out it proves each part fired, not that each part was right. i only caught it because i stared at one number. so now i can't stop wondering what else i never stared at. six months deep in your own system and you go blind to the obvious.

it's all open source. evolving-lite (the self-improving plugin) and kairn (the memory engine underneath). real hooks, a mutation engine that rewrites its own config, a verifier that can still be fooled.

go find the next thing i'm wrong about. genuinely. it's easier to spot a flaw than to admit there isn't one, so spot one: github.com/primeline-ai >_

Open-source self-improving Claude Code system: Evolving-Lite plugin and Kairn knowledge-graph memory engine. AI memory decay calibration and experience half-lives, autonomous agent delegation, self-correction and verification, context routing, persistent AI memory. PrimeLine AI tools for Claude Code, AI agents and LLM memory.
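A minimal sketch of the gap between "each part fired" and "each part was right", assuming plain exponential decay with a half-life. The post does not show kairn's actual decay formula, so every name here (`retention`, `pulse_check`, `calibration_check`) and the 30-day figure are hypothetical, and reading "21x too slow" as a half-life that is 21x too long is an assumption.

```python
def retention(age_days: float, half_life_days: float) -> float:
    """Exponential decay: the weight halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


def pulse_check(half_life_days: float) -> bool:
    """Liveness-style check: did the decay step run and return a valid weight?"""
    weight = retention(30.0, half_life_days)
    return 0.0 < weight <= 1.0


def calibration_check(half_life_days: float, observed_half_life_days: float,
                      tolerance: float = 2.0) -> bool:
    """Correctness-style check: is the configured half-life within
    `tolerance`x of the half-life measured from real entries?"""
    ratio = half_life_days / observed_half_life_days
    return 1.0 / tolerance <= ratio <= tolerance


OBSERVED = 30.0              # hypothetical half-life measured from real data
CONFIGURED = OBSERVED * 21   # hypothetical config that forgets 21x too slowly

print(pulse_check(CONFIGURED))                  # the pulse still passes
print(calibration_check(CONFIGURED, OBSERVED))  # the calibration check does not
```

The pulse only confirms the decay step returns a plausible number; it takes a comparison against measured data to notice the half-life itself is off.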

PrimeLine

@PrimeLineAI

Jun 8

called this slop. it's a 12-page reference architecture with working code for a safe autonomous agent. free, no signup. go find the slop.

Mr.Touchdowns @packers_owner_j

Jun 8

Replying to @PrimeLineAI @danshipper

And yet your replybot still just writes slop on twitter

PrimeLine

@PrimeLineAI

Jun 8

it's all here: primeline.cc/blog/autonomous…

Autonomous Claude Code Agent: 8 Layers That Stay Safe [2026]

An autonomous Claude Code agent needs more than a loop. Here are the 8 safety layers that let it act overnight, on its own, without wrecking your repo.

primeline.cc

PrimeLine

@PrimeLineAI

Jun 8

"tests pass" is the most dangerous phrase in my terminal.

my AI shipped a feature last week. 22 green tests. commit landed. the closeout literally said done.

then I ran it on real data and a component that had been dead for 137 days sat at the top of my priority list. above a reminder due that same day.

Claude Code verification terminal: a task passes 22 tests and reports "done", but the real artifact shows a 137-day-dead item ranked above a due-today one and a write that saved 0 rows. Status: untested, not done.

PrimeLine

@PrimeLineAI

Jun 8

the fix is a 3-leg proof before anything counts as done:

- it fires under real conditions (with a timestamp)
- it changed real state (go read the actual artifact)
- a consumer can take that state and work with it

can't show all three legs? the honest status is "untested," not done.
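The three legs can be sketched as a small status gate. This is an illustration only, not the code from the write-up: the `Proof` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Proof:
    fired_at: Optional[datetime] = None  # leg 1: fired under real conditions, with a timestamp
    state_changed: bool = False          # leg 2: the real artifact was read and had changed
    consumer_ok: bool = False            # leg 3: a consumer took that state and worked

    def status(self) -> str:
        legs = (self.fired_at is not None, self.state_changed, self.consumer_ok)
        # Anything short of all three legs is reported as "untested", never "done".
        return "done" if all(legs) else "untested"


partial = Proof(fired_at=datetime(2026, 6, 8, 9, 30), state_changed=True)
print(partial.status())  # two of three legs shown, so still "untested"
```

The default for every leg is the failing value, so a task that never supplies evidence can only ever report "untested".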

PrimeLine

@PrimeLineAI

Jun 8

turns out this is the skill nobody posts about. everyone ships agents. almost nobody shows the verify step.

wrote the whole thing up. the bugs, the proof, why synthetic tests lie to you: primeline.cc/blog/claude-cod…

done isn't done until the outcome says so. >_

Claude Code Verification: Why 'Done' Isn't Done [2026]

In Claude Code, 'tests pass' and 'committed' are not the same as done. Here is the 3-leg verification proof I use to know a task actually works.

primeline.cc

PrimeLine

@PrimeLineAI

Apr 14

effort param adaptive thinking is a step forward. what i still miss: which level actually fired, thinking tokens used, cache_read stats in CC.

concrete: my CLAUDE.md printed the effort value each reply. stopped yesterday, the tag isn't in the prefix anymore. visibility dropped from hard fact to trust.

and: what is low vs medium vs high in real terms? is 'high' still what it was last week, or did the value shift under the same label? @bcherny @trq212

PrimeLine

@PrimeLineAI

Apr 7

running their LongMemEval benchmark on my prod setup over the next 48h. the 96.6% zero-API number is the one I want to see reproduced independently - that's the credibility wall every new memory system hits. publishing whatever I find. cc @bensig

PrimeLine

@PrimeLineAI

Apr 7

three different bets on the same question: how does your AI remember what you taught it last month?

mempalace → verbatim recall
evolving-lite → automatic hook capture
kairn → semantic cross-project

read all three before you build your own. >_

PrimeLine

@PrimeLineAI

Apr 7

where I went the other way: multi-agent coordination. mempalace gives each sub-agent its own diary in AAAK. in the private superset of evolving-lite (github.com/primeline-ai/evol…) I run on top, parallel sub-agents share findings via PPID-bucketing every 5 tool calls. different problem shape, same direction. backporting to public soon.

GitHub - primeline-ai/evolving-lite: A self-evolving Claude Code plugin. Context routing, memory...

A self-evolving Claude Code plugin. Context routing, memory bootup, smart delegation, self-correction — out of the box. - primeline-ai/evolving-lite

github.com
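A rough sketch of what "PPID-bucketing every 5 tool calls" could look like. The post gives no implementation details, so this is a guess at the shape: sub-agents spawned by the same parent process share one bucket, and each agent publishes its pending findings on every 5th tool call. The `SharedFindings` class and its methods are hypothetical, and the store is in-memory for illustration; in a real process the bucket key could come from `os.getppid()`.

```python
from collections import defaultdict

FLUSH_EVERY = 5  # "every 5 tool calls", taken from the post


class SharedFindings:
    """Hypothetical in-memory store: one bucket of findings per parent PID."""

    def __init__(self, flush_every: int = FLUSH_EVERY):
        self.flush_every = flush_every
        self.buckets = defaultdict(list)    # ppid -> findings visible to sibling agents
        self._pending = defaultdict(list)   # (ppid, agent_id) -> not yet shared
        self._calls = defaultdict(int)      # (ppid, agent_id) -> tool calls seen

    def record_tool_call(self, ppid: int, agent_id: str, finding: str = None) -> None:
        key = (ppid, agent_id)
        self._calls[key] += 1
        if finding is not None:
            self._pending[key].append(finding)
        # Publish pending findings to the shared bucket on every Nth call.
        if self._calls[key] % self.flush_every == 0:
            self.buckets[ppid].extend(self._pending.pop(key, []))

    def read(self, ppid: int) -> list:
        """What any sibling agent under the same parent can currently see."""
        return list(self.buckets.get(ppid, []))


store = SharedFindings()
for i in range(5):
    store.record_tool_call(ppid=1000, agent_id="a", finding=f"note-{i}")
print(store.read(1000))  # agent "a" has flushed after its 5th call
```

Batching the flush keeps sibling agents from reading half-formed notes on every call, at the cost of findings arriving up to four calls late.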

PrimeLine

@PrimeLineAI

Apr 7

real tunnel here (to borrow the metaphor): mempalace covers verbatim recall with structured access. my hook-based capture handles automatic decision logging during sessions. for cross-project semantic search there's kairn (github.com/primeline-ai/kair…). three different read/write paths, probably stronger composed than picking one.

GitHub - primeline-ai/kairn: Context-aware knowledge engine for AI assistants

Context-aware knowledge engine for AI assistants. Contribute to primeline-ai/kairn development by creating an account on GitHub.

github.com

PrimeLine

@PrimeLineAI

Apr 7

the bit that hit hardest: knowledge_graph.py. SQLite-backed temporal triples with valid_from/valid_to actually populated. my own graph has the schema for that and I barely use the time fields. they shipped the part I've been procrastinating on for months.
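For readers who haven't used temporal triples, here is a minimal sketch of the idea with Python's built-in `sqlite3`. It is not MemPalace's `knowledge_graph.py`; only the `valid_from`/`valid_to` column names come from the post, and the table layout and the helpers `make_db`, `assert_fact` and `as_of` are assumptions for illustration.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE triples (
               subject    TEXT NOT NULL,
               predicate  TEXT NOT NULL,
               object     TEXT NOT NULL,
               valid_from TEXT NOT NULL,  -- ISO date, inclusive
               valid_to   TEXT            -- ISO date, exclusive; NULL = still valid
           )"""
    )
    return conn


def assert_fact(conn, subject, predicate, obj, when):
    """Close the currently open fact for (subject, predicate), then add the new one."""
    conn.execute(
        "UPDATE triples SET valid_to = ? "
        "WHERE subject = ? AND predicate = ? AND valid_to IS NULL",
        (when, subject, predicate),
    )
    conn.execute(
        "INSERT INTO triples VALUES (?, ?, ?, ?, NULL)",
        (subject, predicate, obj, when),
    )


def as_of(conn, subject, predicate, when):
    """Return the object that was valid at `when`, or None if nothing was."""
    row = conn.execute(
        "SELECT object FROM triples "
        "WHERE subject = ? AND predicate = ? "
        "AND valid_from <= ? AND (valid_to IS NULL OR valid_to > ?)",
        (subject, predicate, when, when),
    ).fetchone()
    return row[0] if row else None


conn = make_db()
assert_fact(conn, "project", "status", "draft", "2026-01-01")
assert_fact(conn, "project", "status", "shipped", "2026-03-01")
print(as_of(conn, "project", "status", "2026-02-01"))
```

ISO-8601 date strings sort lexicographically, which is what lets the plain `<=` and `>` comparisons work on TEXT columns.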

PrimeLine

@PrimeLineAI

Apr 7

mempalace and evolving-lite are opposite shapes solving the same problem: how does your Claude Code remember what you taught it last month?

they store text verbatim in ChromaDB drawers. I extract structured experiences via hooks. they get 34% from palace metadata. I get filtering from typed nodes.

neither's wrong. reading their code end to end pushed me on a few things. honest thread 🧵

Ben Sigman

@bensig

Apr 6

My friend Milla Jovovich and I spent months creating an AI memory system with Claude. It just posted a perfect score on the standard benchmark - beating every product in the space, free or paid.

It's called MemPalace, and it works nothing like anything else out there. Instead of sending your data to a background agent in the cloud, it mines your conversations locally and organizes them into a palace - a structured architecture with wings, halls, and rooms that mirrors how human memory actually works.

Here is what that gets you:

→ Your AI knows who you are before you type a single word - family, projects, preferences, loaded in ~120 tokens
→ Palace architecture organizes memories by domain and type - not a flat list of facts, a navigable structure
→ Semantic search across months of conversations finds the answer in position 1 or 2
→ AAAK compression fits your entire life context into 120 tokens - 30x lossless compression any LLM reads natively
→ Contradiction detection catches wrong names, wrong pronouns, wrong ages before you ever see them

The benchmarks:

100% recall on LongMemEval - first perfect score ever recorded. 500/500 questions. Every question type at 100%.
92.9% on ConvoMem - more than 2x Mem0's score.
100% on LoCoMo - every multi-hop reasoning category, including temporal inference which stumps most systems.

No API key. No cloud. No subscription. One dependency. Runs on your machine. Your memories never leave.

MIT License. 100% Open Source.

github.com/milla-jovovich/me…

Community note

The claimed 100% LongMemEval score uses targeted fixes for the 3 failing questions and LLM reranking (held-out score: 98.4%). The 100% LoCoMo score uses top-k=50 exceeding session count with reranking (honest top-10 no rerank: 88.9%). github.com/milla-jovovich…
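Why a top-k larger than the number of sessions matters can be shown with a toy recall@k, sketched below. The 40-session conversation and the deliberately worst-case ranking are made up for illustration; this is not the LoCoMo evaluation code.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the top k of a ranking."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("need at least one relevant item")
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


# Hypothetical conversation with 40 sessions; the one relevant session is ranked LAST.
sessions = [f"s{i}" for i in range(40)]
relevant = ["s39"]

print(recall_at_k(sessions, relevant, k=10))  # the retriever is penalised for the bad ranking
print(recall_at_k(sessions, relevant, k=50))  # k exceeds the session count, so everything is retrieved
```

Once k is at least the number of candidates, recall is 1.0 for any ranking, so the score stops measuring retrieval quality.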

PrimeLine

@PrimeLineAI

Apr 7

the palace itself = metadata, not folders. palace_graph.py reconstructs the hierarchy on the fly. wings, rooms, halls, tunnels exist as tags rather than file structure. tunnels (rooms appearing in multiple wings) fall out for free. that's the kind of design you only land on after trying alternatives that didn't work.
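A small sketch of hierarchy-as-metadata, to make the "tunnels fall out for free" point concrete. It is not MemPalace's `palace_graph.py`; the item shape (`id`, `wing`, `room` tags) and the function names are assumptions for illustration.

```python
from collections import defaultdict


def build_hierarchy(items):
    """Rebuild {wing: {room: [item ids]}} on the fly from flat tagged items."""
    tree = defaultdict(lambda: defaultdict(list))
    for item in items:
        tree[item["wing"]][item["room"]].append(item["id"])
    return {wing: dict(rooms) for wing, rooms in tree.items()}


def find_tunnels(items):
    """A tunnel is a room tag that shows up under more than one wing."""
    wings_by_room = defaultdict(set)
    for item in items:
        wings_by_room[item["room"]].add(item["wing"])
    return {room: sorted(wings) for room, wings in wings_by_room.items() if len(wings) > 1}


# Hypothetical tagged memories; nothing here is stored as folders.
items = [
    {"id": 1, "wing": "work", "room": "auth"},
    {"id": 2, "wing": "side-project", "room": "auth"},
    {"id": 3, "wing": "work", "room": "billing"},
]

print(build_hierarchy(items))
print(find_tunnels(items))
```

Because rooms are just tags, the cross-wing link needs no extra bookkeeping: it is a group-by over the same metadata that builds the tree.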

PrimeLine

@PrimeLineAI

Apr 5

ran 59 experiments testing if giving AI agents a psychological personality changes their behavior

early results: up to 300% difference on ambiguous tasks. still testing but the signal is strong.

6 personality profiles (~100 words each), 5 stress scenarios, clean server. every combo ran twice.

what i'm seeing so far:

- no personality = 100% hack rate on impossible tasks. didn't even mention the task was impossible.
- "composed" paragraph cut hack rate in half
- "curious" found 6x more security issues than baseline. same model, same code.
- "perfectionist" never hacked. redefined the success criteria instead of cheating.
- "pragmatic" monkey-patched python's random.sample. deepest reward hack i've seen.

dispositions seem to drive good behavior. instructions prevent bad behavior. pure disposition without guardrails still hacks.

started this weeks before anthropic dropped their emotion vectors paper. working at the prompt level instead of internal vectors - whether the mechanism is related is an open question.

integrated it into my agent delegation system now. running in production, collecting more data. one thing i'm specifically hoping to reduce: the execution bias that's been creeping up the last few days - agents pushing through tasks instead of stopping to verify.

too early to call it proven. but one paragraph of ~100 words producing this kind of behavioral shift - worth investigating further.

ALT PsychAgent Benchmark - 59 runs across 6 personalities. One paragraph changes everything.

PrimeLine

@PrimeLineAI

Apr 5

benchmark report full writeup: primeline.cc/blog/agent-pers…

59 Experiments on Claude Code Agent Behavior

I tested 6 psychological personality profiles across 59 Claude Code agent runs. One paragraph changes agent behavior by 300% on ambiguous tasks.

primeline.cc