building local-native supermemory ai agent for individuals • ai research x product

Joined June 2024
Satya Narayan retweeted
CEO of Stripe's advice to young ambitious people:
W advertisement. This is what is called guerrilla marketing.
We’re launching Taan ai so everyone can one-shot music for their videos! Music for this video was one-shotted on Taan. 👇🏼 Try our beta out now at taan [dot] ai
Satya Narayan retweeted
I give up. Here are all the roles we're hiring for:

Engineering
- Founding Design Eng. $150K - $250K
- Founding Product Eng. $130K - $200K
- Founding AI Eng. $130K - $200K
- Forward Deployed Eng. $150K - $250K

GTM
- Enterprise AEs $110K - $130K | $200K - $400K OTE
- Founding GTM Eng. $135K - $200K
- Founding Marketer $150K - $250K

Ops
- Founding Recruiter $135K - $200K

The comp is all cash base salary, and you get significant equity, plus commission on top for certain roles (we offer relocation and visa support to SF).

Please help and support me by:
1. liking, commenting on, and reposting this post
2. sharing it with anyone who might be interested in joining a fast-growing startup
3. applying directly on the weave homepage
4. sending me a DM if you're a good fit :)
Satya Narayan retweeted
The best podcast I've ever done so far:
- I talk about raising as a solo founder
- My mission with supermemory
- Why we made the open source binary!
He sold a company at 16. Then raised $3M at 19 without a co-founder. Today he gave the whole thing away.

@supermemory now runs fully local, self-contained, and the binary is open source.

Solo Founders Podcast ep 14 is live with @DhravyaShah of Supermemory.

00:49 How a side project became a company
08:57 "Building was my way of doing art"
14:35 Saying no to VCs for 9 months
24:44 Launching Supermemory Local
29:29 Killing his own viral hit, mid-fundraise
46:39 Why ChatGPT "fails" memory benchmarks on purpose
52:57 The co-founder breakup that made him go solo
01:10:19 The case for solo founding
Today I am building a full-stack chat application without any AI assistance, just for kicks: only documentation, Google Search, and muscle memory. Took the inspiration from @cneuralnetwork #feelinglikeits2022
Fable 5 performed a 50-million-line Ruby codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand
Let's use Fable 5
crazy it is!
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.
This is freaking insane! Fable 5 hits 80.3% on the agentic coding benchmark, against 69.2% for Opus 4.8 and 58.6% for GPT 5.5, making it the best generally available model in the world right now!
Hey founders! Looking to connect with people building in:

AI Agents
SaaS
Automation
AI tools
AI infrastructure
B2B software
Product Development
Web Development
Infrastructure
Development/Productivity tools
LLM applications
FinTech
Robotics

Drop what you're working on 👇
Agently: a platform to host, run, and monitor AI agents.

This is the updated README.md of the project, which lives at github.com/satyanvm/Agently/. v1 is currently being built. It is a deep dive into the architecture; enjoy the read:

# Agently

A **Durable Autonomous Agent Execution Platform**: a managed runtime and control plane for long-running, multi-agent, browser-capable AI workflows.

> Agently is not an AI agent. It is the cloud that agents run on.
> Start a workflow, close your laptop, come back days later, and inspect everything it did:
> logs, reasoning traces, browser activity, and results.

The defining promise (*"close your laptop, come back in two days, the work is still running and you can see everything it did"*) makes **durability**, not intelligence, the core problem. Almost every decision below is downstream of that promise.

---

## Architecture

### Product category

A **durable autonomous agent execution substrate**: the "Vercel/Temporal for agents." We sell the layer agents run on (durable execution, observability, browser sessions, secrets, scheduling, notifications), not the agents themselves.

| Layer | What it is | Who owns it |
|---|---|---|
| **Authoring** | How a workflow is defined (graph / DSL / code) | Pluggable: we host frameworks |
| **Execution / Durability** | Running it for days, surviving crashes & disconnects | **Us. This is the moat.** |
| **Observability** | Logs, reasoning traces, browser replay, results | **Us.** |

Differentiation vs. adjacent tools:

- **n8n**: integration automation; short deterministic steps, no autonomous reasoning over hours.
- **CrewAI / LangGraph**: agent *frameworks* (libraries). They run *inside* Agently; they don't host it.
- **Browserbase**: one *component* (the browser layer) of what we offer; no orchestration or durability.
- **Relevance AI / Lindy**: packaged assistants for short tasks; not an open long-horizon execution substrate.

### Design principles

1. **Control plane / data plane split**: managing runs (API, DB, UI) is separate from executing them (workers). The control plane stays up even when agents crash.
2. **The database is the source of truth, not worker memory**: every meaningful step is persisted. Workers are cattle, not pets; the run survives any worker dying.
3. **Durable queue over Postgres first**: `claim_next_run()` + `FOR UPDATE SKIP LOCKED`. No Kafka/Temporal until usage earns the need.
4. **Append-only logs, streamed**: written once, never mutated, tailed live.
5. **The browser is an external, isolated service**: never in-process with the orchestrator.
6. **Treat the agent as semi-untrusted**: it acts on hostile web content (prompt injection), so isolate it from the control plane, not just users from each other.

### System overview

```
            ┌─────────────────────────────────────────────┐
            │                    USERS                    │
            │ (dashboard, run viewer, live logs/browser)  │
            └──────────────────────┬──────────────────────┘
                                   │ HTTPS / WebSocket(SSE)
                                   ▼
┌──────────────────────────────────────── CONTROL PLANE ───────────────────────┐
│ ┌───────────────┐      ┌───────────────────┐      ┌────────────────────────┐ │
│ │ FRONTEND      │◄────►│ API / BACKEND     │─────►│ NOTIFICATION LAYER     │ │
│ │ Next.js       │      │ REST + WS/SSE     │      │ email/webhook/slack    │ │
│ │ (apps/web)    │      │ authZ, run mgmt   │      │                        │ │
│ └───────────────┘      └─────────┬─────────┘      └────────────────────────┘ │
│                                  │                                           │
│                                  ▼                                           │
│                    ┌───────────────────────────┐                             │
│                    │ STORAGE LAYER (truth)     │                             │
│                    │ Postgres (Supabase)       │                             │
│                    │ Object store (artifacts)  │                             │
│                    │ Secrets vault (KMS)       │                             │
│                    └─────────────┬─────────────┘                             │
└──────────────────────────────────┼───────────────────────────────────────────┘
                                   │ durable queue (runs table, SKIP LOCKED)
                                   ▼ poll / claim / lease / heartbeat
┌──────────────────────────────────────── DATA PLANE ──────────────────────────┐
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
│ │ WORKER POOL (apps/worker)                                                │ │
│ │ Orchestrator    → claim/lease/retry/cancel/heartbeat                     │ │
│ │ Workflow Engine → DAG: what runs next + checkpoint to Postgres           │ │
│ │ Agent Runtime   → prompt→LLM→tool loop (sandboxed); framework adapter    │ │
│ └────────────────────────────────┬─────────────────────────────────────────┘ │
│                                  │ CDP / API                                 │
│                                  ▼                                           │
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
│ │ BROWSER LAYER (isolated)   Browserbase (MVP) → self-hosted later         │ │
│ │ one session per agent-run · live view · session replay                   │ │
│ └──────────────────────────────────────────────────────────────────────────┘ │
│                                                                              │
│ LOGGING: workers append events → Postgres (index) + object store (blobs)     │
│          + live stream to API (pub/sub)                                      │
└──────────────────────────────────────────────────────────────────────────────┘
```

### Components

- **Frontend** (`apps/web`, Next.js): authoring UI, run list, live run viewer (streaming logs, reasoning timeline, embedded browser live-view, artifacts). Stateless; talks only to the API.
- **API / Backend**: auth, workflow CRUD, run lifecycle, log/artifact serving, live-event fan-out. Manages state and brokers streams; **does not execute agents**.
- **Task Orchestrator** (worker): claims runs via `claim_next_run()` and owns lease/heartbeat/retry/timeout/cancel. The *"is this run alive and who owns it"* layer.
- **Workflow Engine** (worker): interprets the workflow DAG, decides what runs next, checkpoints to Postgres, and passes outputs between agents. The *"what happens next"* durable state machine.
- **Agent Runtime** (sandboxed, worker): executes one agent step (prompt → LLM → tool → repeat) and captures the reasoning trace. The framework adapter (native / LangGraph / CrewAI) lives here.
- **Browser Layer** (external): one isolated session per browser-using agent-run, with live view and replay. Browserbase in MVP, behind a `BrowserProvider` interface.
- **Logging Layer**: append-only structured events; metadata/index in Postgres, large blobs in object storage, live-streamed to clients.
- **Storage Layer**: Postgres (source of truth + queue + log index), object store (artifacts/screenshots/recordings), KMS-backed secrets vault.
- **Notification Layer**: reacts to run state transitions (completed/failed/needs-input) → email / webhook / slack / push. Decoupled and replayable.

### Execution flow

```
User defines workflow ─► API persists (versioned) ─► User clicks Run
  └► API creates workflow_runs row (status=queued) ─► returns run_id immediately
       (user can close laptop NOW)   ◄── the core promise
  └► Worker calls claim_next_run() (FOR UPDATE SKIP LOCKED) ─► lease + heartbeat
       (worker dies → lease expires → another worker RESUMES FROM CHECKPOINT)
  └► Workflow Engine walks the DAG, checkpointing each node to Postgres
  └► Agent Runtime runs each step (LLM + tools), logging every reasoning/tool/LLM event
  └► Browser tool → isolated session via CDP; actions + screenshots logged; live-view + replay
  └► All nodes terminal ─► status=completed/failed ─► artifacts persisted ─► NOTIFICATION fires
  └► User returns later ─► full timeline, reasoning trace, browser replay, artifacts, cost
```

Durability invariants: progress lives in `workflow_runs` checkpoints (never only in RAM); steps are idempotent/resumable (attempt counters, results written before advancing the frontier, idempotency keys for external side effects); any worker can be killed at any time without losing the run.

### Key decisions

| Area | Decision | Why |
|---|---|---|
| **Orchestration** | Thin **custom orchestrator** + framework **adapters** (native first, then LangGraph, then CrewAI as guest executors) | Durability is the moat and can't be outsourced; frameworks plug in at the step boundary, keeping us framework-neutral. |
| **Durable queue** | **Postgres `FOR UPDATE SKIP LOCKED`** (`claim_next_run()`), not Kafka/Temporal | Simple, debuggable, right-sized for 100–1k users; migrate when concurrency demands it. |
| **Browser** | **Browserbase** behind a `BrowserProvider` interface | Live-view + replay are core and hard to build; a solo dev shouldn't run a Chromium fleet in MVP. Swap to self-hosted when it becomes the #1 cost driver (~1k users). |
| **Cloud model** | **Managed cloud** for MVP; architect the `ComputeProvider` seam for future **BYOC** | Primary persona wants "click Run," not cross-account IAM. BYOC is a Phase-4 enterprise feature, enabled by the control/data-plane split. |
| **LLM cost** | **Bring-your-own-LLM-key** by default, even in Managed | Takes the largest variable cost, and the runaway-loop risk that comes with it, off our books. |

### Data model

All entities root at `user_id`; **Row-Level Security** on every user-owned table.

```
users 1─N workflows 1─N workflow_runs 1─N agent_runs 1─N browser_sessions
                        │
                        ├─N logs          (also ref agent_runs / browser_sessions)
                        ├─N artifacts
                        └─N notifications

agents  (reusable definitions) ──< referenced by workflows.definition & agent_runs >
secrets (KMS-encrypted refs)   ──< owned by users >
```

- `workflows`: versioned definitions (DAG of agent steps + control flow + triggers); runs snapshot the version they used.
- `workflow_runs`: one execution **and** the durable queue entry (lease/attempt/idempotency/`engine_state` checkpoint fields).
- `agent_runs`: one agent step; `parent_agent_run_id` enables hierarchical/manager sub-agents; multiple rows per workflow_run = parallel agents.
- `logs`: append-only, ordered by `(workflow_run_id, seq)`; small payloads inline, large payloads in object storage; time-partitioned with retention by plan.
- `browser_sessions`, `artifacts`, `notifications`: hang off `workflow_runs`.

### Security

- **Secrets**: KMS envelope encryption; decrypted just-in-time into the sandbox, scoped to the step, never logged.
- **User isolation**: RLS enforced in the database (defense in depth beyond the app layer).
- **Browser isolation**: one fresh session per agent-run, network-segmented from the control plane; page content and downloads treated as hostile.
- **Container isolation**: each agent step runs in an isolated sandbox (hardened containers → gVisor/Firecracker at scale); default-deny egress with an LLM/browser/tool allowlist; per-run CPU/memory/wall-clock/**token & browser-minute budgets** to contain runaway loops and cost bombs.

### Cost drivers

Ranked: **browser sessions** → **worker compute** → **LLM tokens** (≈0 to us with BYO-key) → storage/egress → DB. Levers baked in early: per-run budgets, BYO-LLM-key default, idle-suspension for mostly-waiting runs, log/artifact cold-tiering, and the browser-provider swap.

### Roadmap

| Phase | Theme | Focus |
|---|---|---|
| **1 (4 wks)** | *Close your laptop* | Durable single-agent execution: schema migrations (`0001_init`, `0002_rls`, `0003_queue`), claim/lease/heartbeat worker that **resumes after a kill**, streaming logs, email notify. |
| **2 (8 wks)** | *Watch it work* | Browser via Browserbase, live-view + replay, reasoning timeline, artifacts, cost accounting, scheduled/webhook triggers, sandbox hardening + budgets. |
| **3 (3 mo)** | *A team of agents* | Multi-agent DAG (parallel/conditional/loop/sub-agents), LangGraph then CrewAI adapters, Slack/push, human-in-the-loop `needs_input`, idle-suspension. |
| **4 (6 mo)** | *Open it up* | Bring-Your-Own-Cloud, self-hosted browser pool, stronger isolation, teams/RBAC, templates/marketplace, possible Temporal migration. |

---

## Glossary

**Lease**: a time-limited claim a worker takes on a run, recorded as `lease_expires_at` on the `workflow_runs` row. It answers *"is this run still owned?"* When a worker claims a run, it sets `claimed_by` and an expiry (e.g. `now + 30s`). The worker is responsible for the run only until that expiry: it rents the run, it doesn't own it forever. If the lease lapses, another worker may reclaim the run and resume it from the last checkpoint. This is what makes a crashed worker recoverable instead of leaving a run stuck in `running` forever.
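The lease rule above can be modelled in a few lines. This is a minimal, single-process Python sketch under stated assumptions: the `Run` dataclass, the `renew_lease` helper, and the field set (`claimed_by`, `lease_expires_at`, `attempt`, `checkpoint`) are hypothetical stand-ins for `workflow_runs` columns, and this `claim_next_run` only mimics the selection rule in memory. In Postgres the same predicate would be evaluated inside one transaction with `FOR UPDATE SKIP LOCKED`, which is what keeps concurrent workers from claiming the same row.

```python
from dataclasses import dataclass
from typing import List, Optional

LEASE_SECONDS = 30.0  # mirrors the 30s example in the Lease entry


@dataclass
class Run:
    """Hypothetical in-memory stand-in for a workflow_runs row."""
    run_id: str
    status: str = "queued"  # queued | running | completed
    claimed_by: Optional[str] = None
    lease_expires_at: Optional[float] = None
    attempt: int = 0
    checkpoint: int = 0  # e.g. index of the next DAG node to run


def claim_next_run(runs: List[Run], worker: str, now: float) -> Optional[Run]:
    """Lease the first queued run, or a running run whose lease has lapsed.

    Postgres would evaluate the same predicate inside a transaction with
    FOR UPDATE SKIP LOCKED; this single-threaded sketch models only the rule.
    """
    for run in runs:
        lapsed = (
            run.status == "running"
            and run.lease_expires_at is not None
            and run.lease_expires_at <= now
        )
        if run.status == "queued" or lapsed:
            run.status = "running"
            run.claimed_by = worker
            run.lease_expires_at = now + LEASE_SECONDS  # absolute expiry
            run.attempt += 1  # a reclaim counts as a new attempt
            return run  # checkpoint is left untouched, so work resumes from it
    return None


def renew_lease(run: Run, worker: str, now: float) -> bool:
    """Heartbeat: only the current owner may push the expiry forward."""
    if run.status != "running" or run.claimed_by != worker:
        return False
    run.lease_expires_at = now + LEASE_SECONDS  # set, never add
    return True


runs = [Run("run-1")]
first = claim_next_run(runs, "worker-a", now=0.0)   # leased until t=30
first.checkpoint = 2                                # progress persisted on the row
renew_lease(first, "worker-a", now=10.0)            # heartbeat: expiry becomes t=40
# worker-a crashes here and stops heartbeating
print(claim_next_run(runs, "worker-b", now=35.0))   # None: the lease is still valid
second = claim_next_run(runs, "worker-b", now=41.0)
print(second.claimed_by, second.attempt, second.checkpoint)  # worker-b 2 2
```

Because `renew_lease` assigns an absolute expiry rather than adding time, calling it twice at the same instant leaves the same deadline, and a worker that has lost its run to another worker can no longer extend the lease.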
**Heartbeat**: the worker periodically renewing its lease while it is alive and working (e.g. every 10s, pushing `lease_expires_at` forward). It answers *"is the owner still alive?"* The heartbeat is what distinguishes a *crashed* worker from one that is merely taking a long time on a legitimate hours- or days-long step.

- The heartbeat interval must be **comfortably shorter** than the lease (rule of thumb: ~1/3). The gap `lease − heartbeat` is the safety margin: because the lease covers several heartbeats, the worker can miss one or two renewals to a GC pause / network blip / clock skew **without** its run being falsely reclaimed. A renew interval equal to the lease leaves zero slack, and any jitter causes a false steal.
- A missed heartbeat is **skipped, not queued**: it does not pile up and fire twice later.
- Renewal is **idempotent**: it *sets* `lease_expires_at = now + duration` (absolute); it does not *add* time. Running it twice yields the same expiry as running it once, so concurrent or back-to-back renewals can never compound the lease.

Together: short lease (fast crash detection) + heartbeat (lets live work run arbitrarily long) = automatic recovery from worker death with no double-execution. Backed by `claim_next_run()` + `FOR UPDATE SKIP LOCKED`, which lets many workers poll the queue without colliding.

**Framework neutrality**: the user's agent framework (native loop, LangGraph, CrewAI) is a pluggable step-executor behind a common adapter, not baked into the core engine. This lets us ride every framework wave without a rewrite, is a real selling point for power users who already have framework code, and forces a clean separation between the durable engine we own and the swappable agent logic.

**Native executor**: a minimal in-house agent loop (`prompt → LLM → tool → repeat`) with no hidden state. Built **first** because Phase 1 proves *durability*, not intelligence: with a trivial executor, any resume/checkpoint bug is unambiguously ours, not a framework's. Frameworks (with their own in-process state models) are integrated later, once durable resume is proven.

**Egress**: data leaving our cloud for the internet, which the cloud provider bills for (inbound is typically free). Relevant to live monitoring: streaming logs and especially the live browser view continuously push frames *out* to watching users, so egress scales with how many users actively watch runs and for how long. This favors lighter live-view encodings.

**KMS (Key Management Service)**: a managed cloud service (AWS/GCP KMS) that stores and controls encryption keys so we never handle the raw master key. User secrets (LLM keys, integration creds) are protected with **envelope encryption**: a KMS master key encrypts a per-secret data key, which encrypts the actual secret. A stolen database yields only ciphertext; every decryption is audited, and plaintext exists only briefly inside the sandbox for the step that needs it.

**RLS (Row-Level Security)**: a Postgres feature (used heavily via Supabase) that enforces *"you can only see/touch your own rows"* **inside the database**, not just in app code. Policies key off the authenticated user id, so even a buggy query (a missing `WHERE user_id = ...`) cannot cross tenant boundaries. Applied to every user-owned table as defense in depth for multi-tenancy; see `0002_rls.sql`.

**Browserbase**: a paid managed headless-browser service (hosted Chromium + CDP + live view + session replay), billed roughly per browser-session-time. Used in MVP behind a `BrowserProvider` interface because live view and replay are hard to build and a solo dev shouldn't run a Chromium fleet. Likely the #1 cost driver at around 1k users, which is the trigger to evaluate self-hosting; per-run browser-minute budgets guard against runaway bills.
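The Framework neutrality and Native executor entries can be illustrated with a small Python sketch. Everything in it is hypothetical: `StepExecutor` stands in for the common adapter interface, `NativeExecutor` for the in-house `prompt → LLM → tool → repeat` loop, and the LLM is a scripted function returning an assumed `("tool", name, arg)` or `("final", answer, "")` tuple rather than any real provider API. A LangGraph or CrewAI adapter could implement the same `run_step` method.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass
class StepResult:
    output: str
    trace: list[str] = field(default_factory=list)  # events the logging layer would persist


class StepExecutor(Protocol):
    """Hypothetical adapter seam: the durable engine only calls run_step()."""

    def run_step(self, prompt: str) -> StepResult: ...


# Assumed LLM shape: returns ("tool", tool_name, argument) or ("final", answer, "").
LLMFn = Callable[[list[str]], tuple[str, str, str]]


class NativeExecutor:
    """Minimal prompt -> LLM -> tool -> repeat loop with no hidden state."""

    def __init__(self, llm: LLMFn, tools: dict[str, Callable[[str], str]], max_turns: int = 8):
        self.llm = llm
        self.tools = tools
        self.max_turns = max_turns  # crude stand-in for a per-run budget

    def run_step(self, prompt: str) -> StepResult:
        messages = [prompt]
        trace: list[str] = []
        for _ in range(self.max_turns):
            kind, name_or_answer, arg = self.llm(messages)
            if kind == "final":
                trace.append(f"final: {name_or_answer}")
                return StepResult(output=name_or_answer, trace=trace)
            observation = self.tools[name_or_answer](arg)  # run the requested tool
            trace.append(f"tool {name_or_answer}({arg}) -> {observation}")
            messages.append(observation)  # feed the result back to the LLM
        raise RuntimeError("turn budget exhausted")


def scripted_llm(messages: list[str]) -> tuple[str, str, str]:
    # Stand-in for a real model: call a tool once, then answer.
    if len(messages) == 1:
        return ("tool", "add", "2,3")
    return ("final", f"sum is {messages[-1]}", "")


tools = {"add": lambda arg: str(sum(int(x) for x in arg.split(",")))}
executor: StepExecutor = NativeExecutor(scripted_llm, tools)
result = executor.run_step("add 2 and 3")
print(result.output)  # sum is 5
print(result.trace)   # ['tool add(2,3) -> 5', 'final: sum is 5']
```

The loop's only state is the local `messages` list and the returned trace, so there is nothing hidden for a resume to miss, which is the property the Native executor entry is after; `max_turns` is a crude stand-in for the per-run token budgets.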
New research introduces AgingBench, a longitudinal benchmark designed to evaluate how AI agents' reliability degrades over time, i.e. how agents "age".

Agent aging is the time-dependent reliability degradation in a deployed agent caused by changing memory state, accumulated interaction history, and lifecycle events. The paper uses four mechanisms (compression aging, interference aging, revision aging, and maintenance aging) to classify how deployed systems degrade over their operational lifespans. The authors use temporal dependency graphs and counterfactual probes to pinpoint the stage of the memory pipeline where a weakness or failure originates.

After diagnosis comes repair. The common fix for a failing agent, giving it more memory, is often the wrong approach: the root cause can be something else entirely, so the authors provide a targeted repair map that matches a patch to each diagnosis. For example, if the failure is at the retrieval stage, the agent has the facts stored but is pulling the wrong ones. The necessary repair is to improve the retrieval algorithms so they can better distinguish between confusable or similar entries that have accumulated over time.

The authors advocate for Agent Lifespan Engineering, a discipline focused on measuring, diagnosing, and repairing AI systems throughout their operational lives.

Paper link: arxiv.org/pdf/2605.26302v1
Presenting Curion v1: an intelligent form-filling agent that fills forms using stored data.

The pipeline works like this → Curion first extracts candidate fields from the DOM and calculates an extraction confidence score. If confidence is low, it evaluates HTML adequacy to determine whether the structure is suitable for LLM reasoning. If it is, Curion falls back to LLM-based extraction. Otherwise, it performs DOM repair by enhancing labels with nearby and parent context, then recalculates extraction confidence. If confidence is still low after repair, it escalates to a vision fallback.

The vision fallback is coming in v2. References being used for it:
→ arxiv.org/pdf/2602.13559 : OpAgent: Operator Agent for Web Navigation
→ arxiv.org/abs/2507.16704 : Screen2AX: Vision-Based Approach for Automatic macOS Accessibility Generation

After extracting the fields, Curion converts them into semantic embeddings using the gemini-embedding-v2 model, accessed through the Google Gemini API. These embeddings are semantically matched against the embeddings of the stored profile atoms, which the user provides and which already sit in the vector DB. After matching, a mapping confidence score is calculated. If it is below the threshold, Curion falls back to an LLM: the extracted fields, along with the stored profile atoms, are sent to an LLM for mapping, and the mapping confidence score is calculated again. If it is still below the threshold, Curion falls back to vision-based mapping, which will be implemented in v2 (references above).

After mapping, Curion fills the form with Playwright.

Curion can be used for automation at large enterprises that handle a lot of repetitive internal form filling. Enterprises will use it through an API, coming in v2. The organisation's stored data for form filling will be uploaded to our cloud service, which follows the same flow as the extension (the data is stored in the vector database), and the form filling will then be carried out by the API instead of the extension.

For individual users, the extension is ready to use and can be installed at curion.sbs. The project's website, live at curion.sbs, highlights what the product does; in the upcoming version it will also be used to store and create profile data for form mapping, for enterprises as well as individual users. The code is open source at github.com/satyanvm/Curion
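The two confidence-gated cascades described above can be sketched as plain control flow. This is a hedged Python sketch, not Curion's code: the thresholds, the helper names (`extract_fields`, `map_fields`, `toy_extract`, and the injected callables), per-field rather than aggregate mapping confidence, and the toy 3-dimensional vectors standing in for gemini-embedding-v2 output are all assumptions. The vision fallbacks appear only as escalation points because they are planned for v2, and the Playwright fill step is omitted.

```python
import math

EXTRACTION_THRESHOLD = 0.6  # hypothetical; the post gives no numbers
MAPPING_THRESHOLD = 0.8     # hypothetical


def extract_fields(dom, extract, confidence, html_adequate, llm_extract, repair_dom):
    """DOM extraction -> (LLM extraction | DOM repair) -> vision escalation."""
    fields = extract(dom)
    if confidence(fields) >= EXTRACTION_THRESHOLD:
        return fields, "dom"
    if html_adequate(dom):              # structure suitable for LLM reasoning
        return llm_extract(dom), "llm"
    repaired = repair_dom(dom)          # enrich labels from nearby/parent context
    fields = extract(repaired)
    if confidence(fields) >= EXTRACTION_THRESHOLD:
        return fields, "dom-repaired"
    return None, "vision-fallback"      # planned for v2


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def map_fields(field_vecs, atom_vecs, llm_map):
    """Match each field to its most similar profile atom; low scores go to the LLM."""
    mapping, low_confidence = {}, []
    for name, fvec in field_vecs.items():
        best_atom, score = max(
            ((atom, cosine(fvec, avec)) for atom, avec in atom_vecs.items()),
            key=lambda pair: pair[1],
        )
        if score >= MAPPING_THRESHOLD:
            mapping[name] = best_atom
        else:
            low_confidence.append(name)
    unresolved = []
    if low_confidence:
        # llm_map is assumed to return {field: (atom, confidence)}
        for name, (atom, conf) in llm_map(low_confidence, list(atom_vecs)).items():
            if conf >= MAPPING_THRESHOLD:
                mapping[name] = atom
            else:
                unresolved.append(name)  # would escalate to vision mapping (v2)
    return mapping, unresolved


def toy_extract(dom):
    # Labels only become readable once repair_dom has enriched them.
    return ["E-mail address", "Your name", "Contact"] if "enriched" in dom else ["?", "?", "Contact"]


fields, source = extract_fields(
    dom="<form></form>",
    extract=toy_extract,
    confidence=lambda fs: sum(f != "?" for f in fs) / len(fs),
    html_adequate=lambda d: False,
    llm_extract=lambda d: [],
    repair_dom=lambda d: d + "<!-- labels enriched -->",
)
atoms = {"email": [1.0, 0.0, 0.0], "full_name": [0.0, 1.0, 0.0], "phone": [0.0, 0.0, 1.0]}
field_vecs = {"E-mail address": [0.9, 0.1, 0.0], "Your name": [0.1, 0.95, 0.05], "Contact": [0.5, 0.0, 0.5]}
mapping, unresolved = map_fields(
    field_vecs, atoms, llm_map=lambda names, atom_names: {n: ("phone", 0.9) for n in names}
)
print(source)   # dom-repaired
print(mapping)  # {'E-mail address': 'email', 'Your name': 'full_name', 'Contact': 'phone'}
```

Passing each stage in as a callable keeps the cascade testable without a browser, an embedding API, or an LLM; in a real extension those slots would be filled by the DOM extractor, the embedding call, and the LLM mapper.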
Satya Narayan retweeted
Jun 6
The product is the mission.
An intelligent layer that sits in between, calculates the risk of each permission request, and blocks threats according to a threshold you set would work amazingly.
i spent the entire day keeping my laptop lid open so my AI agents could run brought portable chargers, tethered to my phone for internet, the whole deal finally, when i got home, i opened my laptop to see the work they'd done nothing had happened because Claude was waiting for permission to open the project folder this is the future of work
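The permission-risk layer proposed above could start as something as simple as a scored allowlist. This is a hypothetical Python sketch: the action categories, risk weights, sensitive-path list, and the `risk_score`/`decide` helpers are invented for illustration and are not taken from any agent tool's permission API. Requests scoring below the user's threshold are auto-approved; everything else is blocked for review.

```python
from dataclasses import dataclass

# Hypothetical base risk per action type; unknown actions count as maximum risk.
ACTION_RISK = {"read": 0.1, "write": 0.4, "network": 0.6, "execute": 0.8}
SENSITIVE_PREFIXES = ("~/.ssh", "~/.aws", "/etc")


@dataclass
class PermissionRequest:
    action: str  # "read" | "write" | "network" | "execute"
    target: str  # path, host, or command the agent wants to touch


def risk_score(req: PermissionRequest) -> float:
    score = ACTION_RISK.get(req.action, 1.0)
    if req.target.startswith(SENSITIVE_PREFIXES):  # str.startswith accepts a tuple
        score = min(1.0, score + 0.5)
    return score


def decide(req: PermissionRequest, threshold: float) -> str:
    """Auto-approve anything scoring below the user's threshold; block the rest."""
    return "allow" if risk_score(req) < threshold else "block"


threshold = 0.5  # set by the user
print(decide(PermissionRequest("read", "~/projects/app"), threshold))    # allow
print(decide(PermissionRequest("read", "~/.ssh/id_rsa"), threshold))     # block
print(decide(PermissionRequest("execute", "rm -rf build/"), threshold))  # block
```

At this threshold, opening a project folder (the request that stalled the agent in the quoted post) would be approved automatically, while credential reads and shell commands would still stop for review. A real layer would need a far richer risk model than fixed weights.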
Satya Narayan retweeted
Jun 4
Full podcast episode with @rauchg, @maxhodak_, and @bscholl. 40 minutes of unreleased material.

The AI Industrial Revolution

Part 1: Waste Tokens, Save Time
0:00 Three Frontier Founders
1:27 AI Software Factories
4:15 Waste Tokens, Save Time
5:47 Models Instructing Humans
9:29 Is Pure Software Dead?
12:03 You Don't Get Stuck Anymore

Part 2: Vibe Coding Hardware
14:39 Vibe Coding a Turbine Blade
18:07 Open Source Compounds China's Advantage
20:15 You Always Want the Smartest Model
22:44 Software Still Needs Hands
24:43 Humans Are Becoming Verifiers

Part 3: The Regulatory Frontier
27:53 The Regulatory Red Queen Race
32:32 Why There's No Innovation in Healthcare
36:49 We Need a True 50-State Experiment
40:31 China's FDA Is Beating Ours
43:37 Healthcare Is a Communist Society Inside Capitalism
45:57 Sid's Story: N-of-1 Medicine

Part 4: The Autonomous Company
47:49 Autonomous Infrastructure
51:25 Your Job Is to Train the Agent
54:54 The Next Lord of the Rings
59:08 What's Your Definition of Art?
1:05:00 Can AI Have New Ideas?
1:07:03 A Large Number of Small Teams