Zaxy's graph stays clean through a layered set of mechanisms, and the most important one is architectural rather than janitorial:
1. The graph is a disposable projection, not the store. Eventloom (the append-only, hash-chained log) is the source of truth; the embedded Kuzu graph is a deterministic projection of it. That means the graph can always be discarded and rebuilt by replay (zaxy reproject). Cruft, drift, or schema evolution never require surgery on the graph. You rebuild it from the log with current extraction logic. Cleanliness is reproducible rather than maintained by hand.
2. Temporal validity instead of mutation. Entities carry valid_from/valid_to windows. When a fact is superseded or invalidated (memory_invalidate), the old version's validity window is closed. Nothing is deleted, but default queries read only active (current-state) entities, so superseded versions don't clutter retrieval. History stays available for temporal/as-of queries without polluting the present.
3. Extraction policy gates what becomes a node at all. Deterministic extractors decide which event kinds project into entities. Bookkeeping events (hook lifecycle markers, the new memory.reinforcement salience events, FoK calibration markers) have deliberately empty extractors registered, so servability traffic never creates graph entities or perturbs BM25/verbatim statistics.
4. Review-gated consolidation compacts episodic sprawl. The consolidation pipeline turns runs of fine-grained events into cited, review-pending candidates (episodes, claims, procedures). Only accepted candidates become compact abstractions, and the originals remain in the log rather than in the working graph's hot path. The geometry-aware consolidation work adds safety auditing around this (identity invariants, compaction risk checks).
5. Projection-level forgetting (new this release). The salience ledger attenuates memories that decay below a floor: they leave default checkout ranking while staying reachable by explicit query and fully restorable, since salience is replayed from the log. The opt-in encoding gate classifies appends as novel/reinforcing/redundant, so duplicates project as reinforcement signals on existing entities rather than accumulating as new ranked content.
6. Bounded derived state. The performance-sensitive structures over the graph — vector index caches (LRU 256 MiB byte budget), adjacency snapshots, traversal indexes — are signature-invalidated and budget-evicted, and embeddings are version-tagged with a batch re-embed migration path, so stale derived state can't silently accumulate or mix incompatible versions.
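To make points 1 through 3 concrete, here is a minimal sketch of the pattern in plain Python. It is not Zaxy's code and does not use the Eventloom or Kuzu APIs. Every name in it (Event, Entity, EXTRACTORS, reproject, and the event kinds "fact.asserted", "memory.invalidated", and "hook.lifecycle") is hypothetical and chosen only for illustration. It shows a hash-chained log, a registry where bookkeeping kinds map to an empty extractor, and validity windows that are closed on supersession instead of rows being deleted.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional

GENESIS = "0" * 64  # hypothetical prev_hash for the first event


@dataclass
class Event:
    seq: int
    kind: str
    payload: dict
    prev_hash: str
    hash: str


def event_hash(seq, kind, payload, prev_hash):
    body = json.dumps([seq, kind, payload, prev_hash], sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append(log, kind, payload):
    """Append-only write: each event commits to its predecessor's hash."""
    prev = log[-1].hash if log else GENESIS
    seq = len(log)
    ev = Event(seq, kind, payload, prev, event_hash(seq, kind, payload, prev))
    log.append(ev)
    return ev


@dataclass
class Entity:
    key: str
    value: str
    valid_from: int
    valid_to: Optional[int] = None  # None means still active


def extract_fact(ev):
    return [(ev.payload["key"], ev.payload["value"])]


def extract_nothing(ev):
    return []  # deliberately empty: this kind never becomes a node


# Hypothetical registry: event kind -> extractor.
EXTRACTORS = {
    "fact.asserted": extract_fact,
    "memory.reinforcement": extract_nothing,
    "hook.lifecycle": extract_nothing,
}


def reproject(log):
    """Rebuild the whole projection from the log, verifying the chain."""
    entities, active, prev = [], {}, GENESIS
    for ev in log:
        expected = event_hash(ev.seq, ev.kind, ev.payload, ev.prev_hash)
        if ev.prev_hash != prev or ev.hash != expected:
            raise ValueError(f"hash chain broken at seq {ev.seq}")
        prev = ev.hash
        if ev.kind == "memory.invalidated":
            old = active.pop(ev.payload["key"], None)
            if old is not None:
                old.valid_to = ev.seq  # close the window, keep the row
            continue
        for key, value in EXTRACTORS.get(ev.kind, extract_nothing)(ev):
            old = active.pop(key, None)
            if old is not None:
                old.valid_to = ev.seq  # superseded, not deleted
            ent = Entity(key, value, ev.seq)
            entities.append(ent)
            active[key] = ent
    return entities


def current(entities):
    """Default read path: only entities whose window is still open."""
    return [e for e in entities if e.valid_to is None]


def as_of(entities, seq):
    """Temporal read path: entities that were valid at a given sequence."""
    return [e for e in entities
            if e.valid_from <= seq and (e.valid_to is None or seq < e.valid_to)]
```

Because reproject is a pure function of the log, calling it twice on the same log yields equal results, which is the property that makes the graph disposable. In this sketch unknown event kinds fall back to the empty extractor; a real system might prefer to reject them instead.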
The raw event log itself only grows — that's by design (it's the audit record). "Clean" in Zaxy means the queryable surfaces stay current, compact, and uncontaminated, while the full history remains intact underneath.
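The bounded derived state in point 6 can be sketched in the same spirit. The class below is a hypothetical illustration, not Zaxy's cache: it assumes cached values are bytes, that the budget counts only payload bytes, and that a caller-supplied signature string identifies the version of the data the entry was derived from.

```python
from collections import OrderedDict


class ByteBudgetLRU:
    """LRU cache bounded by total payload bytes, with signature checks."""

    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.used = 0
        self._items = OrderedDict()  # key -> (signature, blob)

    def get(self, key, signature):
        entry = self._items.get(key)
        if entry is None:
            return None
        cached_sig, blob = entry
        if cached_sig != signature:
            self._drop(key)  # source changed: stale entry is invalidated
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return blob

    def put(self, key, signature, blob):
        if key in self._items:
            self._drop(key)
        if len(blob) > self.budget:
            return  # larger than the whole budget: do not cache
        self._items[key] = (signature, blob)
        self.used += len(blob)
        while self.used > self.budget:
            self._drop(next(iter(self._items)))  # evict least recently used

    def _drop(self, key):
        _, blob = self._items.pop(key)
        self.used -= len(blob)
```

With a budget of 256 * 1024 * 1024 this would match the 256 MiB figure quoted above. The signature check is what keeps stale entries from being served after the underlying graph or embedding version changes, and the byte budget is what keeps the cache from growing without bound.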