Wax: One File That Survives Hard Kills and Context Resets

Agents lose state. They accumulate decisions and facts across hours of work, then the context window rolls over or the process dies and that history is gone. The fallback is stuffing the full transcript back into the prompt every turn. Tokens burn. The one fact from three sessions ago that the next decision depends on is still missing.

I tried the usual stack. JSONL logs plus a vector store. A memory service that runs LLM extraction every turn and writes into Postgres, pgvector, and a graph. Hard kills corrupted the JSONL log. The sidecar SQLite drifted from the vectors. The cloud path added latency and a hard network dependency. Dedup stayed a best-effort LLM call that sometimes emitted three versions of the same preference.

I wanted one file. AirDrop it, back it up, open it after a hard kill, and recover the state that was in flight: pending WAL records, vector index manifests, FTS5 segments, bitemporal facts with valid-time and system-time ranges. No network hop. No sidecar database that can diverge. No hand-wave about eventual consistency.

Wax is that file format and the engine that owns it.

The file format

A Wax file stores its own write-ahead log. It is not a bag of documents with a separate index file next to it. The layout is self-describing.

Offset          Region
0 KiB           Header Page A (4 KiB)
4 KiB           Header Page B (4 KiB)
8 KiB           WAL ring buffer (default 256 MiB)
...             Frame payloads (variable, compressed)
...             TOC (BinaryEncoder, up to 64 MiB)
...             Footer (64 bytes)

Two identical header pages sit at the front so a crash mid-header-write leaves the previous valid generation intact. On open the engine reads both, validates checksums (SHA-256 of the header with the checksum field zeroed), and keeps the valid page with the highest headerPageGeneration. The footer holds the magic WAX1FOOT, the TOC length, a SHA-256 of the TOC body, a generation, and the WAL committed sequence number. The TOC hash in the footer must match the TOC’s own self-checksum.
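A minimal sketch of header selection on open, assuming a simplified page model. FNV-1a stands in for SHA-256 so the example needs no CryptoKit, and the checksum is kept outside the hashed bytes, which has the same effect as hashing with the field zeroed. HeaderPage, sealed, and selectBestHeaderPage are hypothetical names, not Wax's API.

```swift
// Hypothetical model of one header page; the real field layout is not shown here.
struct HeaderPage {
    var generation: UInt64
    var body: [UInt8]           // serialized header fields (stand-in)
    var storedChecksum: UInt64  // excluded from the hash, like a zeroed field
}

// FNV-1a 64-bit, used only as a stand-in for SHA-256.
func fnv1a64(_ bytes: [UInt8]) -> UInt64 {
    var hash: UInt64 = 0xcbf2_9ce4_8422_2325
    for byte in bytes {
        hash ^= UInt64(byte)
        hash = hash &* 0x0000_0100_0000_01b3
    }
    return hash
}

// Hash the generation and the body; the checksum field itself is not included.
func computeChecksum(_ page: HeaderPage) -> UInt64 {
    let generationBytes = (0..<8).map { UInt8(truncatingIfNeeded: page.generation >> (8 * UInt64($0))) }
    return fnv1a64(generationBytes + page.body)
}

// Build a page whose stored checksum matches its contents.
func sealed(generation: UInt64, body: [UInt8]) -> HeaderPage {
    var page = HeaderPage(generation: generation, body: body, storedChecksum: 0)
    page.storedChecksum = computeChecksum(page)
    return page
}

// Keep the valid page with the highest generation; nil means neither page survived.
func selectBestHeaderPage(_ a: HeaderPage, _ b: HeaderPage) -> HeaderPage? {
    [a, b]
        .filter { computeChecksum($0) == $0.storedChecksum }
        .max { $0.generation < $1.generation }
}
```

A torn write to one page changes its bytes, the checksum no longer matches, and the other page wins even if its generation is lower.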

Every frame carries a FrameMeta record: id, timestamps, uri, title, payload offset and length, canonical encoding, checksums, kind, track, tags, labels, metadata dictionary, search text, content dates, role, parent id, supersede links, chunk index, and the active flag.

The engine writes payloads into the WAL first. The orchestrator stages FTS5 segments, vector manifests, and temporal manifests, then serializes them into the TOC segment catalog only on commit. Commit advances the header generation and the WAL checkpoint.
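The ordering described there (WAL first, TOC only on commit) can be modeled with a toy struct. This is a sketch under simplifying assumptions: strings stand in for WAL records and index segments, and StagedCommitModel and its members are hypothetical, not the orchestrator's real types.

```swift
// Toy model of "WAL first, TOC on commit"; all names are hypothetical.
struct StagedCommitModel {
    var wal: [String] = []           // append-only WAL records
    var staged: [String] = []        // index segments not yet in the TOC
    var tocCatalog: [String] = []    // segment catalog visible after commit
    var headerGeneration: UInt64 = 0
    var walCheckpoint = 0            // number of WAL records covered by the last commit

    mutating func put(_ payload: String) {
        wal.append("putFrame:\(payload)")    // durable intent lands first
        staged.append("segment:\(payload)")  // FTS5/vector/temporal work staged in memory
    }

    mutating func commit() {
        tocCatalog.append(contentsOf: staged)  // serialize staged segments into the TOC
        staged.removeAll()
        headerGeneration += 1                  // new header generation
        walCheckpoint = wal.count              // advance the WAL checkpoint
    }

    // Records a crash-recovery scan would still have to replay.
    var pendingRecords: ArraySlice<String> { wal[walCheckpoint...] }
}
```

Until commit runs, the staged segments exist only in memory and the WAL records sit after the checkpoint, which is exactly the region recovery replays.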

Hybrid search, RAG assembly, structured memory, and VideoRAG all run against the same frame records, and the WAL plus the dual-header protocol keep those records intact across abrupt termination.

WAL ring and recovery

The WAL is a fixed-size circular buffer starting at offset 8 KiB. Each record has a 48-byte header: 8-byte monotonic sequence, 4-byte payload length, 4-byte flags (bit 0 = padding), and 32-byte SHA-256 of the payload.
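The 8 + 4 + 4 + 32 byte layout can be shown with a small encoder and decoder. Little-endian byte order is an assumption of this sketch, and WALHeaderCodec with its helpers are hypothetical names rather than Wax's types.

```swift
// Hypothetical codec for the 48-byte WAL record header; little-endian is assumed.
struct WALHeaderCodec {
    static let size = 48
    var sequence: UInt64       // bytes 0..<8
    var payloadLength: UInt32  // bytes 8..<12
    var flags: UInt32          // bytes 12..<16, bit 0 = padding
    var checksum: [UInt8]      // bytes 16..<48, SHA-256 of the payload

    var isPadding: Bool { flags & 1 != 0 }

    func encoded() -> [UInt8] {
        precondition(checksum.count == 32, "SHA-256 digests are 32 bytes")
        return littleEndian(sequence, count: 8)
            + littleEndian(UInt64(payloadLength), count: 4)
            + littleEndian(UInt64(flags), count: 4)
            + checksum
    }

    static func decode(_ bytes: [UInt8]) -> WALHeaderCodec? {
        guard bytes.count >= size else { return nil }  // too short to hold a header
        return WALHeaderCodec(
            sequence: readLittleEndian(bytes, at: 0, count: 8),
            payloadLength: UInt32(readLittleEndian(bytes, at: 8, count: 4)),
            flags: UInt32(readLittleEndian(bytes, at: 12, count: 4)),
            checksum: Array(bytes[16..<48]))
    }
}

// Write the low `count` bytes of `value`, least significant first.
func littleEndian(_ value: UInt64, count: Int) -> [UInt8] {
    (0..<count).map { UInt8(truncatingIfNeeded: value >> (8 * UInt64($0))) }
}

// Read `count` bytes starting at `offset` as a little-endian integer.
func readLittleEndian(_ bytes: [UInt8], at offset: Int, count: Int) -> UInt64 {
    var value: UInt64 = 0
    for i in 0..<count { value |= UInt64(bytes[offset + i]) << (8 * UInt64(i)) }
    return value
}
```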

Record types are putFrame, deleteFrame, supersedeFrame, and putEmbedding. When the ring wraps, padding records fill the gap. A sentinel (all-zero header) marks the end of valid data.
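Wrap-around with padding can be sketched as a planning step: if a record does not fit before the end of the ring, a padding record covers the tail and the real record starts at offset 0. This sketch assumes the tail can always hold a padding header and ignores reclaiming space from committed records; planAppend, RingWrite, and isSentinel are hypothetical.

```swift
// Hypothetical append planner for a fixed-size ring; offsets are relative to the ring start.
enum RingWrite: Equatable {
    case record(offset: Int, length: Int)
    case padding(offset: Int, length: Int)
}

enum WAL {
    static let headerSize = 48
}

func planAppend(length: Int, cursor: Int, capacity: Int) -> (writes: [RingWrite], nextCursor: Int) {
    precondition(length <= capacity, "record larger than the ring")
    var writes: [RingWrite] = []
    var start = cursor
    let tail = capacity - cursor
    if length > tail {
        // Not enough room before the end: a padding record fills the tail, then wrap to 0.
        precondition(tail >= WAL.headerSize, "sketch does not handle tails shorter than a header")
        writes.append(.padding(offset: cursor, length: tail))
        start = 0
    }
    writes.append(.record(offset: start, length: length))
    return (writes: writes, nextCursor: (start + length) % capacity)
}

// A reader stops at an all-zero header: the sentinel marking the end of valid data.
func isSentinel(_ header: [UInt8]) -> Bool {
    header.count == WAL.headerSize && header.allSatisfy { $0 == 0 }
}
```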

Fsync policy is configurable: always, on commit (default), or every N bytes. A proactive commit threshold bounds how much data sits at risk in the WAL.
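The three policies and the proactive threshold can be expressed as a small decision helper. FsyncPolicy and DurabilityTracker are hypothetical names, and the choice to fsync on every commit regardless of policy is an assumption of this sketch, not a documented Wax behavior.

```swift
// Hypothetical policy type; Wax's real names and defaults may differ.
enum FsyncPolicy {
    case always
    case onCommit
    case everyBytes(Int)
}

struct DurabilityTracker {
    let policy: FsyncPolicy
    let proactiveCommitThreshold: Int  // uncommitted WAL bytes tolerated before an early commit
    var unsyncedBytes = 0
    var bytesSinceCommit = 0

    init(policy: FsyncPolicy, proactiveCommitThreshold: Int) {
        self.policy = policy
        self.proactiveCommitThreshold = proactiveCommitThreshold
    }

    /// Record a WAL append; returns true when the writer should fsync now.
    mutating func didAppend(bytes: Int) -> Bool {
        unsyncedBytes += bytes
        bytesSinceCommit += bytes
        let sync: Bool
        switch policy {
        case .always: sync = true
        case .onCommit: sync = false
        case .everyBytes(let n): sync = unsyncedBytes >= n
        }
        if sync { unsyncedBytes = 0 }
        return sync
    }

    /// Bounds the data at risk: commit early once enough uncommitted bytes pile up.
    var shouldCommitProactively: Bool { bytesSinceCommit >= proactiveCommitThreshold }

    /// In this sketch every policy fsyncs at commit so the commit itself is durable.
    mutating func didCommit() -> Bool {
        unsyncedBytes = 0
        bytesSinceCommit = 0
        return true
    }
}
```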

Recovery is the part most local vector stores skip:

Select the best header page by generation and checksum.
If a WAL replay snapshot exists in the header (magic WALSNAP1), use its cursors to skip already-committed records.
Scan forward from the checkpoint (or snapshot position) with WALRingReader.
Apply the pending mutations to rebuild in-memory indexes and frame state.
Tolerate corrupted pending records for position tracking; still apply later valid records.
Validate checksums on committed payloads.
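The tolerant replay in those steps can be sketched over an in-memory ring. Assumptions: nil slots stand for the all-zero sentinel, a corrupted record's length field is still intact so the scan can step past it, and FNV-1a stands in for SHA-256. PendingRecord, ReplayResult, and replayPending are hypothetical.

```swift
// Toy pending-record model; `nil` slots stand for the all-zero sentinel.
struct PendingRecord {
    let sequence: UInt64
    let payload: [UInt8]
    let storedChecksum: UInt64
}

// FNV-1a 64-bit, a stand-in for the SHA-256 payload checksum.
func fnv1a64(_ bytes: [UInt8]) -> UInt64 {
    var hash: UInt64 = 0xcbf2_9ce4_8422_2325
    for byte in bytes {
        hash ^= UInt64(byte)
        hash = hash &* 0x0000_0100_0000_01b3
    }
    return hash
}

struct ReplayResult {
    var applied: [UInt64] = []
    var skippedCorrupt: [UInt64] = []
}

func replayPending(_ ring: [PendingRecord?], committedSequence: UInt64) -> ReplayResult {
    var result = ReplayResult()
    for slot in ring {
        guard let record = slot else { break }                // sentinel: end of valid data
        if record.sequence <= committedSequence { continue }  // already reflected in the TOC
        if fnv1a64(record.payload) == record.storedChecksum {
            result.applied.append(record.sequence)            // a real engine applies the mutation here
        } else {
            result.skippedCorrupt.append(record.sequence)     // keep position, drop the effect
        }
    }
    return result
}
```

A corrupted record in the middle costs only its own mutation; records after it still replay.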

I test this path with deliberate fault injection: append uncommitted WAL records and crash before updating the header or footer, stomp individual header pages with zeros, corrupt checksums in pending records, truncate the file at random offsets. In recoverable cases the store either replays the pending work or falls back to the last known good state. Unrecoverable corruption (bad magic, mismatched TOC hash on committed data) fails with a clear error instead of garbage.

The recovery contract is fixed. The WAL ring, dual header pages, and footer scanner produce a deterministic state after a hard kill. Append-heavy writes, occasional correction via supersede, and frequent hybrid queries shaped every layout choice.

Concurrency model

Every major piece is an actor: the low-level Wax coordinator in WaxCore, WaxSession, MemoryOrchestrator, FTS5SearchEngine, the vector engines, the embedding memoizer.

Actors isolate state. The hard part is I/O. Blocking POSIX calls on the actor thread pool starve everything else, so a BlockingIOExecutor owns a dedicated concurrent DispatchQueue. Reads run concurrently through it. Writes go through a barrier. All file operations (FDFile, pread/pwrite/fsync) funnel there.
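The executor pattern can be sketched with Dispatch and a checked continuation: the blocking closure runs on a private concurrent queue while the async caller suspends instead of blocking a cooperative-pool thread. BlockingIOExecutorSketch and its queue label are hypothetical; only DispatchQueue, the .barrier flag, and withCheckedThrowingContinuation are real APIs.

```swift
import Dispatch

// Hypothetical executor: blocking work runs on a private concurrent queue,
// and the async caller suspends on a continuation until it finishes.
final class BlockingIOExecutorSketch: @unchecked Sendable {
    private let queue = DispatchQueue(label: "wax.blocking-io.sketch", attributes: .concurrent)

    /// Reads may run side by side on the concurrent queue.
    func read<T: Sendable>(_ body: @escaping @Sendable () throws -> T) async throws -> T {
        try await withCheckedThrowingContinuation { continuation in
            queue.async {
                continuation.resume(with: Result { try body() })
            }
        }
    }

    /// A barrier block waits for in-flight reads to drain, then runs alone.
    func write<T: Sendable>(_ body: @escaping @Sendable () throws -> T) async throws -> T {
        try await withCheckedThrowingContinuation { continuation in
            queue.async(flags: .barrier) {
                continuation.resume(with: Result { try body() })
            }
        }
    }
}
```

In a real engine the closures would wrap pread, pwrite, or fsync on a file descriptor.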

On top of that sit purpose-built locks: AsyncReadWriteLock (actor + continuations, writer priority), ReadWriteLock (pthread for hot paths where async overhead hurts), AsyncMutex, UnfairLock, and FileLock (flock advisory with upgrade/downgrade for cross-process safety).

A writer lease, acquired with a .fail, .wait, or .timeout policy, serializes logical mutations. Multiple readers are allowed. The lease is what lets a long-running recall overlap background ingest without two writers colliding inside the WAL and TOC.
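A minimal blocking sketch of what the three acquisition policies could mean, assuming .fail gives up immediately, .wait blocks, and .timeout gives up after a deadline. A DispatchSemaphore stands in for whatever async mechanism Wax actually uses; LeasePolicy, LeaseError, and WriterLeaseSketch are hypothetical.

```swift
import Dispatch

// Hypothetical acquisition policies mirroring the .fail / .wait / .timeout options.
enum LeasePolicy {
    case fail                      // give up immediately if another writer holds the lease
    case wait                      // block until the lease is free
    case timeout(seconds: Double)  // block up to a deadline, then give up
}

enum LeaseError: Error {
    case busy
    case timedOut
}

// One writer at a time; readers never acquire the lease.
final class WriterLeaseSketch: @unchecked Sendable {
    private let permit = DispatchSemaphore(value: 1)

    func acquire(_ policy: LeasePolicy) throws {
        switch policy {
        case .fail:
            if permit.wait(timeout: .now()) == .timedOut { throw LeaseError.busy }
        case .wait:
            permit.wait()
        case .timeout(let seconds):
            if permit.wait(timeout: .now() + seconds) == .timedOut { throw LeaseError.timedOut }
        }
    }

    func release() {
        permit.signal()
    }
}
```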

The caller sees ordinary method calls. The durability invariants live underneath: BlockingIOExecutor, the writer lease, the WAL append path, the commit barrier, and the index updates.

Frames, supersede, dedup, and bitemporal memory

A frame is the durable unit. remember chunks text (default 400 tokens with 40-token overlap using the pinned BPE tokenizer), embeds when configured, writes one or more frames with parent/child links, indexes them in FTS5 and the vector engine, and commits in batches.
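The 400-token window with 40-token overlap is a sliding window that advances by 360 tokens. This sketch works on any token array; the pinned BPE tokenizer is not included, so the usage check splits on whitespace instead. chunkWindows is a hypothetical name.

```swift
// Sliding-window chunking: each window re-reads the last `overlap` tokens of the previous one.
func chunkWindows<T>(_ tokens: [T], size: Int = 400, overlap: Int = 40) -> [ArraySlice<T>] {
    precondition(size > 0 && overlap >= 0 && overlap < size, "overlap must be smaller than the window")
    var windows: [ArraySlice<T>] = []
    let step = size - overlap
    var start = 0
    while start < tokens.count {
        let end = min(start + size, tokens.count)
        windows.append(tokens[start..<end])
        if end == tokens.count { break }  // last window reached the end of the text
        start += step                     // next window starts `overlap` tokens back
    }
    return windows
}
```

With 1,000 tokens and the defaults this yields windows starting at tokens 0, 360, and 720.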

Supersede corrects earlier knowledge without losing history. Frame A can point to frame B as its replacement. Timeline queries and most recall paths exclude superseded frames unless you ask.
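Supersede links can be modeled as an optional pointer from the old frame to its replacement. This is a sketch with hypothetical names (FrameRef, visibleFrames, currentVersion); it shows default filtering, the opt-in to history, and following a chain of corrections with a cycle guard.

```swift
// Hypothetical frame reference; `supersededBy` points from the old frame to its replacement.
struct FrameRef {
    let id: Int
    let text: String
    var supersededBy: Int? = nil
}

// Default recall view hides superseded frames unless the caller asks for history.
func visibleFrames(_ frames: [FrameRef], includeSuperseded: Bool = false) -> [FrameRef] {
    includeSuperseded ? frames : frames.filter { $0.supersededBy == nil }
}

// Follow supersede links to the current version; `seen` guards against cycles.
// Assumes frame ids are unique.
func currentVersion(of id: Int, in frames: [FrameRef]) -> Int {
    let byID = Dictionary(uniqueKeysWithValues: frames.map { ($0.id, $0) })
    var current = id
    var seen: Set<Int> = []
    while let next = byID[current]?.supersededBy, seen.insert(current).inserted {
        current = next
    }
    return current
}
```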

Dedup is content-addressed at the document level: the key combines a ContentHasher digest (SHA-256 of the canonical form) with embedding identity and chunk count, and it is stored in metadata under wax.content.hash. The orchestrator exposes rememberDedupProbe so a partial ingest can retry without re-chunking or re-embedding the parts that already landed.
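A dedup key along those lines can be sketched as a Hashable struct. Two assumptions to flag: the canonical form here just collapses whitespace, and FNV-1a stands in for SHA-256. DedupKey, canonicalize, dedupKey, and the model-name strings are hypothetical.

```swift
// Hypothetical dedup key; Wax's ContentHasher uses SHA-256, FNV-1a stands in here.
struct DedupKey: Hashable {
    let contentHash: UInt64
    let embeddingIdentity: String  // e.g. embedder name and version (assumed format)
    let chunkCount: Int
}

func fnv1a64(_ bytes: [UInt8]) -> UInt64 {
    var hash: UInt64 = 0xcbf2_9ce4_8422_2325
    for byte in bytes {
        hash ^= UInt64(byte)
        hash = hash &* 0x0000_0100_0000_01b3
    }
    return hash
}

// Assumed canonical form: whitespace runs collapsed to single spaces, ends trimmed.
func canonicalize(_ text: String) -> String {
    text.split(whereSeparator: { $0.isWhitespace }).joined(separator: " ")
}

func dedupKey(for text: String, embeddingIdentity: String, chunkCount: Int) -> DedupKey {
    DedupKey(contentHash: fnv1a64(Array(canonicalize(text).utf8)),
             embeddingIdentity: embeddingIdentity,
             chunkCount: chunkCount)
}
```

A probe in the spirit of rememberDedupProbe would compute the key first and skip chunking and embedding when a stored document already has it.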

Structured memory lives in the same file as a bitemporal entity-fact model. Entities have kinds and normalized aliases. Facts are (subject, predicate, object) where object can be string, int, double, bool, data, timestamp, or entity reference. Every fact carries two half-open time ranges: valid time (when the fact is true in the world) and system time (when the system recorded it). Queries are asOf(systemTime, validTime). Retraction closes the system range. Evidence links every fact back to the exact frame, chunk, and UTF-8 span that produced it. Dedup is by SHA-256 of the triple.
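An asOf(systemTime, validTime) query over half-open ranges reduces to two interval checks. This sketch uses integer timestamps (for example day numbers) and string objects, and leaves out evidence pointers and triple hashing; Fact, inRange, asOf, and retract are hypothetical names.

```swift
// Hypothetical bitemporal fact; ranges are half-open [from, to), nil `to` = still open.
struct Fact {
    let subject: String
    let predicate: String
    let object: String
    let validFrom: Int, validTo: Int?    // when the fact is true in the world
    var systemFrom: Int, systemTo: Int?  // when the system believed it
}

func inRange(_ from: Int, _ to: Int?, _ t: Int) -> Bool {
    from <= t && (to.map { t < $0 } ?? true)
}

// Facts the system believed at `systemTime` that were true at `validTime`.
func asOf(_ facts: [Fact], systemTime: Int, validTime: Int) -> [Fact] {
    facts.filter {
        inRange($0.systemFrom, $0.systemTo, systemTime) && inRange($0.validFrom, $0.validTo, validTime)
    }
}

// Retraction closes the system-time range; the row stays for historical queries.
func retract(_ fact: inout Fact, at systemTime: Int) {
    fact.systemTo = systemTime
}
```

Because retraction only closes the system range, a query "as of" an earlier system time still sees the fact, which is what makes audit-style questions answerable.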

Structured facts share the same FTS5 engine, WAL append path, and commit protocol as every other frame. Asserting “the user switched to dark mode on 2026-03-12” with its source span produces a bitemporal row whose evidence pointer targets that chunk and UTF-8 range. It survives the same WAL replay and header selection as the transcript frames that produced it.

Hybrid search and token-aware RAG

Search has four lanes: BM25 (FTS5), vector (Accelerate or MetalANNS), structured facts, and pure timeline fallback.

A small offline RuleBasedQueryClassifier looks at surface cues (“when”, “recent”, “what is”, “how”, “why”) and picks a QueryType. That drives AdaptiveFusionConfig weights for reciprocal rank fusion. Fusion is rank-based with a k=60 constant, weighted per lane, then tie-broken by best rank then frame id.
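Weighted reciprocal rank fusion with k = 60 scores each frame as the sum over lanes of weight / (k + rank), with 1-based ranks. In this sketch the weights are passed in directly rather than coming from AdaptiveFusionConfig, and frame ids are plain integers; fuse is a hypothetical name.

```swift
/// Weighted reciprocal rank fusion over per-lane rankings (1-based ranks):
/// score(id) = sum over lanes of weight / (k + rank). Ties break on best rank, then id.
func fuse(_ lanes: [(weight: Double, ranking: [Int])], k: Double = 60) -> [Int] {
    var score: [Int: Double] = [:]
    var bestRank: [Int: Int] = [:]
    for lane in lanes {
        for (index, id) in lane.ranking.enumerated() {
            let rank = index + 1
            score[id, default: 0] += lane.weight / (k + Double(rank))
            bestRank[id] = min(bestRank[id] ?? rank, rank)
        }
    }
    return score.keys.sorted { a, b in
        let (sa, sb) = (score[a]!, score[b]!)
        if sa != sb { return sa > sb }
        let (ra, rb) = (bestRank[a]!, bestRank[b]!)
        if ra != rb { return ra < rb }
        return a < b
    }
}
```

Frame 20 below appears in both lanes, so it outranks frame 10 even though frame 10 is first in BM25: 1/62 + 1/61 beats 1/61.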

After fusion the pipeline can rerank. Term overlap, entity coverage, date bonuses, and distractor penalties (words such as “tentative” or “draft” in the wrong context) adjust order.

The token budget lives in the same orchestrator. The top hit is expanded up to a token cap. Tier-selected surrogates fill the rest: full text, gist, or an MMR sentence-extractive summary. Tier choice uses age, access statistics, and query specificity.
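A greedy packing pass gives a feel for the budget logic. This is a simplified substitute: it picks the richest surrogate that still fits instead of using age, access statistics, and query specificity, and it takes precomputed token counts rather than generating gists or MMR summaries. Tier, Candidate, Slot, and assembleContext are hypothetical.

```swift
enum Tier { case full, gist, extractive }

struct Candidate {
    let id: Int
    let fullTokens: Int
    let gistTokens: Int
    let extractiveTokens: Int
}

struct Slot: Equatable {
    let id: Int
    let tier: Tier
    let tokens: Int
}

func assembleContext(_ hits: [Candidate], budget: Int, topHitCap: Int) -> [Slot] {
    var remaining = budget
    var slots: [Slot] = []
    for (position, hit) in hits.enumerated() {
        if position == 0 {
            // The top hit is expanded, but never past the cap or the budget.
            let tokens = min(hit.fullTokens, topHitCap, remaining)
            slots.append(Slot(id: hit.id, tier: .full, tokens: tokens))
            remaining -= tokens
            continue
        }
        // Later hits take the richest surrogate that still fits (a stand-in for Wax's tier policy).
        let options: [(tier: Tier, tokens: Int)] = [
            (tier: .full, tokens: hit.fullTokens),
            (tier: .gist, tokens: hit.gistTokens),
            (tier: .extractive, tokens: hit.extractiveTokens),
        ]
        if let choice = options.first(where: { $0.tokens <= remaining }) {
            slots.append(Slot(id: hit.id, tier: choice.tier, tokens: choice.tokens))
            remaining -= choice.tokens
        }
    }
    return slots
}
```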

Metadata and time predicates are pushed down into the search rather than applied after it. Pending writes are visible on request. A caller can ask for frames tagged “meeting” after last Tuesday and receive only those results.

Compared to Mem0 and Zep

Mem0 runs LLM fact extraction on each turn and writes the results into a vector store plus an optional Postgres graph. Extraction cost and durability follow whatever backing stores you pick. No single file holds the raw transcript and the derived facts with a built-in replay path.

Zep (via Graphiti) stores bi-temporal facts and episodes in a graph engine such as Neo4j together with vector indexes. Both the graph and the vector store are external systems you deploy and administer.

Wax records every mutation first in the WAL ring at offset 8 KiB. On commit it advances the header generation, updates the footer with the TOC SHA-256 and the committed WAL sequence, and persists frame payloads, FTS5 segments, vector manifests, and structured fact rows inside the same file. Recovery selects the higher-generation valid header page, then uses WALRingReader to replay any records after the checkpoint. The same substrate serves raw frames, lexical search, vector search, and bitemporal facts. VideoRAG and PhotoRAG are additional orchestrators on that substrate.

A .wax file carries structured facts with their evidence pointers and the serialized vector index manifests. Open it on another machine and the engine replays the WAL from the committed sequence and rebuilds in-memory indexes from the TOC. Most agent memory systems are not built around a single portable file with its own recovery protocol.

Practical outcomes

A coding agent keeps the architectural decision from three weeks ago and the file that recorded the constraint. The frame that captured the discussion carries its timestamp, supersede links if the decision changed, and the FTS5 + vector entries that surface it on a recall for “why did we pick this approach”.

A personal knowledge base stays in Documents. iCloud or Git syncs the one file. A query for last month’s HNSW conclusion pulls the paragraph, the surrounding chunk, the bitemporal fact that recorded the conclusion date, and the source span that produced it. No service call.

An iOS app bundles the orchestrator and the on-device embedder. The RAG path never leaves the device. The privacy surface for memory is the file the user already controls.

Platform constraints drove the design: iOS 17+, macOS 14+, Swift 6 actors, CoreML or Accelerate embedders. The engine had to match the hardware limits and the concurrency model of on-device inference.

Implementation notes

These structures make the durability guarantees concrete.

WAL record header

Every change lands first as a small, checksummed record:

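import Foundation

struct WALRecordHeader {
    let sequence: UInt64        // 8 bytes, monotonically increasing
    let payloadLength: UInt32   // 4 bytes, length of the payload that follows
    let flags: UInt32           // 4 bytes, bit 0 = padding record
    let checksum: Data          // 32 bytes, SHA-256 of the payload
}                               // 8 + 4 + 4 + 32 = 48 bytes on disk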

The header is 48 bytes on disk. The payload follows it (a put, delete, supersede, or embedding operation). When the ring buffer wraps, padding records align the next real record to a clean start.

Dual headers and atomic updates

Two identical 4 KiB header pages sit at the start of the file. On every commit the engine writes the new metadata to the older page, updates its generation number, then fsyncs. On open it keeps the page with the highest valid generation. A crash mid-header-write is fine: the previous good generation is still intact.
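The write side of that protocol fits in a few lines: always overwrite the page with the lower generation, so the newest committed page is never touched. HeaderSlot and commitHeader are hypothetical, and the actual pwrite and fsync are left as a comment.

```swift
// Hypothetical in-memory stand-in for the two 4 KiB header pages.
struct HeaderSlot {
    var generation: UInt64
    var payload: String
}

/// Writes the next generation into the older page and returns its index (0 or 1).
/// The newer page is never touched, so a torn write leaves it valid.
func commitHeader(_ pages: inout [HeaderSlot], payload: String) -> Int {
    precondition(pages.count == 2, "a Wax file has exactly two header pages")
    let target = pages[0].generation <= pages[1].generation ? 0 : 1
    let next = max(pages[0].generation, pages[1].generation) + 1
    pages[target] = HeaderSlot(generation: next, payload: payload)
    // A real engine would pwrite the page at offset target * 4096 and then fsync.
    return target
}
```

Successive commits alternate between the two pages, so each commit leaves the previous generation untouched.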

Recovery flow (simplified)

// 1. Choose the newest valid header page
let header = selectBestHeaderPage()

// 2. (Optional) fast path: a WALSNAP1 replay snapshot in the header
//    lets the scan skip records that are already committed
let start = header.replaySnapshot?.checkpointPosition
    ?? header.walCheckpointPosition

// 3. Replay every pending record after the checkpoint
for record in walRingReader.scanPending(from: start) {
    if record.checksumFails() {
        continue            // tolerated in the pending region: position kept, effects skipped
    }
    apply(record)           // rebuild frame state and in-memory indexes
}

// 4. Verify the final TOC against the footer
//    (footer TOC hash must match the TOC's own self-checksum)
try verifyTOCAgainstFooter()

Corrupted records in the uncommitted region are not applied, but the scan still steps past them, so later valid records still apply. Committed data is never accepted if its checksum is bad.

Key limits (baked into the format)

Default WAL size: 256 MiB
Maximum TOC size: 64 MiB
Maximum single blob: 256 MiB
Strings capped at 16 MiB

Those caps keep metadata for a large agent memory inside a size range where recovery stays fast.

Most of the complexity lives in the recovery paths and the layered checksums: WAL record, header, TOC, footer. The format itself stays small and explicit.

Source: https://github.com/christopherkarani/Wax.