memorize: The Local-First Memory Layer for AI Agent Context

The modern developer workflow is increasingly fragmented across a constellation of AI agents. One hour is spent refining a backend schema with Claude on a MacBook, the next debugging a deployment script via a CLI tool on a Linux workstation, and the evening polishing the UI with a different LLM entirely. Despite the power of these models, they share a fundamental flaw: they have the memory of a goldfish. Every time a developer switches devices, models, or even sessions, the critical architectural context (the why behind a specific variable name, or the reason a certain library was rejected) evaporates. This context drift forces developers to spend a significant share of their attention re-explaining the project state to their AI assistants.

The Architecture of Persistent Project Memory

To bridge this gap, a new local-first open-source tool called memorize has been released. Unlike traditional session histories that are locked into a specific provider's database, memorize treats project memory as a shared resource that exists independently of the model. The tool is released under the AGPL license, keeps all data on the local machine, and has a strict policy against collecting telemetry. This design appeals to developers who want strong privacy and low-latency access to their project's intellectual property.

Integration is designed to be frictionless. The tool can be installed using a single command:

curl -fsSL https://raw.githubusercontent.com/shakystar/memorize/… | sh

For those already operating within Claude or Codex sessions, the setup can be triggered by instructing the agent: "Set up memorize in this project. Follow https://github.com/shakystar/memorize/…". This allows the agent to configure the environment itself, turning the project folder into a living repository of memory. For those interested in the technical blueprint, the full design is detailed in the official ARCHITECTURE.md.

At its core, memorize implements a two-layer memory system inspired by the brain's Complementary Learning Systems (CLS). The first layer acts as the hippocampus, capturing raw observation data during active work sessions. This process is designed to be lightweight and cheap, avoiding expensive LLM calls during the primary execution flow. The second layer, acting as the neocortex, is a background process that triggers at session boundaries to integrate and synthesize these raw observations into a coherent long-term memory structure.
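The two-layer split can be illustrated with a minimal Python sketch. This is not memorize's actual code: the class name, the method names, and the `summarize` callable (a stand-in for the background LLM step) are all assumptions made for illustration. The point it shows is that the fast path only appends, while the model call is deferred to the session boundary.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TwoLayerMemory:
    # Hypothetical stand-in for the background LLM call; any callable works.
    summarize: Callable[[List[str]], str]
    raw_observations: List[str] = field(default_factory=list)  # fast layer
    long_term: List[str] = field(default_factory=list)         # slow layer
    consolidated_upto: int = 0  # raw observations before this index are integrated

    def observe(self, note: str) -> None:
        """Fast path: append only, no model call during active work."""
        self.raw_observations.append(note)

    def end_session(self) -> None:
        """Slow path: integrate everything observed since the last boundary."""
        pending = self.raw_observations[self.consolidated_upto:]
        if not pending:
            return
        self.long_term.append(self.summarize(pending))
        # Raw observations are kept, not deleted; only the index moves forward.
        self.consolidated_upto = len(self.raw_observations)
```

In this sketch the raw layer is never cleared, which mirrors the article's statement that the full history is preserved.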

Beyond Vector Search: Deterministic Sync and Logic

Most AI memory solutions rely on simple Retrieval-Augmented Generation (RAG) using vector databases, but memorize introduces a more rigorous approach to state synchronization. The system does not physically delete data to handle forgetting. Instead, it employs a scoring mechanism to determine which memories are retrieved. Priority is the product of three factors: importance × freshness (with a 14-day half-life) × task relevance. This ensures that the most pertinent and recent information surfaces first, while the full history remains preserved for deep audits.
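As a worked illustration of the scoring rule, the following Python sketch computes the three-factor product. Only the 14-day half-life and the three factors come from the description above. The exponential decay form, the 0-to-1 value ranges, and the function names are assumptions.

```python
HALF_LIFE_DAYS = 14.0  # half-life stated in the article


def freshness(age_days: float, half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Exponential decay: 1.0 when new, 0.5 after one half-life (assumed form)."""
    return 0.5 ** (max(age_days, 0.0) / half_life_days)


def priority(importance: float, age_days: float, relevance: float) -> float:
    """Retrieval priority = importance x freshness x task relevance."""
    return importance * freshness(age_days) * relevance


def rank(memories: list) -> list:
    """Sort memories by descending priority; nothing is deleted, only reordered."""
    return sorted(
        memories,
        key=lambda m: priority(m["importance"], m["age_days"], m["relevance"]),
        reverse=True,
    )
```

Under these assumptions, a memory with importance 1.0 and relevance 1.0 scores 0.5 after 14 days and 0.25 after 28 days, so an older note can still outrank a newer one if it is sufficiently more important or relevant.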

To prevent data loss during the synchronization of these memories across multiple machines, memorize utilizes a watermark mechanism. A watermark only advances once an event has been successfully persisted. If an LLM timeout or a parsing error occurs, the system simply retries the same segment at the next boundary. Even if the watermark itself is lost, the system prevents duplication through source-based deduplication.
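The watermark behaviour described above can be sketched in Python as follows. This is an illustrative model, not the tool's implementation: the `Consolidator` class, the `integrate` callable (standing in for the LLM-backed step), and the `source_id` field are hypothetical names.

```python
from typing import Callable, Dict, List


class Consolidator:
    def __init__(self, integrate: Callable[[dict], str]):
        self.integrate = integrate       # hypothetical LLM-backed step; may raise
        self.store: Dict[str, str] = {}  # persisted results keyed by source id
        self.watermark = 0               # index of the next unprocessed event

    def run_boundary(self, log: List[dict]) -> bool:
        """Process events past the watermark; return False if a retry is needed."""
        for event in log[self.watermark:]:
            source_id = event["source_id"]
            if source_id not in self.store:  # source-based deduplication
                try:
                    result = self.integrate(event)
                except (TimeoutError, ValueError):
                    # Timeout or parse error: stop here. The watermark still
                    # points at this event, so the next boundary retries it.
                    return False
                self.store[source_id] = result
            # Advance only after the event is known to be persisted.
            self.watermark += 1
        return True
```

If the watermark is reset to zero, replaying the log in this sketch triggers no new `integrate` calls, because every event's source id is already in the store.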

On the infrastructure side, memorize avoids the need for a central server by synchronizing an append-only event log. This allows multiple machines to converge on the same state using deterministic rules, removing the need for complex clock synchronization.

One of the most critical technical pivots in memorize is how it handles contradictions. While many tools rely on cosine similarity to find related memories, memorize recognizes that vectors are often blind to negation. For example, the phrases "merge this change" and "do not merge this change" are mathematically similar in vector space but diametrically opposed in meaning. To solve this, memorize uses embedding vectors only to retrieve candidates, while the final determination of whether a memory is contradictory or complementary is performed by the LLM itself.
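The retrieve-then-judge split can be shown with a toy Python sketch. Everything here is an assumption for illustration: `embed` is a bag-of-words stand-in for a real embedding model, and `judge` is a caller-supplied function standing in for the LLM decision. The sketch demonstrates only that similarity selects candidates while a separate step decides their relationship.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def find_candidates(new: str, memories: List[str], threshold: float = 0.5) -> List[str]:
    """Stage 1: similarity only narrows the search; it decides nothing."""
    query = embed(new)
    return [m for m in memories if cosine(query, embed(m)) >= threshold]


def classify(
    new: str,
    memories: List[str],
    judge: Callable[[str, str], str],
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Stage 2: `judge` (an LLM call in practice) labels each candidate pair."""
    return [(m, judge(new, m)) for m in find_candidates(new, memories, threshold)]
```

With this toy embedding, "merge this change" and "do not merge this change" have a cosine similarity of about 0.77, so the pair is retrieved as related even though the meanings conflict. That is why the label has to come from the second stage.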

Integrating memorize into a production workflow removes the typical overhead of API key management. The tool does not require its own separate LLM API keys; instead, it hooks into existing authenticated environments such as `claude -p` or `codex exec`. This creates a risk of infinite recursion, where an agent's act of organizing its memory triggers another memory event. To prevent it, the system employs environment variable guards that silence hooks during the integration phase.
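A guard of this kind might look like the following Python sketch. The variable name `MEMORIZE_HOOKS_DISABLED` and all function names are hypothetical; the article does not say which variable memorize actually uses. The idea is that the integration step launches its agent subprocess with the guard set, and any hook running inside that subprocess exits early.

```python
import os
import subprocess
from typing import Callable, Dict, List

GUARD_VAR = "MEMORIZE_HOOKS_DISABLED"  # hypothetical name, for illustration only


def on_hook(record: Callable[[], None]) -> bool:
    """Hook entry point: skip recording when the guard is set."""
    if os.environ.get(GUARD_VAR) == "1":
        return False  # running inside the integration phase; stay silent
    record()
    return True


def guarded_env() -> Dict[str, str]:
    """Copy of the current environment with the guard switched on."""
    return dict(os.environ, **{GUARD_VAR: "1"})


def run_integration(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run the integration command (e.g. an agent CLI) with hooks silenced."""
    return subprocess.run(
        cmd, env=guarded_env(), capture_output=True, text=True, check=False
    )
```

Because the guard is set only in the child's environment, hooks in the developer's own interactive session keep firing normally.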

From an operational standpoint, the shared memory layer enables real-time collaboration across parallel sessions. Multiple agents working on the same project can monitor each other's progress, triggering conflict warnings if two sessions attempt to modify the same file simultaneously. The memory schema is not static; it evolves based on empirical usage data, specifically observation lifespan and injection/invalidation points tracked over several weeks of usage.
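The conflict warning can be pictured with a small Python sketch. It is an in-memory model under assumed names (`SessionRegistry`, `claim`, `release`); how memorize actually shares this state between sessions is not described here.

```python
from collections import defaultdict
from typing import Dict, List, Set


class SessionRegistry:
    def __init__(self) -> None:
        # Maps a file path to the set of session ids currently editing it.
        self._editors: Dict[str, Set[str]] = defaultdict(set)

    def claim(self, session_id: str, path: str) -> List[str]:
        """Register an edit; return the other sessions already editing `path`."""
        others = sorted(self._editors[path] - {session_id})
        self._editors[path].add(session_id)
        return others  # a non-empty list is the cue for a conflict warning

    def release(self, session_id: str, path: str) -> None:
        """Mark the session as finished with `path`."""
        self._editors[path].discard(session_id)
```

A caller would surface a warning whenever `claim` returns a non-empty list.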

The local-first approach eliminates server costs, but the surrounding ecosystem is still being built out by the community. Developers integrating memorize with tools like Cursor, Gemini CLI, or Windsurf will find that adapter extensions are still being developed. Additionally, early adopters should be aware of environment-specific bugs, such as quote-handling issues in PowerShell 5.1 or EACCES permission errors on certain Linux distributions.

The shift toward deterministic, local-first memory marks a transition from AI as a stateless chat interface to AI as a persistent project collaborator.
