<aside> 🚀
When we took a closer look at the architecture of Claude Code, we experienced a profound sense of serendipity: its core design philosophies are remarkably similar to our Complementary RL framework. From asynchronous memory extraction (extractMemory) to cross-session refinement (autoDream) and on-demand retrieval, the parallels are striking. In this post, we use Claude Code's memory system as a lens to explore the step-by-step development and the unwritten chain-of-thought behind Complementary RL.
</aside>

<aside> 💡
TL;DR
A closer look at Claude Code reveals a striking alignment with our Complementary RL framework: true agent memory must be subtractive, actively evolving, and ultimately internalized. Here are the core findings from our journey to build this architecture:
- Through extractMemory and autoDream, we found that raw context is just noise. To be useful, memory must be actively distilled into generalizable rules.
</aside>

**Phase 1: Extracting and Evolving (The Rollout Process)**

As LLMs take on increasingly complex roles, transitioning from reactive chatbots to autonomous agents, our expectations of them scale accordingly. We are moving beyond simple, context-level memorization; we now expect agents to construct their own persistent memory systems. They need to retain critical insights, avoid repeating past mistakes, and evolve.
**The Instinct to Add:** To fulfill this demand for active, complex reasoning, the early design phase of agent memory was almost entirely additive. We tried to force agents to remember everything by engineering highly complex structures: intricate knowledge graphs, massive tabular databases, and dense vector stores paired with convoluted hybrid retrieval mechanisms.
**The Subtractive Shift:** However, as underlying models have grown more capable, a clear counter-trend has emerged in cutting-edge tools like Claude Code. The prevailing design principle is becoming subtractive. Instead of over-engineering the storage, these systems strip memory down to a raw, hierarchical file system. We simply provide the model with an index of what it knows, equip it with the right tools, and trust the agent to read the files only when it deems them necessary.
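As a rough illustration, a file-system memory in this spirit needs little more than a directory of Markdown files, a lightweight index, and an on-demand read tool. The layout and function names below are our own sketch, not Claude Code's actual implementation:

```python
from pathlib import Path

MEMORY_ROOT = Path("memory")  # hypothetical layout: one Markdown file per topic

def memory_index() -> str:
    """Build the lightweight index injected into the system prompt:
    file names plus first-line summaries, never the full contents."""
    entries = []
    for path in sorted(MEMORY_ROOT.glob("*.md")):
        lines = path.read_text().splitlines()
        summary = lines[0] if lines else "(empty)"
        entries.append(f"- {path.name}: {summary}")
    return "\n".join(entries)

def read_memory(name: str) -> str:
    """Tool the agent calls on demand, only when it decides a file matters."""
    path = MEMORY_ROOT / name
    return path.read_text() if path.is_file() else f"(no memory file named {name!r})"
```

The point is what is absent: no embeddings, no schema, no retrieval pipeline. The index keeps the context window clean, and the read tool makes retrieval a deliberate decision by the agent rather than an automatic injection.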
<aside> 💡
This shift is undeniably driven by improved LLM capabilities, like advanced tool-calling. But it is also driven by a harder truth: unfiltered context pollution degrades reasoning. The ideal memory design pattern is not a rigid database schema, but an evolving, model-driven workspace.
</aside>
With this paradigm shift in mind, let's examine how Claude Code puts it into practice.
Instead of relying on sophisticated databases, vector stores, or complex data structures, Claude Code manages its entire memory system using raw Markdown files. Embracing this radically subtractive design, Claude operates on a streamlined, four-tiered memory lifecycle:
- **CLAUDE.md**: Explicit, user-defined rules, conventions, and architectural decisions, loaded directly into the system prompt at the start of every session.
- **autoDream**: Operating on a periodic time window or session interval, autoDream actively reads existing memories, merges redundancies, and prunes outdated information. This ensures the memory bank continuously evolves rather than simply bloating over time.
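A consolidation pass of this kind can be sketched in a few lines. Here `summarize` stands in for an LLM call that merges duplicates and drops stale rules; the function names and the file-per-topic layout are our assumptions, not autoDream's actual internals:

```python
from pathlib import Path

def auto_dream(memory_root: Path, summarize) -> None:
    """Periodic consolidation pass (an autoDream-style sketch):
    read each memory file, ask a model to merge redundancies and prune
    outdated entries, then write the distilled version back.
    `summarize` is a stand-in for an LLM call (an assumption here)."""
    for path in sorted(memory_root.glob("*.md")):
        distilled = summarize(path.read_text())
        if distilled.strip():
            path.write_text(distilled)   # memory evolves in place
        else:
            path.unlink()                # nothing generalizable survived: prune it
```

Deleting a file outright when nothing generalizable survives is the subtractive principle taken to its end: the memory bank shrinks as readily as it grows.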