
autoresearch-memory

Autonomous research on AI long-term memory systems, inspired by karpathy/autoresearch.

An AI agent autonomously modifies a memory system implementation (memory_system.py), runs evaluation benchmarks against 30 structured scenarios, and keeps or discards each change based on a quality score assigned by an LLM judge.

How it works

  • prepare.py (FIXED): test scenarios, LLM judge, evaluation harness, metrics
  • memory_system.py (AGENT MODIFIES): storage, retrieval, update, consolidation
  • program.md (HUMAN EDITS): agent instructions (the "research org code")
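The four responsibilities listed for memory_system.py (storage, retrieval, update, consolidation) could be sketched as a minimal interface like the one below. This is an illustrative assumption only; the class name, method names, and naive keyword-overlap retrieval are not taken from the actual file.

```python
# Hypothetical sketch of the surface the agent iterates on.
# All names and the retrieval strategy are assumptions, not the real file.
from dataclasses import dataclass, field


@dataclass
class MemorySystem:
    entries: list[str] = field(default_factory=list)

    def store(self, fact: str) -> None:
        """Storage: persist a new fact."""
        self.entries.append(fact)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        """Retrieval: rank stored facts by naive keyword overlap."""
        words = set(query.lower().split())
        scored = sorted(
            self.entries,
            key=lambda e: len(words & set(e.lower().split())),
            reverse=True,
        )
        return scored[:k]

    def update(self, old: str, new: str) -> None:
        """Update: replace an outdated fact with a corrected one."""
        self.entries = [new if e == old else e for e in self.entries]

    def consolidate(self) -> None:
        """Consolidation: drop exact duplicates, keeping first occurrences."""
        self.entries = list(dict.fromkeys(self.entries))
```

The agent's job is then to swap the naive internals (e.g. keyword overlap) for better strategies while keeping the interface the eval harness calls.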

Quick start

cd autoresearch-memory
uv sync
uv run prepare.py  # run baseline evaluation (uses local `claude` CLI as judge)

The metric

memory_score (0-10, higher is better):

memory_score = 0.3*relevance + 0.3*correctness + 0.25*completeness + 0.15*(10 - noise)
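For concreteness, the weighted sum above can be computed as below. The sub-scores and their 0-10 scale come from the formula; the function name itself is illustrative.

```python
def memory_score(relevance: float, correctness: float,
                 completeness: float, noise: float) -> float:
    """Weighted memory_score on a 0-10 scale.

    Noise is inverted (10 - noise), so less noise yields a higher score.
    """
    return (0.3 * relevance + 0.3 * correctness
            + 0.25 * completeness + 0.15 * (10 - noise))


# A perfect run (all sub-scores 10, zero noise) hits the maximum of 10.
print(memory_score(10, 10, 10, 0))  # → 10.0
```

Note that noise is the only sub-score where lower is better, which is why it enters the sum as (10 - noise).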

Running the agent

Point an AI agent (e.g. Claude Code) at program.md and let it go. It will autonomously experiment with memory_system.py, running the eval after each change and keeping improvements.
