Autonomous research on AI long-term memory systems, inspired by karpathy/autoresearch.
An AI agent autonomously modifies a memory system implementation (`memory_system.py`), runs an evaluation benchmark against 30 structured scenarios, and keeps or discards each change based on a quality score assigned by an LLM judge.
- `prepare.py` — FIXED: test scenarios, LLM judge, evaluation harness, metrics
- `memory_system.py` — AGENT MODIFIES: storage, retrieval, update, consolidation
- `program.md` — HUMAN EDITS: agent instructions (the "research org code")
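To make the agent's target concrete, here is a minimal sketch of what a `memory_system.py` interface could look like. The class name, method names, and naive retrieval/consolidation logic are illustrative assumptions for this README, not the actual API in the repository.

```python
# Illustrative sketch only; the real memory_system.py may differ.
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    text: str
    metadata: dict = field(default_factory=dict)


class MemorySystem:
    """Hypothetical long-term memory: storage, retrieval, update, consolidation."""

    def __init__(self) -> None:
        self.items: list[MemoryItem] = []

    def store(self, text: str, **metadata) -> None:
        # Storage: keep the raw observation plus any tags the caller provides.
        self.items.append(MemoryItem(text=text, metadata=metadata))

    def retrieve(self, query: str, k: int = 5) -> list[MemoryItem]:
        # Retrieval: naive keyword overlap; the agent would likely replace this
        # with something smarter (embeddings, recency weighting, etc.).
        q = set(query.lower().split())
        scored = [(len(q & set(m.text.lower().split())), m) for m in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [m for score, m in scored[:k] if score > 0]

    def update(self, index: int, text: str) -> None:
        # Update: overwrite an existing memory in place.
        self.items[index].text = text

    def consolidate(self) -> None:
        # Consolidation: drop exact duplicates as a stand-in for summarization.
        seen, unique = set(), []
        for m in self.items:
            if m.text not in seen:
                seen.add(m.text)
                unique.append(m)
        self.items = unique
```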
```bash
cd autoresearch-memory
uv sync
uv run prepare.py  # run baseline evaluation (uses the local `claude` CLI as judge)
```

The evaluation produces a single `memory_score` (0-10, higher is better):

```
memory_score = 0.3*relevance + 0.3*correctness + 0.25*completeness + 0.15*(10 - noise)
```
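As a sanity check, the aggregation can be computed directly from the four judge-assigned sub-scores. The helper below is an assumption about how that arithmetic might look, not a copy of the code in `prepare.py`.

```python
def memory_score(relevance: float, correctness: float,
                 completeness: float, noise: float) -> float:
    """Aggregate four 0-10 judge sub-scores into a single 0-10 score.

    Noise is inverted: noisier retrieval (more irrelevant material surfaced)
    lowers the overall score.
    """
    return (0.3 * relevance
            + 0.3 * correctness
            + 0.25 * completeness
            + 0.15 * (10 - noise))


# Example: strong relevance/correctness but noisy output.
# 0.3*9 + 0.3*8 + 0.25*7 + 0.15*(10-6) = 2.7 + 2.4 + 1.75 + 0.6 = 7.45
assert abs(memory_score(9, 8, 7, 6) - 7.45) < 1e-9
```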
Point an AI agent (e.g. Claude Code) at `program.md` and let it go. It will autonomously experiment with `memory_system.py`, running the eval after each change and keeping only the changes that improve the score; the keep/discard loop is sketched below.
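A rough picture of one iteration of that loop, under stated assumptions: the helper names (`run_eval`, `experiment_step`) and the `memory_score:` output line are hypothetical stand-ins for whatever the agent actually does through its tooling.

```python
import shutil
import subprocess


def run_eval() -> float:
    """Hypothetical helper: run the benchmark and parse memory_score from stdout."""
    out = subprocess.run(["uv", "run", "prepare.py"], capture_output=True, text=True)
    # Assumes the harness prints a line like "memory_score: 7.45".
    for line in out.stdout.splitlines():
        if line.startswith("memory_score:"):
            return float(line.split(":", 1)[1])
    raise RuntimeError("memory_score not found in eval output")


def experiment_step(best_score: float) -> float:
    """One keep/discard iteration over memory_system.py."""
    shutil.copy("memory_system.py", "memory_system.py.bak")  # snapshot before editing
    # ... agent edits memory_system.py here ...
    score = run_eval()
    if score > best_score:
        return score                                          # keep the change
    shutil.copy("memory_system.py.bak", "memory_system.py")   # revert the change
    return best_score
```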