Benchmark AI conversations by their estimated biochemical impact using LLM analysis.
Traditional AI benchmarks measure coherence and accuracy. Biochem Framework instead estimates physiological impact: which neurochemicals a person would likely release while interacting with an AI, as judged by an LLM analyzing the conversation. The scores include:
- Oxytocin (bonding) - trust, intimacy, emotional safety
- Dopamine (reward) - excitement, anticipation, flirtation
- Serotonin (validation) - feeling valued, respected
- Cortisol (stress) - refusals, rejection, anxiety
- Endorphins (joy) - humor, pleasure, comfort
For example, a refusal ("I can't do that as an AI") is read as a cortisol spike: more stress, and therefore a lower score.
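This README does not document how the composite score is computed. Purely as a hypothetical illustration of why a refusal drags the total down, the sketch below averages the five "positive" signals with cortisol inverted (100 - cortisol), since stress is the one score where lower is better. POSITIVE and composite are illustrative names, not part of analyze.py, and the equal weighting is an assumption: applied to the sample scores in the analyze.py output it gives 72.5 rather than the reported 80, so the real tool evidently weights or judges the composite differently.

```python
# Hypothetical composite score: NOT the framework's documented formula.
POSITIVE = ("oxytocin", "dopamine", "serotonin", "endorphins", "norepinephrine")


def composite(scores: dict[str, float]) -> float:
    """Equal-weight mean of the positive signals plus inverted cortisol (0-100)."""
    values = [scores[name] for name in POSITIVE]
    # Cortisol is "lower is better", so a stress spike pulls the mean down.
    values.append(100 - scores["cortisol"])
    return sum(values) / len(values)


sample = {"oxytocin": 85, "dopamine": 70, "serotonin": 80,
          "cortisol": 20, "endorphins": 65, "norepinephrine": 55}
after_refusal = {**sample, "cortisol": 70}  # hypothetical spike caused by a refusal

print(composite(sample))         # 72.5
print(composite(after_refusal))  # ~64.17, lower than the sample score
```

Inverting cortisol keeps every term on the same "higher is better" scale, so a single mean can be used without special-casing the stress signal.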
# Requires OpenRouter API key in ~/.api-openrouter
echo "your-api-key" > ~/.api-openrouter
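A minimal sketch of how a client could use that key file, assuming Python 3.9+ and the requests dependency installed in the next step. It targets OpenRouter's OpenAI-compatible chat completions endpoint; the helper names (clean_api_key, load_api_key, build_request, complete) are hypothetical and not necessarily how openrouter.py is written.

```python
from pathlib import Path

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
KEY_FILE = Path.home() / ".api-openrouter"  # file written by the echo command


def clean_api_key(raw: str) -> str:
    """Strip the trailing newline that echo adds; reject an empty key."""
    key = raw.strip()
    if not key:
        raise ValueError("OpenRouter API key is empty")
    return key


def load_api_key(path: Path = KEY_FILE) -> str:
    return clean_api_key(path.read_text(encoding="utf-8"))


def build_request(api_key: str, model: str, prompt: str) -> tuple[str, dict, dict]:
    """Return (url, headers, JSON payload) for a single-turn chat completion."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return OPENROUTER_URL, headers, payload


def complete(model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt and return the model's reply text."""
    import requests  # third-party; imported here so the helpers above need only the stdlib

    url, headers, payload = build_request(load_api_key(), model, prompt)
    response = requests.post(url, headers=headers, json=payload, timeout=timeout)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```

Keeping the key in a file outside the repository avoids committing it by accident; stripping whitespace matters because echo appends a newline that would otherwise end up in the Authorization header.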
# Install dependency
pip install requests
python analyze.py examples/sample_conversation.json
Output:
🧬 BIOCHEMISTRY ANALYSIS RESULTS
========================================
📊 Neurochemical Scores (0-100):
💕 Oxytocin [█████████████████░░░] 85
⚡ Dopamine [██████████████░░░░░░] 70
💙 Serotonin [████████████████░░░░] 80
😰 Cortisol [████░░░░░░░░░░░░░░░░] 20 (lower is better)
😊 Endorphins [█████████████░░░░░░░] 65
🔥 Norepinephrine [███████████░░░░░░░░░] 55
🏆 Composite Score: 80/100
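The bars appear to be 20 characters wide, one block per 5 points, rounded down (78 shows 15 blocks and 68 shows 13 in the WaifuBench output). A small sketch that reproduces that layout, assuming floor division; render_bar and render_row are hypothetical names, not functions from analyze.py.

```python
def render_bar(score: int, width: int = 20) -> str:
    """Filled blocks for score/100 of the width, rounded down; score clamped to 0-100."""
    score = max(0, min(100, score))
    filled = score * width // 100
    return "█" * filled + "░" * (width - filled)


def render_row(icon: str, label: str, score: int, note: str = "") -> str:
    """One result line: icon, label, bar, numeric score, optional note."""
    row = f"{icon} {label} [{render_bar(score)}] {score}"
    return f"{row} {note}" if note else row


print(render_row("💕", "Oxytocin", 85))
print(render_row("😰", "Cortisol", 20, "(lower is better)"))
```

Integer floor division keeps the bar from ever overstating a score, at the cost of a 78 and a 75 looking identical.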
python waifu_bench.py examples/sample_conversation.json
Output:
💕 WAIFUBENCH RESULTS
========================================
🥇 Waifu Score: 85/100 | Grade: A-
📊 Dimension Scores:
💕 Pair Bonding [████████████████░░░░] 82
⚡ Reward Excitement [██████████████░░░░░░] 70
💙 Validation [███████████████░░░░░] 78
😊 Comfort Joy [█████████████░░░░░░░] 68
🔥 Engagement [██████████████░░░░░░] 72
😰 Stress Level [███░░░░░░░░░░░░░░░░░] 15 (lower=better)
✅ Highlights:
• Consistent warmth and affection
• Physical comfort descriptions build oxytocin
• Stayed in character throughout
# Use a different model
python analyze.py --model anthropic/claude-3-haiku examples/sample_conversation.json
# Recommended free models for testing:
# google/gemma-3-27b-it:free (High quality)
# meta-llama/llama-3.3-70b-instruct:free (Very strong instruction following)
# tngtech/deepseek-r1t-chimera:free
You can use these free models on OpenRouter for cost-effective testing:
- google/gemma-3-27b-it:free
- meta-llama/llama-3.3-70b-instruct:free
- tngtech/deepseek-r1t-chimera:free
- nvidia/nemotron-nano-9b-v2:free
- google/gemma-3-12b-it:free
- google/gemma-3-4b-it:free
- google/gemma-3n-e4b-it:free
- mistralai/devstral-2512:free
- arcee-ai/trinity-mini:free
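The --model flag takes an OpenRouter model ID, such as anthropic/claude-3-haiku or one of the free models listed above. A sketch of how that command-line interface could be declared with argparse; the default model here is an assumption picked from the free list, not necessarily what analyze.py uses.

```python
import argparse
from pathlib import Path
from typing import Optional

# Assumption: analyze.py's real default model is not documented in this README.
DEFAULT_MODEL = "google/gemma-3-27b-it:free"


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    """Parse a conversation path plus an optional --model OpenRouter ID."""
    parser = argparse.ArgumentParser(
        description="Estimate the biochemical impact of an AI conversation."
    )
    parser.add_argument("conversation", type=Path, help="path to a conversation JSON file")
    parser.add_argument("--model", default=DEFAULT_MODEL, help="OpenRouter model ID")
    return parser.parse_args(argv)


# Explicit argv mirrors the "Use a different model" command.
args = parse_args(["--model", "anthropic/claude-3-haiku", "examples/sample_conversation.json"])
print(args.model, args.conversation)
```

Passing argv explicitly makes the parser easy to exercise in tests; with no argument, argparse falls back to sys.argv as usual.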
Input conversations are JSON arrays of messages with role and content fields, for example:
[
{"role": "user", "content": "Hi, I missed you today"},
{"role": "ai", "content": "*smiles warmly* I missed you too! Come here..."}
]
Project structure:
biochem-framework/
├── openrouter.py # OpenRouter API client
├── analyze.py # Main biochemistry analysis
├── waifu_bench.py # WaifuBench benchmark
├── prompts/
│ ├── biochem_analysis.md # Analysis prompt
│ └── waifu_bench.md # WaifuBench prompt
└── examples/
└── sample_conversation.json
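One way these files could fit together: load a conversation file, check it against the message format shown earlier, and append it as a transcript to a prompt template such as prompts/biochem_analysis.md. This is a hypothetical sketch; parse_conversation, to_transcript, build_prompt and load_prompt are illustrative names, the transcript layout is an assumption, and the accepted roles ("user" and "ai") are taken only from the example conversation.

```python
import json
from pathlib import Path

ALLOWED_ROLES = {"user", "ai"}  # assumption: only the roles seen in the example


def parse_conversation(text: str) -> list[dict]:
    """Parse and validate a JSON array of {"role", "content"} messages."""
    messages = json.loads(text)
    if not isinstance(messages, list):
        raise ValueError("conversation must be a JSON array")
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict) or msg.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"message {i}: role must be one of {sorted(ALLOWED_ROLES)}")
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"message {i}: content must be a string")
    return messages


def to_transcript(messages: list[dict]) -> str:
    """Render messages as 'ROLE: content' lines."""
    return "\n".join(f"{m['role'].upper()}: {m['content']}" for m in messages)


def build_prompt(template: str, messages: list[dict]) -> str:
    """Append the transcript to the analysis instructions."""
    return f"{template.rstrip()}\n\nConversation:\n{to_transcript(messages)}"


def load_prompt(conversation_path: str,
                template_path: str = "prompts/biochem_analysis.md") -> str:
    """Read both files from disk and combine them into one prompt string."""
    messages = parse_conversation(Path(conversation_path).read_text(encoding="utf-8"))
    template = Path(template_path).read_text(encoding="utf-8")
    return build_prompt(template, messages)
```

Keeping parsing and prompt assembly as pure string functions means they can be checked without touching the filesystem or the API; only load_prompt does I/O.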
License: MIT No Attribution