Lis uses rolling async context compaction to manage long conversations. When context grows too large, older messages are summarized into a compact block while the conversation continues uninterrupted. Each compaction creates a new session, preserving the summary and an embedding for future semantic search.
Each session is a self-contained segment of a chat with its own summary. Full compaction always creates a new session. Summaries are never merged into each other.
- Compaction → finalize current session (summary + embedding) → reassign kept messages to new session → set `chat.CurrentSessionId` to the new session.
- `/new` or `/clear` → finalize session (async summary) → create new session. Messages stay with their original session.
- `/resume` → reopen a previous session (full context) or create a new session with the target's summary injected (fallback; see the decision sketch after this list).
- Context for continuations: the parent session's summary is injected when building context, giving the AI continuity without merging.
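A minimal sketch of that `/resume` decision, assuming a per-session token stat and a budget derived from `LIS_RESUME_TOKEN_BUDGET`; every name below is illustrative rather than the actual implementation:

```csharp
using System;
using System.Threading.Tasks;

// Sketch only: the Session shape and factory delegate are assumed.
public sealed class Session
{
    public long Id { get; set; }
    public string? Summary { get; set; }
    public long TotalInputTokens { get; set; }
}

public static class ResumeSketch
{
    // Reopen the target session if its full context fits the resume budget;
    // otherwise create a new session seeded with the target's summary
    // (the fallback path described above).
    public static async Task<long> ResumeAsync(
        Session target,
        long resumeTokenBudget,
        Func<string?, Task<Session>> createSessionWithSummaryAsync)
    {
        if (target.TotalInputTokens <= resumeTokenBudget)
            return target.Id; // full-context reopen

        var fresh = await createSessionWithSummaryAsync(target.Summary);
        return fresh.Id;      // summary-injected continuation
    }
}
```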
Each message has a NOT NULL `session_id` FK to session. Messages are loaded via `WHERE m.SessionId == session.Id`. During compaction, messages after the split point are atomically reassigned to the new session via `UPDATE message SET session_id = newSession.Id WHERE session_id = oldSession.Id AND id > splitMessageId`.
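The same reassignment as a set-based EF Core 7+ `ExecuteUpdateAsync`, assuming EF Core is the persistence layer; the entity stub below is reduced to the two columns involved (the real entity is `Lis.Persistence/Entities/MessageEntity.cs`):

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Minimal stub of the entity shape assumed here.
public class MessageEntity
{
    public long Id { get; set; }
    public long SessionId { get; set; }
}

public static class ReassignSketch
{
    // Set-based reassignment inside a transaction, mirroring the UPDATE above.
    public static async Task ReassignKeptMessagesAsync(
        DbContext db, long oldSessionId, long newSessionId, long splitMessageId)
    {
        await using var tx = await db.Database.BeginTransactionAsync();

        await db.Set<MessageEntity>()
            .Where(m => m.SessionId == oldSessionId && m.Id > splitMessageId)
            .ExecuteUpdateAsync(s => s.SetProperty(m => m.SessionId, newSessionId));

        await tx.CommitAsync();
    }
}
```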
Token counts come exclusively from provider response metadata (`input_tokens`, `output_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`). The old `EstimateTokens` heuristic (chars / 3.5) has been removed entirely. A usage-parsing sketch follows the list below.
- Compaction trigger: based on actual `input_tokens` from the last response.
- Pre-send check: for very large contexts, `CountMessageTokensAsync` (Anthropic's free token-counting endpoint) can validate before sending.
- Per-message storage: actual token counts are stored as columns on `MessageEntity`, not buried in the `SkContent` JSON.
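A sketch of extracting those fields from a raw Anthropic response with `System.Text.Json`; the `TokenUsage` record below is an assumed shape (the real DTO is `Lis.Core/Channel/TokenUsage.cs`), but the `usage` field names are exactly those the API returns:

```csharp
using System.Text.Json;

// Assumed DTO shape; the real one lives in Lis.Core/Channel/TokenUsage.cs.
public sealed record TokenUsage(
    long InputTokens, long OutputTokens,
    long CacheReadInputTokens, long CacheCreationInputTokens);

public static class UsageSketch
{
    // Anthropic responses carry a "usage" object with these exact field names.
    public static TokenUsage Parse(string responseJson)
    {
        using var doc = JsonDocument.Parse(responseJson);
        var usage = doc.RootElement.GetProperty("usage");

        long Get(string name) =>
            usage.TryGetProperty(name, out var v) ? v.GetInt64() : 0;

        return new TokenUsage(
            Get("input_tokens"),
            Get("output_tokens"),
            Get("cache_read_input_tokens"),
            Get("cache_creation_input_tokens"));
    }
}
```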
Two stages, applied in order (a trigger-check sketch follows this list):

- Tool output pruning — triggered when total tool result tokens (after `ToolsPrunedThroughId`) exceed `ToolPruneThreshold`. Replaces tool output with the bare function name (preserving `FunctionResultContent` metadata and `CallId`). Non-destructive (DB unchanged). Runs ONCE, stays stable → preserves prompt cache.
- Full compaction — triggered when `input_tokens` from the last response exceeds `CompactionThreshold` (default: 80% of `ContextBudget`). Keeps `KeepRecentTokens` of recent messages. Everything before that is summarized by the compaction LLM. Creates a new session with kept messages reassigned atomically.
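A sketch of the trigger check, in the same order; the threshold parameters mirror the names above, while the method shape is illustrative:

```csharp
// Sketch only: which stage (if any) fires for the current turn.
public enum ContextAction { None, PruneToolOutputs, FullCompaction }

public static class TriggerSketch
{
    public static ContextAction Decide(
        long lastInputTokens,     // input_tokens from the last response
        long unprunedToolTokens,  // tool result tokens after ToolsPrunedThroughId
        long toolPruneThreshold,  // LIS_TOOL_PRUNE_THRESHOLD
        long compactionThreshold) // LIS_COMPACTION_THRESHOLD (80% of budget)
    {
        // Stage 1: non-destructive tool output pruning (cache-stable).
        if (unprunedToolTokens > toolPruneThreshold)
            return ContextAction.PruneToolOutputs;

        // Stage 2: full compaction once the whole context is too large.
        if (lastInputTokens > compactionThreshold)
            return ContextAction.FullCompaction;

        return ContextAction.None;
    }
}
```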
Global env `LIS_TOOL_SUMMARIZATION_POLICY`:

- `auto` — per-tool `[ToolSummarization]` attribute. Default: `Prune`.
- `keep_all` — never prune. Safeguard: force-prunes if context blows past the threshold.
- `keep_none` — always prune everything.
Per-tool attribute `SummarizationPolicy` (an attribute sketch follows this list):

- `Prune` — replace with a one-liner.
- `Summarize` — include full output in the compaction prompt for LLM summarization (e.g., book content with emotional value).
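A sketch of how the attribute might look and be applied; the enum and attribute names come from this doc, while `BookTools`/`ReadChapter` are hypothetical examples:

```csharp
using System;

// Assumed shape; the real attribute lives in
// Lis.Core/Util/ToolSummarizationAttribute.cs.
public enum SummarizationPolicy { Prune, Summarize }

[AttributeUsage(AttributeTargets.Method)]
public sealed class ToolSummarizationAttribute : Attribute
{
    public SummarizationPolicy Policy { get; }
    public ToolSummarizationAttribute(SummarizationPolicy policy) => Policy = policy;
}

public class BookTools // hypothetical tool class
{
    // Summarize instead of prune: the full chapter text is handed to the
    // compaction LLM so its content survives in the summary.
    [ToolSummarization(SummarizationPolicy.Summarize)]
    public string ReadChapter(string title) => LoadChapter(title);

    private static string LoadChapter(string title) => ""; // placeholder
}
```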
Anthropic's prompt caching (up to 4 `cache_control` breakpoints):

- After the system prompt — stable, rarely changes.
- After the session summary (if one exists) — changes only on compaction.
- At the tool prune boundary — ensures pruning doesn't invalidate later messages' cache.
- Top-level automatic — `cache_control` on the request body auto-caches the growing conversation.
Implemented via a `CacheControlHandler` (a `DelegatingHandler`) that injects `cache_control` markers into the HTTP request JSON.
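A minimal sketch of such a handler, showing only the last-system-block breakpoint (the real handler places up to four); the JSON manipulation assumes the Anthropic Messages request shape:

```csharp
using System.Net.Http;
using System.Text;
using System.Text.Json.Nodes;
using System.Threading;
using System.Threading.Tasks;

// Sketch only: rewrites the outgoing request body to mark the last
// system content block as cacheable.
public sealed class CacheControlHandlerSketch : DelegatingHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken ct)
    {
        if (request.Content is not null)
        {
            var body = JsonNode.Parse(await request.Content.ReadAsStringAsync(ct))!;

            // Anthropic accepts "system" as a string or a block array;
            // cache_control requires the array form.
            if (body["system"] is JsonArray system && system.Count > 0)
                system[^1]!["cache_control"] =
                    new JsonObject { ["type"] = "ephemeral" };

            request.Content = new StringContent(
                body.ToJsonString(), Encoding.UTF8, "application/json");
        }

        return await base.SendAsync(request, ct);
    }
}
```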
Compaction can use a different provider/model than the main conversation (e.g., Haiku for cheap summaries). Registered as a keyed `IChatClient` (`"compaction"`); falls back to the main client if not configured.
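A sketch of the keyed registration and fallback, assuming .NET 8 keyed DI and `Microsoft.Extensions.AI`'s `IChatClient`; `CreateAnthropicClient` is a hypothetical factory:

```csharp
using System;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// Keyed registration: a dedicated (cheaper) client for summarization.
services.AddKeyedSingleton<IChatClient>("compaction", (sp, _) =>
    CreateAnthropicClient(
        apiKey: Environment.GetEnvironmentVariable("LIS_COMPACTION_API_KEY"),
        model:  Environment.GetEnvironmentVariable("LIS_COMPACTION_MODEL")));

var provider = services.BuildServiceProvider();

// Resolve by key; fall back to the main client (assumed registered
// elsewhere) when no compaction client is configured.
IChatClient compaction = provider.GetKeyedService<IChatClient>("compaction")
                         ?? provider.GetRequiredService<IChatClient>();

// Hypothetical factory: provider-specific construction is out of scope here.
static IChatClient CreateAnthropicClient(string? apiKey, string? model) =>
    throw new NotImplementedException();
```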
`/status`, `/new`, `/clear`, `/compact`, `/prune`, `/resume`, and `/abort` are handled by `CommandRouter` before AI processing. Commands bypass debouncing and execute immediately. Responses are persisted to message history, so no AI tokens are wasted on commands. Commands support arguments: `/command [args]`.
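A hypothetical sketch of the command plumbing; only the names `IChatCommand` and `CommandRouter` come from this doc — members and routing logic are assumptions:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public interface IChatCommand
{
    string Name { get; }                    // e.g. "status", "compact"
    Task<string> ExecuteAsync(string args); // reply persisted to history
}

public sealed class CommandRouter
{
    private readonly IReadOnlyList<IChatCommand> _commands;
    public CommandRouter(IEnumerable<IChatCommand> commands) =>
        _commands = commands.ToList();

    // Returns null when the text is not a command, so the message falls
    // through to normal AI processing.
    public Task<string>? TryRoute(string text)
    {
        if (!text.StartsWith('/')) return null;

        var parts = text[1..].Split(' ', 2); // "/command [args]"
        var cmd = _commands.FirstOrDefault(c => c.Name == parts[0]);
        return cmd?.ExecuteAsync(parts.Length > 1 ? parts[1] : "");
    }
}
```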
All message queries filter `!m.Queued` to exclude messages queued while an AI response is in progress. See docs/MESSAGE_QUEUE.md for the full queuing model.
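The resulting query shape, assuming EF Core with `db` and `session` in scope and a `bool Queued` column on `MessageEntity` (the ordering clause is an assumption):

```csharp
// Exclude queued messages and stay within the current session.
var messages = await db.Set<MessageEntity>()
    .Where(m => m.SessionId == session.Id && !m.Queued)
    .OrderBy(m => m.Id) // assumed ordering
    .ToListAsync();
```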
`ANTHROPIC_THINKING_EFFORT` env var: `low` (1024), `medium` (4096), `high` (16384), or an exact token count.
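A sketch of the mapping; the level values come from this doc, the function shape is illustrative:

```csharp
public static class ThinkingEffortSketch
{
    // Map ANTHROPIC_THINKING_EFFORT to a thinking token budget;
    // null means thinking is off.
    public static int? ParseThinkingBudget(string? value) => value switch
    {
        null or "" => null,   // (off)
        "low"      => 1024,
        "medium"   => 4096,
        "high"     => 16384,
        _ when int.TryParse(value, out var n) => n, // exact token count
        _          => null,   // unrecognized → treat as off
    };
}
```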
| Env Var | Default | Purpose |
|---|---|---|
| `LIS_KEEP_RECENT_TOKENS` | `4000` | Recent messages kept verbatim after compaction |
| `LIS_TOOL_PRUNE_THRESHOLD` | `8000` | Tool output tokens to trigger pruning |
| `LIS_TOOL_KEEP_THRESHOLD` | `2000` | Recent tool output tokens to keep unpruned |
| `LIS_COMPACTION_THRESHOLD` | 80% of budget | Input tokens to trigger full compaction |
| `LIS_COMPACTION_NOTIFY` | `true` | Notify user on compaction events |
| `LIS_TOOL_SUMMARIZATION_POLICY` | `auto` | `auto`, `keep_all`, `keep_none` |
| `LIS_RESUME_TOKEN_BUDGET` | 70% of budget | Max tokens for a full session resume |
| `LIS_COMPACTION_PROVIDER` | (main) | `anthropic` (others: not yet) |
| `LIS_COMPACTION_API_KEY` | (none) | API key for the compaction provider |
| `LIS_COMPACTION_MODEL` | (none) | Model for summarization |
| `ANTHROPIC_CACHE_TTL` | `5m` | `5m` or `1h` |
| `ANTHROPIC_CACHE_ENABLED` | `true` | Toggle prompt caching |
| `ANTHROPIC_THINKING_EFFORT` | (off) | Thinking budget |
Tool pruning:

```
🔧 Tool outputs pruned (12.4k → 2.1k, -83%)
📊 Context: 18.5k/150k (12%)
```

Full compaction:

```
⚙️ Compacted (157k → 9.1k)
🔧 System: 1.8k tokens
📝 Summary: 3.2k tokens
💬 Kept context: 2.7k tokens
🛠️ Tools: 1.4k tokens (1.1k defs + 300 calls)
📊 Total: 9.1k/150k (6%)
```
| File | Purpose |
|---|---|
| `Lis.Persistence/Entities/SessionEntity.cs` | Session data model with summary, embedding, token stats |
| `Lis.Persistence/Entities/MessageEntity.cs` | Message data model with `session_id` FK |
| `Lis.Agent/CompactionService.cs` | Async summarization, session lifecycle, embedding generation, message reassignment |
| `Lis.Agent/ContextWindowBuilder.cs` | History assembly with session/summary injection, tool pruning |
| `Lis.Agent/ConversationService.cs` | Session management, compaction triggers, command routing |
| `Lis.Agent/ToolRunner.cs` | Token usage extraction from streaming responses |
| `Lis.Agent/Commands/` | Command framework: `IChatCommand`, `CommandRouter`, `/status`, `/new`, `/compact`, `/prune`, `/resume` |
| `Lis.Providers/Anthropic/AnthropicProvider.cs` | `CacheControlHandler`, thinking effort, cache config |
| `Lis.Core/Util/ToolSummarizationAttribute.cs` | Per-tool summarization policy attribute |
| `Lis.Core/Channel/TokenUsage.cs` | Token usage DTO |
- Message arrives → `ConversationService.IngestMessageAsync` (ensure session, persist with `session_id`)
- `ConversationService.RespondAsync` — load chat + session
- Check commands (`/status`, `/new`, `/clear`, `/compact`, `/prune`, `/resume`) → handle without AI
- Load messages from the current session (`WHERE m.SessionId == session.Id`)
- Build context: system prompt → parent summary → session summary → messages (with pruning)
- Send to the AI via `ToolRunner` (streaming, with tool loop)
- Extract `TokenUsage` from response metadata → update session stats + message columns
- Check compaction triggers (based on actual `input_tokens`):
  - Tool prune threshold → set `ToolsPrunedThroughId`
  - Compaction threshold → fire async `CompactionService.CompactAsync`
- Compaction: summarize → embed → finalize session → create new session → reassign kept messages (in a transaction)
The `CacheControlHandler` `DelegatingHandler` intercepts Anthropic API requests and injects 4 cache breakpoints:

- Last system content block — caches the stable system prompt
- Last session summary message — caches summaries (stable within a session)
- Tool prune boundary — caches pruned content (stable once set, communicated via `ToolContext.CacheBreakIndex`)
- Top-level `cache_control` — auto-caches the growing conversation prefix
The handler is inserted into the HttpClient pipeline before `BearerAuthHandler`. Cache stats (`cache_read_input_tokens`, `cache_creation_input_tokens`) are extracted from response metadata and stored per-message and per-session.