feat(cockpit): chat/subagents renders inline subagent cards (real subgraph + aimock e2e) #718
…graph card parity)
…subagent_type) Replace the flat inline _run_subagent() with a compiled parameterized StateGraph invoked by the task tool, so LangGraph nests it under a tools:<call_id> namespace and the SubagentTracker can surface a card. task() now takes subagent_type (Literal, required) so it always reaches the tool-call args the tracker registers on.
…bagents tray Inline persistent subagent cards (via <chat>) now surface each dispatch in the conversation, so the active-only sidebar <chat-subagents> panel is removed. Pipeline note corrected to research/booking/itinerary.
…raph (captures subagent_type research/booking/itinerary)
… card e2e Each subagent is now ONE LLM call (no within-subagent tool loop), so its request carries a unique, stable discriminator (the role task_description); the nested tool-loop rounds couldn't be matched by aimock's turnIndex/hasToolResult scheme (404 no_fixture_match). Re-recorded the fixture, and the c-subagents e2e now asserts the inline chat-subagent-card (3 cards, persists).
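The last commit note argues that positional discriminators broke replay while a per-request content discriminator does not. Below is a toy model of that argument. It assumes a matcher that keys fixtures either on position (a `turnIndex`/`hasToolResult`-style key) or on the request's last user message. `Fixture`, `positional_key`, `content_key` and `replay` are hypothetical names, and this is not aimock's actual matching code.

```python
from dataclasses import dataclass

# Hypothetical toy model of a record/replay matcher; NOT aimock's real code.
Message = tuple[str, str]  # (role, content)


@dataclass(frozen=True)
class Fixture:
    key: tuple
    response: str


def positional_key(messages: list[Message]) -> tuple:
    """turnIndex/hasToolResult-style key: depends on where the request sits."""
    turn_index = sum(1 for role, _ in messages if role == "assistant")
    has_tool_result = any(role == "tool" for role, _ in messages)
    return ("pos", turn_index, has_tool_result)


def content_key(messages: list[Message]) -> tuple:
    """Key on the request's last user message (e.g. a task_description)."""
    last_user = next(text for role, text in reversed(messages) if role == "user")
    return ("content", last_user)


def replay(fixtures: list[Fixture], messages: list[Message], key_fn) -> str:
    key = key_fn(messages)
    for fixture in fixtures:
        if fixture.key == key:
            return fixture.response
    raise LookupError("404 no_fixture_match")


# A nested tool-loop round as the recorder saw it: inside the global conversation.
global_view: list[Message] = [
    ("user", "Plan a trip to Lisbon"),
    ("assistant", "task(subagent_type='research')"),
    ("user", "Research flights to Lisbon"),
    ("assistant", "search_flights(...)"),
    ("tool", "3 flights found"),
]
# The same round on replay: the subagent sends only its local messages.
local_view: list[Message] = global_view[2:]

positional_fixtures = [Fixture(positional_key(global_view), "round reply")]

# Single-call subagents: one request each, keyed by a distinct task description.
tasks = {
    "research": "Research flights to Lisbon",
    "booking": "Book the cheapest flight",
    "itinerary": "Draft a 3-day itinerary",
}
content_fixtures = [
    Fixture(content_key([("user", desc)]), f"{kind} reply") for kind, desc in tasks.items()
]

if __name__ == "__main__":
    try:
        replay(positional_fixtures, local_view, positional_key)
    except LookupError as err:
        print("positional:", err)  # the local view computes a different key
    for kind, desc in tasks.items():
        request = [("system", f"You are the {kind} subagent."), ("user", desc)]
        print(kind, "->", replay(content_fixtures, request, content_key))
```

In this model, content keys survive the global-versus-local difference. Inside a tool loop, however, every round of one subagent would share the same last user message, so the key would stop being unique. Collapsing each subagent to a single call keeps it at one request per key.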
Claude finished @blove's task in 2m 28s
PR Review
Summary
Solid follow-up to #711. The core diagnosis is correct: the old flat …
Correctness observations
Count assertion timeout: inline comment posted (cosmetic; safe in practice).
Other notes
Verdict
Approve with minor nits. The two inline suggestions are low-priority (a log line and an explicit timeout). The implementation is correct and the e2e coverage is meaningfully stronger than before.
    @threadplane/langgraph SubagentTracker matches to surface the card."""
    subagent_type = state["subagent_type"]
    task_description = state["task_description"]
    system_prompt = _SUBAGENT_PROMPTS.get(subagent_type, _ITINERARY_PROMPT)
A silent fallback to _ITINERARY_PROMPT for an unrecognised subagent_type (e.g. an LLM hallucination) produces a confusing itinerary response with no indication that anything went wrong. A short print or log line on the fallback path would make this much easier to debug in production:
Suggested change:

    - system_prompt = _SUBAGENT_PROMPTS.get(subagent_type, _ITINERARY_PROMPT)
    + system_prompt = _SUBAGENT_PROMPTS.get(subagent_type)
    + if system_prompt is None:
    +     print(f"[_subagent_node] unknown subagent_type={subagent_type!r}; falling back to itinerary", flush=True)
    +     system_prompt = _ITINERARY_PROMPT
    await expect(page.locator('chat-subagent-card')).toHaveCount(3);

    // Final summary text contains an aviation-related phrase from the captured
    // continuation. Loose regex so refactors to the subagent prompts (research/
toHaveCount uses Playwright's default assertion timeout (5 s unless the project config overrides it). Since submitAndWaitForResponse already waits for the agent to go idle, all 3 cards should be in the DOM by then, so this is safe in practice. Making the intent explicit, though, avoids a future head-scratcher if the project timeout ever changes:
Suggested change:

    - // continuation. Loose regex so refactors to the subagent prompts (research/
    + await expect(page.locator('chat-subagent-card')).toHaveCount(3, { timeout: 15_000 });
Claude finished @blove's task in 3m 18s
PR Review
Summary
Solid follow-up. The root-cause diagnosis is correct: the old flat …
Verification of tracker behaviour
Confirmed by reading …
Open nits (from previous review, not yet addressed)
Two inline suggestions re-posted against the current commit, both low-priority.
One new minor observation
Verdict
Implementation is correct. The e2e coverage is meaningfully stronger. Two pre-existing nits remain (inline suggestions on the current commit above).
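Per the commit notes, a card depends on two things lining up: the `task` tool call carrying `subagent_type` in its args, and the child subgraph running under a `tools:<call_id>` namespace. Below is a toy model of that pairing. `ToolCall`, `Card` and `ToyTracker` are hypothetical and are not the `@threadplane/langgraph` `SubagentTracker` API. The namespace format is taken from the commit note rather than from LangGraph documentation.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    id: str
    name: str
    args: dict


@dataclass
class Card:
    call_id: str
    subagent_type: str
    events: list[str] = field(default_factory=list)


class ToyTracker:
    """Hypothetical model: one card per subagent tool call, fed by namespaced events."""

    def __init__(self, subagent_tool_names: list[str]) -> None:
        self.tool_names = set(subagent_tool_names)  # cf. subagentToolNames: ['task']
        self.cards: dict[str, Card] = {}

    def on_tool_call(self, call: ToolCall) -> None:
        # Register only subagent tools whose args name a subagent_type;
        # setdefault keeps a repeated call id from creating a duplicate card.
        if call.name in self.tool_names and "subagent_type" in call.args:
            self.cards.setdefault(call.id, Card(call.id, call.args["subagent_type"]))

    def on_event(self, namespace: tuple[str, ...], event: str) -> None:
        # Attribute a child-run event to the card named by its tools:<call_id> segment.
        for segment in namespace:
            if segment.startswith("tools:"):
                card = self.cards.get(segment.removeprefix("tools:"))
                if card is not None:
                    card.events.append(event)
                return


if __name__ == "__main__":
    tracker = ToyTracker(["task"])
    for i, kind in enumerate(["research", "booking", "itinerary"]):
        tracker.on_tool_call(ToolCall(f"call_{i}", "task", {"subagent_type": kind}))
    tracker.on_event(("tools:call_0",), "research finished")
    # A flat call with no subagent_type never gets a card.
    tracker.on_tool_call(ToolCall("call_x", "task", {"task_description": "..."}))
    for card in tracker.cards.values():
        print(card.call_id, card.subagent_type, card.events)
```

In this model, a flat inline tool fails on both counts. Without `subagent_type` in the args, there is nothing to register, and an inline call produces no child namespace to which events can be attributed.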
    @threadplane/langgraph SubagentTracker matches to surface the card."""
    subagent_type = state["subagent_type"]
    task_description = state["task_description"]
    system_prompt = _SUBAGENT_PROMPTS.get(subagent_type, _ITINERARY_PROMPT)
The silent fallback to _ITINERARY_PROMPT for an unrecognised subagent_type produces a confusing itinerary response with no indication anything went wrong. This was flagged in the previous review pass and is still unaddressed. A one-liner is enough:
Suggested change:

    - system_prompt = _SUBAGENT_PROMPTS.get(subagent_type, _ITINERARY_PROMPT)
    + system_prompt = _SUBAGENT_PROMPTS.get(subagent_type)
    + if system_prompt is None:
    +     print(f"[_subagent_node] unknown subagent_type={subagent_type!r}; falling back to itinerary", flush=True)
    +     system_prompt = _ITINERARY_PROMPT
    // One card per subagent dispatched (research/booking/itinerary), no
    // duplicates — the orchestrator calls task three times in order.
    await expect(page.locator('chat-subagent-card')).toHaveCount(3);
toHaveCount inherits the project's default assertion timeout. submitAndWaitForResponse already waits for idle, so all 3 cards should be present — but making the intent explicit protects against a future default-timeout change and self-documents the expected wall-clock. This was flagged in the previous review pass and is still unaddressed:
Suggested change:

    - await expect(page.locator('chat-subagent-card')).toHaveCount(3);
    + await expect(page.locator('chat-subagent-card')).toHaveCount(3, { timeout: 15_000 });
Claude finished @blove's task in 1m 54s
PR Review
Summary
This synchronize push merges recent …
Two outstanding nits (flagged in prior two passes, still unaddressed)
Both are one-liners. Everything else looks good.
    @threadplane/langgraph SubagentTracker matches to surface the card."""
    subagent_type = state["subagent_type"]
    task_description = state["task_description"]
    system_prompt = _SUBAGENT_PROMPTS.get(subagent_type, _ITINERARY_PROMPT)
Still unaddressed from the previous two review passes. An unrecognised subagent_type silently produces an itinerary response — add a log line so this is observable when it happens:
Suggested change:

    - system_prompt = _SUBAGENT_PROMPTS.get(subagent_type, _ITINERARY_PROMPT)
    + system_prompt = _SUBAGENT_PROMPTS.get(subagent_type)
    + if system_prompt is None:
    +     print(f"[_subagent_node] unknown subagent_type={subagent_type!r}; falling back to itinerary", flush=True)
    +     system_prompt = _ITINERARY_PROMPT
    // One card per subagent dispatched (research/booking/itinerary), no
    // duplicates — the orchestrator calls task three times in order.
    await expect(page.locator('chat-subagent-card')).toHaveCount(3);
Still unaddressed from the previous two review passes. toHaveCount inherits the project's default assertion timeout — submitAndWaitForResponse already waits for idle so all 3 cards should be present, but an explicit timeout self-documents intent and insulates the assertion from a future default-timeout change:
Suggested change:

    - await expect(page.locator('chat-subagent-card')).toHaveCount(3);
    + await expect(page.locator('chat-subagent-card')).toHaveCount(3, { timeout: 15_000 });
Summary
Follow-up to #711 (inline persistent subagent cards). The `cockpit/chat/subagents` demo used a flat inline `task` tool, so `agent.subagents()` never populated and it showed a generic "task" chip instead of a subagent card. This aligns it with the working `examples/chat` pattern so that it renders the inline persistent `chat-subagent-card` on both the live LangGraph runtime and under aimock e2e, and it adds the now-possible card assertion to `examples/chat` too.
What changed
- `cockpit/chat/subagents/python`: the `task` tool now invokes a real compiled subagent subgraph (`subagent_type` is a required `Literal`, so it reaches the tool-call args the `SubagentTracker` registers on, and the child subgraph's `tools:<id>` namespace matches it). Each subagent is a single deterministic LLM call (no within-subagent tool loop); see below.
- `subagentToolNames: ['task']` on `provideAgent`; removed the now-redundant active-only `<chat-subagents>` sidebar tray (the inline persistent cards supersede it); corrected the pipeline note to research/booking/itinerary.
- Re-recorded `c-subagents.json` against the new graph.
- `cockpit/chat/subagents` and `examples/chat/research-subagent` now assert the inline `chat-subagent-card` (cockpit asserts exactly 3, persisting).

Why single-call subagents
The first attempt kept the per-subagent tool loop. Under aimock replay the run errored with `404 no_fixture_match`: a nested subagent's tool-loop rounds present local discriminators (`turnIndex`/`hasToolResult`) that the recorder captured against the global conversation, so they don't match on replay. Collapsing each subagent to one LLM call gives every subagent request a unique, stable discriminator (its role `task_description`), which aimock matches deterministically. Within-subagent tool calling is the dedicated tool-calls cap's concern; this cap demonstrates subagent orchestration plus the card.

Verification
- `cockpit-chat-subagents` e2e: green (asserts `chat-subagent-card` ×3, persists).
- `examples-chat` e2e: 42 passed (incl. research-subagent card assertion).

Spec: `docs/superpowers/specs/2026-06-19-cockpit-subagents-subgraph-design.md` · Plan: `docs/superpowers/plans/2026-06-19-cockpit-subagents-subgraph.md`

🤖 Generated with Claude Code