fix(react): resolve response_format conflict with native tool calling#175
fix(react): resolve response_format conflict with native tool calling#175qinxuye wants to merge 12 commits intoxorbitsai:mainfrom
Conversation
982a9bd to
e354056
Compare
rogercloud
left a comment
There was a problem hiding this comment.
Code Review Summary
Issues Found: 5
- 1 CRITICAL: String
type="tool_call"falls through to text handling - 3 MINOR: Code edge cases and refactoring opportunities
- 1 INFO: Breaking changes documentation
Overall Assessment
This PR fixes an important bug (Gemini empty responses with tools) by enabling native tool calling. The core fix is sound, but there's one critical edge case that should be addressed.
Detailed Issues
1. CRITICAL - Missing type="tool_call" handling in string response parsing
File: src/xagent/core/agent/pattern/react.py
When a string response contains valid JSON with type="tool_call", it incorrectly falls through to "direct text response" handling instead of being rejected.
Current code (line 1683):
if isinstance(parsed, dict) and parsed.get("type") == "final_answer":
# handles final_answer
return action
except (json.JSONDecodeError, AttributeError):
pass # Only catches JSON errors
# BUG: If parsed has type="tool_call", falls through here!Suggested fix:
if isinstance(parsed, dict):
action_type = parsed.get("type")
if action_type == "final_answer":
# ... handle final_answer
elif action_type == "tool_call":
# Should not happen with native tool calling - reject
raise PatternExecutionError(...)2. MINOR - Potential IndexError on empty accumulated_tool_calls
File: src/xagent/core/model/chat/basic/claude.py
Line 887: tool_id = list(accumulated_tool_calls.keys())[0]
If we reach this branch because accumulated_tool_calls is empty (falsy), this raises IndexError. Add a guard check.
3. MINOR - anyOf/oneOf code duplication
File: src/xagent/core/model/chat/basic/claude.py (lines 55-139)
The anyOf and oneOf handling blocks contain ~80 lines of nearly identical logic. Consider extracting to a helper function like _handle_union_type(schema, key).
4. MINOR - Empty string tool_id creates dict entry
File: src/xagent/core/model/chat/basic/openai.py (line 966-977)
The comment says this handles "qwen's empty string id format", which appears intentional. However, if association by index fails, a tool call entry with "" as the key is created. Consider skipping such chunks instead.
5. INFO - Breaking changes in vision_tool.py
File: src/xagent/core/tools/adapters/vibe/vision_tool.py
Function signatures changed:
imageparameter removed (renamed toimages)promptparameter removed (renamed toquestion)- Both are now required (no default values)
These are breaking changes for existing code. Consider adding a CHANGELOG entry or backward compatibility shim.
Test Results
All 33 tests pass. The test expectations were updated to match the new behavior (invalid JSON treated as text response rather than triggering retry).
Recommendation
Address Issue #1 before merge. Issues #2-#5 are edge cases/documentation and can be addressed in follow-ups if needed.
rogercloud
left a comment
There was a problem hiding this comment.
Re-Review: Additional Issues Found
Reviewed the diff beyond the previously reported issues. Found 3 new minor issues.
New Issues
- MINOR: Repeated instructional text inflates context (observation_content)
- MINOR: Missing required
reasoningfield causes unnecessary retries - MINOR: Stale/misleading comment block
None are blocking, but addressing them improves code health.
- Remove response_format when tools are available to enable native tool calls - Fix vision tool parameter names (use 'images' and 'question' instead of 'image' and 'prompt') - Add file information propagation in DAG step execution - Add tool parameter JSON schema display in plan and step stages - Add tests for file info propagation This resolves the issue where Gemini returned empty responses when tools were present due to response_format conflicting with the tools parameter.
- Add debug logging for better tracing in ReAct pattern - Simplify action requirements format when tools are available - Improve anyOf/oneOf handling for Claude tool schemas - Handle Union[str, List[str]] types properly - General code formatting improvements from ruff
- Add stream_chat support to MockReActLLM and MockLLM in tests - Fix string response parsing to prioritize JSON extraction - Properly handle final_answer with success/error fields - Update test expectations for invalid/truncated JSON in native mode - Add create_mock_stream_chat helper for DAG tests All 30 tests now pass with native tool calling enabled.
- Extract reasoning from full_content before tool calls - Pass reasoning through response object to action converter - Use actual LLM reasoning instead of hardcoded placeholder - Update prompt to only require type and reasoning in JSON - Add reasoning field to action execution traces - Ensure reasoning is available in final answer results This fixes the issue where LLM reasoning was lost when using native function calling, as the OpenAI API format doesn't include reasoning in tool_calls. Now reasoning is captured from the text content LLM generates before invoking tools.
This change separates the action type decision from tool invocation: 1. First LLM call: Determine action type (tool_call vs final_answer) - Returns JSON with action type and reasoning - No tool schemas passed 2. Second LLM call (if tool_call): Invoke specific tool via native calling - Pass tool schemas to LLM - LLM uses native function calling API - Returns tool_calls with tool_name and arguments Benefits: - Clearer separation of concerns - Better reasoning capture (not lost in native tool call format) - More predictable LLM behavior - Easier to debug and trace Tests updated to reflect 3-call pattern: - Call 1: Action type - Call 2: Native tool invocation - Call 3: Final answer
- Fix: Clean messages_with_prompt instead of messages in second call - Simplify first call prompt to focus on action type decision only - Remove confusing instructions about tool calling in first call - Clarify that system will guide tool invocation in follow-up call
Remove verbose debug logging from the first and second LLM calls while preserving important warnings and errors. The changes include: - Remove tool availability and name logs - Remove response type/value debug logs - Remove json_repair success logs - Remove token chunk logs - Remove second call info logs Keep critical logs for validation failures, unknown types, and errors.
…empty Add guard check before accessing accumulated_tool_calls.keys()[0] when event.index is not available. If accumulated_tool_calls is empty, skip the chunk instead of raising IndexError.
…tibility - Only use two-phase tool calling when step_id == "main" (top-level ReAct) - Skip second call for DAG steps to maintain backward compatibility - This fixes DAG tests that expect single-phase tool calling behavior - Also fix potential IndexError in claude.py when accumulated_tool_calls is empty
In DAG mode (step_id != "main"), use traditional single-phase tool calling where tools are passed to the LLM from the start. This maintains backward compatibility with DAG tests and avoids infinite retry loops. In main ReAct mode (step_id == "main"), use two-phase tool calling: - First call: JSON format to decide action type - Second call: Native tool calling to invoke the tool This fixes the issue where DAG steps were executing tools in a loop.
- refactor(claude): extract duplicated anyOf/oneOf handling into _handle_union_type() helper (~85 lines reduced) - fix(openai): skip unassociable tool call chunks instead of creating entries with empty string keys - fix(react): shorten observation instructional text while keeping behavioral nudge to prevent tool loops - fix(react): add missing required 'reasoning' field in _normalize_action_data() fallback dicts to prevent ValidationError - fix(react): remove stale comment about missing task_id/step_id access - fix(test): prevent infinite tool call loop in DAG sequential test by tracking executed tools in mock LLM
This resolves the issue where Gemini returned empty responses when tools were present due to response_format conflicting with the tools parameter.