
fix(react): resolve response_format conflict with native tool calling#175

Open
qinxuye wants to merge 12 commits into xorbitsai:main from qinxuye:enh/tool_call

Conversation

Contributor

@qinxuye commented Mar 17, 2026

  • Remove response_format when tools are available to enable native tool calls
  • Fix vision tool parameter names (use 'images' and 'question' instead of 'image' and 'prompt')
  • Add file information propagation in DAG step execution
  • Add tool parameter JSON schema display in plan and step stages
  • Add tests for file info propagation

This resolves the issue where Gemini returned empty responses when tools were present due to response_format conflicting with the tools parameter.
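A minimal sketch of the core change, assuming a hypothetical request-building helper (the function name and signature are illustrative, not the PR's actual code):

```python
def build_request_kwargs(messages, tools=None, response_format=None):
    """Assemble chat-completion kwargs, preferring native tool calling."""
    kwargs = {"messages": messages}
    if tools:
        # response_format (e.g. JSON mode) conflicts with the tools
        # parameter on some providers (Gemini returned empty responses),
        # so it is omitted whenever tools are available.
        kwargs["tools"] = tools
    elif response_format is not None:
        kwargs["response_format"] = response_format
    return kwargs
```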

@XprobeBot added the bug label Mar 17, 2026
@qinxuye marked this pull request as ready for review March 27, 2026 17:09
@qinxuye force-pushed the enh/tool_call branch 2 times, most recently from 982a9bd to e354056 on March 27, 2026 18:08
Collaborator

@rogercloud left a comment


Code Review Summary

Issues Found: 5

  • 1 CRITICAL: String type="tool_call" falls through to text handling
  • 3 MINOR: Code edge cases and refactoring opportunities
  • 1 INFO: Breaking changes documentation

Overall Assessment

This PR fixes an important bug (Gemini empty responses with tools) by enabling native tool calling. The core fix is sound, but there's one critical edge case that should be addressed.


Detailed Issues

1. CRITICAL - Missing type="tool_call" handling in string response parsing

File: src/xagent/core/agent/pattern/react.py

When a string response contains valid JSON with type="tool_call", it incorrectly falls through to "direct text response" handling instead of being rejected.

Current code (line 1683):

try:
    # ... `parsed` is loaded from the raw string response (elided in this excerpt) ...
    if isinstance(parsed, dict) and parsed.get("type") == "final_answer":
        # handles final_answer
        return action
except (json.JSONDecodeError, AttributeError):
    pass  # Only catches JSON errors

# BUG: If parsed has type="tool_call", falls through here!

Suggested fix:

if isinstance(parsed, dict):
    action_type = parsed.get("type")
    if action_type == "final_answer":
        # ... handle final_answer
    elif action_type == "tool_call":
        # Should not happen with native tool calling - reject
        raise PatternExecutionError(...)
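A runnable sketch of the suggested fix; `parse_string_response` and the stand-in `PatternExecutionError` are hypothetical, not the actual react.py code:

```python
import json

class PatternExecutionError(Exception):
    """Stand-in for the real exception type in react.py."""

def parse_string_response(text: str):
    """Reject type='tool_call' JSON in string responses instead of
    letting it fall through to direct-text handling."""
    try:
        parsed = json.loads(text)
    except (json.JSONDecodeError, AttributeError):
        return {"type": "text", "content": text}
    if isinstance(parsed, dict):
        action_type = parsed.get("type")
        if action_type == "final_answer":
            return parsed
        if action_type == "tool_call":
            # Should not happen with native tool calling - reject
            raise PatternExecutionError(
                "tool_call JSON in a text response; expected a native tool call"
            )
    return {"type": "text", "content": text}
```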

2. MINOR - Potential IndexError on empty accumulated_tool_calls

File: src/xagent/core/model/chat/basic/claude.py

Line 887: tool_id = list(accumulated_tool_calls.keys())[0]

If we reach this branch because accumulated_tool_calls is empty (falsy), this raises IndexError. Add a guard check.
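One possible shape for the guard, sketched as a small helper (the name `first_tool_id` is illustrative, not the claude.py code):

```python
def first_tool_id(accumulated_tool_calls: dict):
    """Return the first accumulated tool-call id, or None when the dict
    is empty so the caller can skip the chunk instead of raising."""
    if not accumulated_tool_calls:
        return None  # caller should skip this chunk
    return next(iter(accumulated_tool_calls))
```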


3. MINOR - anyOf/oneOf code duplication

File: src/xagent/core/model/chat/basic/claude.py (lines 55-139)

The anyOf and oneOf handling blocks contain ~80 lines of nearly identical logic. Consider extracting to a helper function like _handle_union_type(schema, key).


4. MINOR - Empty string tool_id creates dict entry

File: src/xagent/core/model/chat/basic/openai.py (lines 966-977)

The comment says this handles "qwen's empty string id format", which appears intentional. However, if association by index fails, a tool call entry with "" as the key is created. Consider skipping such chunks instead.
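The suggested skip behavior could be sketched as a resolution helper (names are hypothetical, not the openai.py code): try the chunk's id first, fall back to index association, and return None rather than keying an entry on the empty string.

```python
def resolve_tool_call_key(chunk_id, chunk_index, index_map: dict):
    """Resolve a streamed tool-call chunk to an existing key: a non-empty
    id wins, then index lookup; None means the caller drops the chunk."""
    if chunk_id:  # non-empty string id
        return chunk_id
    # Some providers (e.g. qwen) stream empty-string ids; try the index.
    if chunk_index is not None and chunk_index in index_map:
        return index_map[chunk_index]
    return None  # unassociable - skip instead of creating a "" entry
```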


5. INFO - Breaking changes in vision_tool.py

File: src/xagent/core/tools/adapters/vibe/vision_tool.py

Function signatures changed:

  • image parameter removed (renamed to images)
  • prompt parameter removed (renamed to question)
  • Both are now required (no default values)

These are breaking changes for existing code. Consider adding a CHANGELOG entry or backward compatibility shim.
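One possible shape for such a shim, assuming a hypothetical `describe_images` entry point (the real vision_tool.py signature may differ): accept the old names as keyword-only aliases and warn.

```python
import warnings

def describe_images(images=None, question=None, *, image=None, prompt=None):
    """Compatibility shim for the renamed parameters: map the old
    'image'/'prompt' names onto 'images'/'question' with a warning."""
    if image is not None and images is None:
        warnings.warn("'image' is deprecated; use 'images'", DeprecationWarning)
        images = image if isinstance(image, list) else [image]
    if prompt is not None and question is None:
        warnings.warn("'prompt' is deprecated; use 'question'", DeprecationWarning)
        question = prompt
    if images is None or question is None:
        raise TypeError("describe_images requires 'images' and 'question'")
    return images, question
```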


Test Results

All 33 tests pass. The test expectations were updated to match the new behavior (invalid JSON treated as text response rather than triggering retry).

Recommendation

Address Issue #1 before merge. Issues #2-#5 are edge cases/documentation and can be addressed in follow-ups if needed.

Collaborator

@rogercloud left a comment


Re-Review: Additional Issues Found

Reviewed the diff beyond the previously reported issues. Found 3 new minor issues.

New Issues

  • MINOR: Repeated instructional text inflates context (observation_content)
  • MINOR: Missing required reasoning field causes unnecessary retries
  • MINOR: Stale/misleading comment block

None are blocking, but addressing them improves code health.

qinxuye and others added 12 commits March 29, 2026 23:44
- Remove response_format when tools are available to enable native tool calls
- Fix vision tool parameter names (use 'images' and 'question' instead of 'image' and 'prompt')
- Add file information propagation in DAG step execution
- Add tool parameter JSON schema display in plan and step stages
- Add tests for file info propagation

This resolves the issue where Gemini returned empty responses when
tools were present due to response_format conflicting with the tools
parameter.
- Add debug logging for better tracing in ReAct pattern
- Simplify action requirements format when tools are available
- Improve anyOf/oneOf handling for Claude tool schemas
- Handle Union[str, List[str]] types properly
- General code formatting improvements from ruff
- Add stream_chat support to MockReActLLM and MockLLM in tests
- Fix string response parsing to prioritize JSON extraction
- Properly handle final_answer with success/error fields
- Update test expectations for invalid/truncated JSON in native mode
- Add create_mock_stream_chat helper for DAG tests

All 30 tests now pass with native tool calling enabled.
- Extract reasoning from full_content before tool calls
- Pass reasoning through response object to action converter
- Use actual LLM reasoning instead of hardcoded placeholder
- Update prompt to only require type and reasoning in JSON
- Add reasoning field to action execution traces
- Ensure reasoning is available in final answer results

This fixes the issue where LLM reasoning was lost when using
native function calling, as the OpenAI API format doesn't include
reasoning in tool_calls. Now reasoning is captured from the text
content LLM generates before invoking tools.
This change separates the action type decision from tool invocation:

1. First LLM call: Determine action type (tool_call vs final_answer)
   - Returns JSON with action type and reasoning
   - No tool schemas passed

2. Second LLM call (if tool_call): Invoke specific tool via native calling
   - Pass tool schemas to LLM
   - LLM uses native function calling API
   - Returns tool_calls with tool_name and arguments

Benefits:
- Clearer separation of concerns
- Better reasoning capture (not lost in native tool call format)
- More predictable LLM behavior
- Easier to debug and trace

Tests updated to reflect 3-call pattern:
- Call 1: Action type
- Call 2: Native tool invocation
- Call 3: Final answer
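The two-phase flow described in this commit could be sketched as follows; `llm.chat` and the message shapes are hypothetical stand-ins for the real model interface:

```python
def react_step(llm, messages, tool_schemas):
    """Two-phase pattern: decide the action type first, then invoke the
    chosen tool via native function calling."""
    # Phase 1: no tool schemas passed, so the model answers with a small
    # JSON object carrying the action type and its reasoning.
    decision = llm.chat(messages, response_format={"type": "json_object"})
    if decision["type"] == "final_answer":
        return decision
    # Phase 2: pass tool schemas and let the native function-calling API
    # produce the concrete tool_calls; keep the phase-1 reasoning.
    tool_response = llm.chat(messages, tools=tool_schemas)
    return {
        "type": "tool_call",
        "reasoning": decision.get("reasoning", ""),
        "tool_calls": tool_response["tool_calls"],
    }
```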
- Fix: Clean messages_with_prompt instead of messages in second call
- Simplify first call prompt to focus on action type decision only
- Remove confusing instructions about tool calling in first call
- Clarify that system will guide tool invocation in follow-up call
Remove verbose debug logging from the first and second LLM calls while
preserving important warnings and errors. The changes include:
- Remove tool availability and name logs
- Remove response type/value debug logs
- Remove json_repair success logs
- Remove token chunk logs
- Remove second call info logs

Keep critical logs for validation failures, unknown types, and errors.
…empty

Add guard check before accessing accumulated_tool_calls.keys()[0] when
event.index is not available. If accumulated_tool_calls is empty, skip
the chunk instead of raising IndexError.
…tibility

- Only use two-phase tool calling when step_id == "main" (top-level ReAct)
- Skip second call for DAG steps to maintain backward compatibility
- This fixes DAG tests that expect single-phase tool calling behavior
- Also fix potential IndexError in claude.py when accumulated_tool_calls is empty
In DAG mode (step_id != "main"), use traditional single-phase tool calling
where tools are passed to the LLM from the start. This maintains backward
compatibility with DAG tests and avoids infinite retry loops.

In main ReAct mode (step_id == "main"), use two-phase tool calling:
- First call: JSON format to decide action type
- Second call: Native tool calling to invoke the tool

This fixes the issue where DAG steps were executing tools in a loop.
- refactor(claude): extract duplicated anyOf/oneOf handling into
  _handle_union_type() helper (~85 lines reduced)
- fix(openai): skip unassociable tool call chunks instead of creating
  entries with empty string keys
- fix(react): shorten observation instructional text while keeping
  behavioral nudge to prevent tool loops
- fix(react): add missing required 'reasoning' field in
  _normalize_action_data() fallback dicts to prevent ValidationError
- fix(react): remove stale comment about missing task_id/step_id access
- fix(test): prevent infinite tool call loop in DAG sequential test
  by tracking executed tools in mock LLM