# qwen3

Here are 32 public repositories matching this topic...

A higher-performance OpenAI-compatible LLM service than `vllm serve`: a pure C++ implementation built on GRPS + TensorRT-LLM + Tokenizers.cpp, supporting chat and function calling, AI agents, distributed multi-GPU inference, multimodal inputs, and a Gradio chat interface.

  • Updated May 14, 2025
  • Python
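Since the service above is OpenAI-compatible, clients talk to it using the standard chat-completions request shape. Below is a minimal sketch of such a request body with a tool definition for function calling; the model name, tool name, and parameters are illustrative assumptions, not taken from the repository.

```python
import json

# Illustrative OpenAI-style chat-completions payload with one tool.
# Model and tool names are hypothetical placeholders.
payload = {
    "model": "qwen3",
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"}
    ],
    # Function calling follows the OpenAI "tools" schema: the server
    # may reply with a tool_call instead of plain text.
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# Serialize to the JSON body that would be POSTed to /v1/chat/completions.
body = json.dumps(payload)
print(body[:40])
```

Any OpenAI SDK or plain HTTP client can send this body to the service's `/v1/chat/completions` endpoint.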

Smart proxy for LLM APIs that enables model-specific parameter control, automatic mode switching (such as Qwen3's /think and /no_think soft switches), and <think>-tag filtering. Useful for pairing advanced models with apps that lack parameter customization.

  • Updated May 19, 2025
  • Python
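The two proxy behaviors described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the proxy's actual implementation: the helper names `strip_think` and `force_no_think` are hypothetical, and it assumes Qwen3's convention that appending `/no_think` to a prompt suppresses reasoning and that reasoning arrives wrapped in `<think>...</think>` tags.

```python
import re

# Matches a <think>...</think> reasoning block, including newlines,
# plus any whitespace that follows it.
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_think(text: str) -> str:
    """Remove <think>...</think> reasoning blocks from a model reply."""
    return THINK_RE.sub("", text).strip()


def force_no_think(prompt: str) -> str:
    """Append Qwen3's /no_think soft switch unless a switch is already set."""
    if "/think" in prompt or "/no_think" in prompt:
        return prompt
    return prompt + " /no_think"


reply = "<think>Let me reason step by step...</think>The answer is 42."
print(strip_think(reply))        # -> The answer is 42.
print(force_no_think("Hello"))   # -> Hello /no_think
```

A real proxy would apply `force_no_think` to outgoing requests and `strip_think` to the streamed or complete response before relaying it to the client app.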
