This organization is a collection of projects created during my journey of learning Large Language Models (LLMs), retrieval systems, and AI infrastructure.
Instead of only studying theory, I try to learn by building real systems, from inference servers to RAG pipelines and agent harness frameworks.
- How agent harness frameworks work (tools / memory / planning / orchestration)
- How to serve models efficiently (GPU / TensorRT / batching)
- How to route and manage multiple LLMs
- How retrieval works (dense / sparse / hybrid / multi-vector)
- How to evaluate LLM outputs and reduce hallucination
- How to design practical LLM applications
These projects are not perfect or production-ready — they reflect my learning process and experiments.
- picobot
A small, clear, and extensible multi-user Web Agent — chat, call tools, operate a workspace, browse the web, search for information, with each conversation running in an isolated sandbox.
- TensorRT Inference Server
Exploring high-performance model serving and GPU optimization.
- LLM Router Server
Learning how to route requests across multiple models with load balancing.
- LLM Router Server Dashboard
A one-stop platform for LLM model management and monitoring.
- Tiny-RAGFlow
A lightweight RAG framework to understand hybrid retrieval and reranking.
- LLM Tools
A unified interface for interacting with LLMs, embeddings, and rerankers.
- SEC-10-K-Structured-Extraction
Parses SEC EDGAR Form 10-K annual reports into standardized JSON, automatically identifying the content and status of every Item.
- SEC-10-K-Structured-Extraction-Web
Parses SEC EDGAR Form 10-K annual reports into standardized JSON, automatically identifying the content and status of every Item.
- file2md
Converting different file formats into Markdown for downstream LLM usage.
- llm-evals
Experimenting with LLM evaluation and LLM-as-a-judge approaches.
- Hallucination Mitigation (Behavior RL)
Trying to reduce hallucination in LLMs.
- ML2SQL
Exploring how ML models can run directly inside databases using SQL.
I believe the best way to understand LLM systems is to build them piece by piece.
Each repository focuses on a different part of the stack, and together they form a rough picture of how modern LLM systems work.
This is an ongoing journey.
Many things are incomplete, naive, or experimental — and that’s intentional.
If you’re also learning, feel free to explore, use, or build on top of these projects.
If you have any questions, ideas, or just want to chat about LLMs, feel free to:
- Open an issue
- Or reach out via email
