Log Classification System

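A hybrid ML pipeline that classifies server and application log messages using three tiers: regex pattern matching, a BERT-style sentence-embedding classifier, and an LLM. Logs from rare sources go straight to the LLM; all other logs are checked against the regex rules first and fall back to the BERT classifier when no pattern matches. The pipeline is served via a FastAPI backend with a lightweight web UI.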

Architecture

Incoming Log (source, message)
         │
         ▼
  ┌─────────────────────────────┐
  │  Is source "rare"?          │
  │  (seen < N times in train)  │
  └────────┬──────┬─────────────┘
           │ YES  │ NO
           ▼      ▼
         LLM    Regex match?
        (Groq)  ───┬────────┬───
               YES │        │ NO
                   ▼        ▼
              Regex label  BERT classifier
                           (all-MiniLM-L6-v2
                            + sklearn model)
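
The routing in the diagram can be sketched as a single function. This is a minimal illustration rather than the contents of classify.py: the name `route_log`, the three callables, and the `rare_sources` set are assumptions made for the example, and the stand-in tiers return arbitrary labels so the sketch runs without models or an API key.

```python
from typing import Callable, Optional, Set


def route_log(
    source: str,
    message: str,
    rare_sources: Set[str],
    regex_tier: Callable[[str], Optional[str]],
    bert_tier: Callable[[str], str],
    llm_tier: Callable[[str], str],
) -> str:
    """Pick a tier for one log entry, following the diagram."""
    # Rare sources skip the local tiers and go straight to the LLM.
    if source in rare_sources:
        return llm_tier(message)
    # Otherwise try the cheap pattern match first.
    label = regex_tier(message)
    if label is not None:
        return label
    # No pattern matched: fall back to the embedding-based classifier.
    return bert_tier(message)


# Hypothetical stand-ins for the real processors; their labels are arbitrary.
def fake_regex(message: str) -> Optional[str]:
    return "User Action" if "logged in" in message else None


def fake_bert(message: str) -> str:
    return "HTTP Error"


def fake_llm(message: str) -> str:
    return "Workflow Error"


# Assumption for the example only: treat LegacyCRM as a rare source.
rare = {"LegacyCRM"}
print(route_log("BillingSystem", "User User12345 logged in.",
                rare, fake_regex, fake_bert, fake_llm))
```

Passing the tiers in as callables keeps the routing decision testable on its own, separate from model loading and network calls.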

Classification Labels

| Label | Description |
|---|---|
| User Action | Login, logout, account creation |
| System Notification | Backups, uploads, reboots, updates |
| HTTP Error | 4xx/5xx API responses |
| Security Alert | Blocked IPs, login failures, escalation |
| Workflow Error | Process failures, escalation errors |
| Deprecation Warning | Module retirement notices |
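
To make the regex tier concrete, here is a small sketch of how patterns might map to two of the labels above. The patterns, the `REGEX_RULES` list, and the function name `classify_with_regex` are illustrative assumptions; the actual rules in processor_regex.py may differ.

```python
import re
from typing import Optional

# Illustrative patterns only, loosely based on the label descriptions.
REGEX_RULES = [
    (re.compile(r"User \w+ logged (in|out)", re.IGNORECASE), "User Action"),
    (re.compile(r"Account with ID \S+ created", re.IGNORECASE), "User Action"),
    (re.compile(r"Backup (started|completed|ended)", re.IGNORECASE), "System Notification"),
    (re.compile(r"System reboot", re.IGNORECASE), "System Notification"),
]


def classify_with_regex(log_message: str) -> Optional[str]:
    """Return the label of the first matching rule, or None if nothing matches."""
    for pattern, label in REGEX_RULES:
        if pattern.search(log_message):
            return label
    return None  # the caller can then fall back to the next tier
```

Returning `None` on a miss, rather than a default label, lets the router distinguish "no rule applies" from a real classification.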

Project Structure

Log-Classification-System/
├── main.py                  # FastAPI app (all API endpoints)
├── classify.py              # Routing logic (which processor to use)
├── processor_regex.py       # Tier 1: Pattern matching
├── processor_bert.py        # Tier 2: Sentence transformer + sklearn
├── processor_llm.py         # Tier 3: Groq LLM (rare sources)
├── static/
│   └── index.html           # Web UI
├── models/
│   └── log_classifier_model.joblib   # Trained classifier
├── training/
│   ├── rare_sources.npy     # Sources routed to LLM
│   ├── dataset/
│   │   └── synthetic_logs.csv
│   └── training.ipynb       # Model training notebook
├── evaluation/
│   ├── logs.csv             # Test input
│   └── classified_logs.csv  # Test output
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
├── render.yaml              # One-click Render deployment
└── .env.example

Quick Start

1. Clone & set up environment

git clone <your-repo-url>
cd Log-Classification-System

pip install -r requirements.txt

2. Configure environment

cp .env.example .env
# Edit .env and add your GROQ_API_KEY

Get a free API key at console.groq.com.
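
A fail-fast check for the key might look like the sketch below. It only reads the process environment; how the project loads `.env` into that environment (for example through a dotenv loader or Docker Compose) is not stated here, and the helper name `get_groq_api_key` is hypothetical.

```python
import os
from typing import Mapping, Optional


def get_groq_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GROQ_API_KEY, or raise a clear error if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set; copy .env.example to .env and add your key."
        )
    return key


# Typical use at startup (commented out so the sketch runs without a key):
# api_key = get_groq_api_key()
```

Checking at startup surfaces a missing key immediately instead of on the first log that is routed to the LLM.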

3. Run the server

uvicorn main:app --reload --port 8000

Open http://localhost:8000 for the web UI, or http://localhost:8000/docs for the interactive API docs.

API Reference

`POST /classify/single`

Classify one log entry.

curl -X POST http://localhost:8000/classify/single \
  -H "Content-Type: application/json" \
  -d '{"source": "ModernCRM", "log_message": "IP 192.168.1.1 blocked due to potential attack"}'

Response:

{
  "source": "ModernCRM",
  "log_message": "IP 192.168.1.1 blocked due to potential attack",
  "label": "Security Alert"
}
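
The same call can be made from Python with only the standard library. This is a sketch that assumes the server is running locally on port 8000 as in the Quick Start; the helper names `build_single_request` and `classify_single` are made up for the example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumes the local server from the Quick Start


def build_single_request(source: str, log_message: str,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for /classify/single without sending it."""
    payload = json.dumps({"source": source, "log_message": log_message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/classify/single",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_single(source: str, log_message: str, base_url: str = BASE_URL) -> str:
    """Send one log entry and return the "label" field of the JSON response."""
    request = build_single_request(source, log_message, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    return body["label"]
```

With the server running, `classify_single("ModernCRM", "IP 192.168.1.1 blocked due to potential attack")` should return the `label` value from the response shown above.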

`POST /classify`

Classify a batch of logs.

curl -X POST http://localhost:8000/classify \
  -H "Content-Type: application/json" \
  -d '{
    "logs": [
      {"source": "BillingSystem", "log_message": "User User12345 logged in."},
      {"source": "LegacyCRM", "log_message": "Invoice generation aborted for order ID 8910."}
    ]
  }'

`POST /classify/csv`

Upload a CSV file and download the classified CSV.

curl -X POST http://localhost:8000/classify/csv \
  -F "file=@evaluation/logs.csv" \
  --output classified.csv

The CSV must have `source` and `log_message` columns.
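
Before uploading, the header can be checked locally. The sketch below uses only the standard `csv` module; the function name `missing_columns` is an assumption for the example, and how the server itself validates the file is not covered here.

```python
import csv
import io

REQUIRED_COLUMNS = {"source", "log_message"}


def missing_columns(csv_text: str) -> set:
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])  # fieldnames is None for empty input
    return REQUIRED_COLUMNS - header


sample = "source,log_message\nModernCRM,IP 192.168.1.1 blocked due to potential attack\n"
print(missing_columns(sample))  # an empty set means the header is usable
```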

Deploy to Render (Free)

1. Push this repo to GitHub.
2. Go to render.com → New → Web Service.
3. Connect your repo. Render detects render.yaml automatically.
4. Add the environment variable GROQ_API_KEY=your_key_here.
5. Click Deploy. Your API will be live at https://your-app.onrender.com.

Deploy with Docker

docker compose up --build

The service will be available at http://localhost:8000.

Training Your Own Model

Open training/training.ipynb in Jupyter. The notebook:

1. Loads training/dataset/synthetic_logs.csv
2. Generates sentence embeddings with all-MiniLM-L6-v2
3. Trains a scikit-learn classifier
4. Saves the model to models/log_classifier_model.joblib
5. Identifies rare sources and saves them to training/rare_sources.npy
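
The last step, identifying rare sources, can be illustrated with a frequency count. This is a sketch under stated assumptions, not the notebook's code: the function name `find_rare_sources`, the `min_count` threshold (the N in the architecture diagram), and the toy data are all invented for the example.

```python
from collections import Counter
from typing import Iterable, List


def find_rare_sources(sources: Iterable[str], min_count: int) -> List[str]:
    """Return sources that appear fewer than min_count times, sorted by name."""
    counts = Counter(sources)
    return sorted(name for name, seen in counts.items() if seen < min_count)


# Toy data, not taken from synthetic_logs.csv.
training_sources = ["ModernCRM"] * 5 + ["BillingSystem"] * 4 + ["LegacyCRM"] * 1
print(find_rare_sources(training_sources, min_count=3))  # ['LegacyCRM']
```

Given the `.npy` extension, the resulting list is presumably saved with NumPy, but the sketch stops at computing it.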

Tech Stack

| Component | Technology |
|---|---|
| API Framework | FastAPI + Uvicorn |
| Tier 1 Classifier | Python `re` (regex) |
| Tier 2 Classifier | Sentence Transformers + scikit-learn |
| Tier 3 Classifier | Groq API (Llama 3.3 70B) |
| Embeddings Model | `all-MiniLM-L6-v2` |
| Frontend | Vanilla HTML/CSS/JS |
| Containerisation | Docker + Docker Compose |

License

MIT
