Fine-Tuning Pipeline — Domain-Specific LLM (QLoRA)

1. Overview

This is an ongoing project and represents the fine-tuning layer of a larger, end-to-end LLM platform composed of multiple independent components.

The overall platform consists of:

  1. Infrastructure provisioning for hosting fine-tuned LLMs.
  2. Fine-tuning base language models using parameter-efficient methods.
  3. Deploying and serving fine-tuned models on that infrastructure.
  4. Agent-based applications consuming these models instead of external APIs (e.g., OpenAI).

This repository is the fine-tuning layer, designed to integrate with separate infrastructure, serving, and agent orchestration repositories. It intentionally focuses only on:

  • Synthetic data generation
  • Instruction-style dataset construction
  • Fine-tuning configuration using QLoRA

Infrastructure provisioning, production serving, and agent orchestration are handled in separate repositories to maintain a strict separation of responsibilities.


2. Objective

The objective of this project is to adapt a base 7B language model to a structured study and evaluation domain.

The fine-tuned model is intended to support:

  • Checkpoint generation
  • Question generation
  • Question answering
  • Study mastery evaluation

The resulting model will be consumed by an agent-based system instead of relying on external LLM APIs.


3. Repository Structure

FINE-TUNING/
│
├── dataset.jsonl
├── generate_training_samples.py
├── generate_notes.py
├── notes_macrocytic_anemia.md
├── topics.json
├── topics-openai.py
├── topics-vertexai.py
├── train_qlora.ipynb
└── fine-tuning.zip

File Descriptions

  • topics.json
    Defines the list of domain topics used for synthetic data generation.

  • topics-openai.py
    Generates structured content using OpenAI models.

  • topics-vertexai.py
    Generates structured content using Google Vertex AI.

  • generate_training_samples.py
    Converts structured outputs into instruction-style JSONL training data.

  • dataset.jsonl
    Aggregated fine-tuning dataset in instruction format.

  • train_qlora.ipynb
    Notebook containing QLoRA configuration and training logic.

  • generate_notes.py
    Generates structured markdown notes from model outputs.

  • notes_macrocytic_anemia.md
    Example of generated structured notes.


4. Fine-Tuning Strategy

Target Model: Qwen 2.5 7B (or compatible 7B transformer)
Adaptation Method: QLoRA (4-bit quantization + LoRA adapters)
Framework: Hugging Face Transformers + PEFT + bitsandbytes

Design considerations:

  • Parameter-efficient updates to minimize compute cost
  • 4-bit quantization for memory efficiency
  • Instruction-style supervised fine-tuning
  • Compatibility with constrained GPU environments
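The considerations above translate into a standard QLoRA setup with Transformers + PEFT + bitsandbytes. This is a hedged sketch, not the exact contents of train_qlora.ipynb; the model id and every hyperparameter here are illustrative assumptions:

```python
# Sketch of a QLoRA training setup; values are illustrative, not the
# notebook's actual configuration.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit quantization for memory efficiency
    bnb_4bit_quant_type="nf4",              # NormalFloat4, the usual QLoRA choice
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 on supported GPUs
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

lora_config = LoraConfig(
    r=16,                                   # adapter rank (assumed)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B",                      # assumed model id
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)  # only the LoRA parameters remain trainable
```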

5. Dataset Generation Workflow

Step 1 — Topic Definition

Topics are defined in:

topics.json
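A minimal illustration of what topics.json might contain. The schema and all entries except the first (evidenced by notes_macrocytic_anemia.md) are assumptions:

```json
[
  "macrocytic anemia",
  "iron deficiency anemia",
  "vitamin B12 metabolism"
]
```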

Step 2 — Synthetic Content Generation

Structured content is generated using one of the following providers:

python topics-openai.py

or

python topics-vertexai.py

Generated content includes:

  • Checkpoints
  • Questions
  • Answers
  • Study-oriented explanations
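Both provider scripts follow the same pattern: read topics.json, build a prompt per topic, and request structured output. A minimal sketch of the shared logic, assuming the openai>=1.0 client and a hypothetical prompt format; the real scripts may differ:

```python
import json

def build_prompt(topic: str) -> str:
    """Build a generation prompt asking for the structured fields the
    pipeline needs: checkpoints, questions, answers, explanations."""
    return (
        f"For the topic '{topic}', produce JSON with keys "
        "'checkpoints', 'questions', 'answers', and 'explanations'."
    )

def generate_for_topic(topic: str, model: str = "gpt-4o-mini") -> dict:
    # Network call kept inside the function; requires OPENAI_API_KEY.
    from openai import OpenAI
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_prompt(topic)}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

# Usage: for t in json.load(open("topics.json")): generate_for_topic(t)
```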

Step 3 — Instruction Dataset Construction

python generate_training_samples.py

This produces a JSONL dataset with the following structure:

{
  "instruction": "...",
  "input": "...",
  "output": "..."
}
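The conversion itself is straightforward. A hedged sketch of what generate_training_samples.py plausibly does; the input-side field names are assumptions, and only the output schema above is confirmed:

```python
import json

def to_record(topic: str, question: str, answer: str) -> dict:
    """Map one generated Q/A pair onto the instruction schema above."""
    return {
        "instruction": f"Answer the following question about {topic}.",
        "input": question,
        "output": answer,
    }

def write_jsonl(records, path="dataset.jsonl"):
    # One JSON object per line, the format expected by most SFT loaders.
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```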

6. Current Status

  • Synthetic data generation pipeline: Ongoing
  • Instruction formatting logic: Implemented
  • QLoRA training configuration: Implemented
  • Dataset scaling and validation: Ongoing
  • Integration with serving layer: Planned

This repository is under active development as dataset quality and coverage continue to improve.


7. Planned Integration

Once fine-tuning is complete:

  1. LoRA adapters will be exported.
  2. Adapters will be loaded in the serving layer.
  3. The model will be deployed using an optimized inference server.
  4. Agent-based applications will consume the self-hosted model instead of external APIs.
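Steps 1 and 2 map onto standard PEFT calls. A sketch under the assumption that the trained model is a PEFT-wrapped model from the notebook; the function names here are illustrative:

```python
def export_adapters(model, out_dir: str) -> str:
    # On a PEFT model, save_pretrained writes only the LoRA adapter
    # weights plus adapter_config.json -- megabytes rather than the
    # full 7B checkpoint -- which is what the serving layer loads.
    model.save_pretrained(out_dir)
    return out_dir

def merge_for_serving(model):
    # Alternatively, fold the adapters into the base weights so the
    # inference server can load a single standard checkpoint with no
    # PEFT dependency at serve time.
    return model.merge_and_unload()
```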

8. Requirements

Create a requirements.txt file:

transformers
datasets
peft
bitsandbytes
accelerate
trl
torch
openai
google-cloud-aiplatform

Install dependencies:

pip install -r requirements.txt

9. Environment Variables

Create a .env file (not committed to version control):

OPENAI_API_KEY=
GOOGLE_APPLICATION_CREDENTIALS=
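Both generation scripts need these variables at runtime. A small sketch for reading them with fail-fast behavior; the use of python-dotenv is an assumption, and any .env loader works:

```python
import os

def require_env(name: str) -> str:
    """Return an environment variable, or fail fast with a clear error
    instead of failing mid-run inside an API call."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Optional: load .env first if python-dotenv is installed.
# from dotenv import load_dotenv; load_dotenv()
```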

10. Design Principles

  • Strict separation between training, serving, and application layers
  • Parameter-efficient fine-tuning to reduce infrastructure cost
  • Modular architecture enabling independent iteration
  • Elimination of long-term dependency on third-party LLM APIs

11. Position in the Overall Platform

This repository represents the fine-tuning component of a four-part LLM platform:

  1. Infrastructure layer
  2. Fine-tuning layer (this repository)
  3. Model serving layer
  4. Agent application layer

Each layer is isolated into its own repository to ensure clarity, scalability, and production readiness.


This project is being developed as part of a modular, production-oriented LLM platform with full ownership over data, models, and deployment.
