vamsikan/oracle-text-to-sql
Oracle Text-to-SQL Generator

An AI-powered tool that converts natural language queries into Oracle SQL statements, with advanced features for schema understanding and query optimization.

Features

  • Natural language to SQL conversion using Claude Sonnet
  • Intelligent schema understanding and relationship mapping
  • Support for Oracle-specific features:
    • Window functions
    • Analytic functions
    • Advanced analytics (MODEL clause, pattern matching)
  • Query optimization and performance suggestions
  • Detailed query explanations
  • Flexible table filtering and schema management

Prerequisites

  • Python 3.8+
  • Oracle Database
  • Anthropic API key
  • Oracle Client libraries

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/oracle-text-to-sql.git
cd oracle-text-to-sql
  2. Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Create a .env file with your configuration:
# Oracle Database
ORACLE_USER=your_username
ORACLE_PASSWORD=your_password
ORACLE_HOST=your_host
ORACLE_PORT=1521
ORACLE_SERVICE=your_service

# Anthropic
ANTHROPIC_API_KEY=your_api_key

# Model Configuration
CLAUDE_MODEL=claude-3-sonnet-20240229
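To sanity-check this configuration at startup, a minimal .env loader can be sketched in pure Python (a simplified stand-in for the python-dotenv package; the required-key list is an assumption based on the file above):

```python
def load_env_file(path=".env"):
    """Minimal .env parser (sketch; the python-dotenv package does this robustly)."""
    values = {}
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            # Skip blank lines, comments, and malformed lines
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, val = line.partition("=")
            values[key.strip()] = val.strip()
    return values

def check_required(env, keys=("ORACLE_USER", "ORACLE_PASSWORD", "ANTHROPIC_API_KEY")):
    """Return the names of any required settings that are missing or empty."""
    return [k for k in keys if not env.get(k)]
```

Failing fast on missing settings is usually preferable to a cryptic connection error later.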

Usage

  1. Initialize the TextToSQLGenerator:
import os

import oracledb

from schema_manager import SchemaManager
from sql_generator import TextToSQLGenerator

# Create database connection
connection = oracledb.connect(
    user=os.getenv('ORACLE_USER'),
    password=os.getenv('ORACLE_PASSWORD'),
    host=os.getenv('ORACLE_HOST'),
    port=int(os.getenv('ORACLE_PORT', '1521')),
    service_name=os.getenv('ORACLE_SERVICE')
)

# Initialize with specific tables
schema_manager = SchemaManager(connection, use_specific_tables=True)

# Create generator
generator = TextToSQLGenerator(connection)
  2. Generate SQL from natural language:
query = "Show me the top 5 employees by salary in each department"
result = generator.generate_sql(query)
print(result['sql'])
print(result['explanation'])

Project Structure

  • sql_generator.py: Main SQL generation logic
  • schema_manager.py: Database schema management
  • llm_manager.py: LLM integration and query analysis
  • api.py: FastAPI endpoints
  • main.py: Application entry point

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Anthropic for Claude API
  • Oracle for database features
  • FastAPI for the web framework

Project files overview

  • main.py: Entry point. Loads environment, starts Uvicorn, serves FastAPI app from api.py.
  • api.py: REST API. Endpoints to generate SQL, validate/execute, get optimizations, fetch/export schema, and retrieve learning data. Creates a single TextToSQLGenerator instance.
  • sql_generator.py: Core orchestration. Connects to Oracle, builds prompts, calls LLM for SQL, validates via EXPLAIN PLAN, optionally executes queries, suggests optimizations, and exposes helpers (hints, stored procedures, templates).
  • schema_manager.py: Reads schema from Oracle. Loads tables, columns, PK/FK, relationships, stats; supports filtering by tables/pattern; formats schema for LLM; exports schema to JSON.
  • schema_ai.py: AI understanding of schema. Embeddings/classification to map business terms to tables/columns and identify relationships; used to find relevant tables/entities for a query.
  • llm_manager.py: Interface to Claude Sonnet. Generates SQL from prompts, analyzes queries (JSON output), and provides SQL optimization suggestions; includes prompt templates (CTEs, window/analytic functions, advanced Oracle features).
  • learning_manager.py: Learning and adaptation. Tracks query history, similar queries via embeddings, uncertain cases, feedback incorporation, pattern extraction, and query template generation; basic optimization helper.

If you only plan to fetch the schema from your database:

  • Run main.py, then call GET /schema or GET /schema/export.

CLI usage (sql_cli.py):

python sql_cli.py --query "Top 10 customers by total sales last year" --tables OBJ_BP OBJ_KEY --pattern bp --execute

Schema export (schema_manager.py):

  • Pattern + specific tables: python schema_manager.py --tables OBJ_BP OBJ_KEY --pattern bp --export schema_export.json
  • With optional stats: python schema_manager.py --tables OBJ_BP OBJ_KEY --pattern bp --stats
  • All tables in the DB: python schema_manager.py --export schema_export.json
  • All tables with optional stats: python schema_manager.py --stats

When you run python schema_manager.py --export schema_export.json, the CLI creates SchemaManager(connection) with no table_filter and use_specific_tables=False. SchemaManager.__init__ calls initialize_schema(), which queries all user tables:

SELECT table_name FROM user_tables WHERE table_name NOT LIKE 'BIN$%'

Because there is no table_filter, it loads every table owned by the connected user. For the export, save_schema_to_file() calls get_complete_schema(), which independently fetches all tables again via:

SELECT table_name, comments FROM user_tab_comments WHERE table_type = 'TABLE'

It then pulls columns (user_tab_columns) plus primary and foreign keys (all_constraints/all_cons_columns) for each table. So the export always includes all tables unless you explicitly filter. To limit the export, run:

  • By pattern and specific list: python schema_manager.py --tables OBJ_BP OBJ_KEY --pattern bp --export schema_export.json
  • By pattern only: python schema_manager.py --pattern bp --export schema_export.json
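To illustrate the filtering semantics described above, here is a hypothetical client-side sketch (not the project's actual code) of combining an explicit table list with a name pattern while skipping recycle-bin objects:

```python
def filter_tables(all_tables, include=None, pattern=None):
    """Sketch of the selection rules: explicit list OR name pattern; no filter keeps everything."""
    keep = []
    for name in all_tables:
        if name.startswith("BIN$"):  # recycle-bin objects, mirroring the NOT LIKE 'BIN$%' clause
            continue
        if include is None and pattern is None:
            keep.append(name)                          # no filter: every user table
        elif include and name in include:
            keep.append(name)                          # explicitly requested
        elif pattern and pattern.upper() in name.upper():
            keep.append(name)                          # matches the name pattern
    return keep
```

With --tables OBJ_BP OBJ_KEY --pattern bp, a table is kept if it is listed explicitly or its name contains "bp" (case-insensitive), which matches the CLI behavior described above.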

Purpose of schema_ai.py

  • Builds an AI-understanding of your database schema so natural language queries can be mapped to the right tables/columns.
  • Provides semantic search over schema entities and extracts business-like terms.

What it loads/initializes

  • Vertex AI Text Embeddings (textembedding-gecko@latest) for semantic vectors.
  • BERT-based classifier (via HuggingFace) and tokenizer for NLP tasks.
  • Logging and environment via .env.

Core data structures

  • SchemaEntity: represents a table or column with description, embedding, and related entities.
  • Internal maps:
    • entities: name → SchemaEntity
    • entity_clusters: cluster_id → [entity names]
    • business_term_mappings: discovered domain terms → metadata
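A rough sketch of how the SchemaEntity structure and internal maps described above might look (field names are illustrative; the actual class may differ):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class SchemaEntity:
    """A table or column with its description, embedding, and related entities (illustrative sketch)."""
    name: str
    entity_type: str                                   # "table" or "column"
    description: str = ""
    embedding: Optional[List[float]] = None
    related_entities: List[str] = field(default_factory=list)

# Internal maps, as described above
entities: Dict[str, SchemaEntity] = {}                 # name -> SchemaEntity
entity_clusters: Dict[int, List[str]] = {}             # cluster_id -> [entity names]
business_term_mappings: Dict[str, dict] = {}           # domain term -> metadata
```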

Key methods and flow

  • analyze_schema(schema_json):
    • Iterates exported JSON tables.
    • Creates entities for each table and its columns.
    • Generates embeddings for descriptions.
    • Links columns back to their table as related entities.
  • cluster_entities(threshold):
    • Cosine-similarity over embeddings to group similar entities.
  • generate_business_terms():
    • From clusters, extracts common terms to propose business terminology and related tables/columns.
  • understand_query(query):
    • Embeds the query and computes cosine similarity with all entity embeddings.
    • Returns top-matched entities and extracted key terms.
  • extract_meaningful_terms(text):
    • Simple NLP cleaning + stop-word removal to keep meaningful tokens.
  • save_analysis(path):
    • Persists entities, clusters, and generated business terms to JSON.
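The similarity machinery behind cluster_entities and understand_query reduces to cosine similarity over embedding vectors. A self-contained sketch of that core step (toy 2-D vectors here, not real Vertex AI embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_matches(query_vec, entity_vecs, k=3):
    """Rank entity names by similarity to the query embedding, highest first."""
    ranked = sorted(entity_vecs, key=lambda name: cosine(query_vec, entity_vecs[name]), reverse=True)
    return ranked[:k]
```

understand_query embeds the incoming question, then uses exactly this kind of ranking to pick the most relevant tables and columns for the prompt.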

How it’s used

  • During startup (API or CLI), schema JSON is analyzed to build a semantic map.
  • For a user query, understand_query returns relevant entities which are used to:
    • Pick relevant tables for prompts
    • Provide schema context to the LLM for SQL generation

Possible future improvements:

  • A method to load entities back from a saved analysis to avoid recomputation.
  • A filter to restrict relevance to a specific table subset at runtime.

Purpose of llm_manager.py

  • Abstraction over the LLM (Claude Sonnet) used for:
    • Generating Oracle SQL from a prompt
    • Analyzing a natural-language query into a structured JSON (intent, entities, window/analytic functions, CTEs, complexity)
    • Providing SQL optimization suggestions

What it initializes/handles

  • Model setup and configuration (API keys via env).
  • Safe call wrapper: retries, basic rate limiting, error logging.
  • Prompt templates optimized for Oracle:
    • Encourages ANSI joins, CTEs, window/analytic functions, advanced features (MODEL, MATCH_RECOGNIZE, PIVOT/UNPIVOT), Oracle-specific syntax.

Key methods

  • generate_sql(prompt: str) -> str
    • Sends a carefully engineered prompt to Claude Sonnet and returns only the SQL text.
  • analyze_query(query: str) -> Dict[str, Any]
    • Produces a structured JSON analysis (intent, entities, conditions, aggregations, window functions, analytic functions, advanced features, complexity/perf hints).
  • optimize_sql(sql: str) -> Dict[str, Any]
    • Asks the LLM for optimization guidance (indexes, CTE refactors, pushdowns, window function tuning).
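Because generate_sql should return only SQL text, the raw model reply usually needs post-filtering. A hedged sketch of one possible approach (regex extraction of a fenced block; illustrative, not the project's actual guardrail):

```python
import re

def extract_sql(reply: str) -> str:
    """Pull the SQL out of a model reply: prefer a fenced ```sql block, else return the text as-is."""
    match = re.search(r"```(?:sql)?\s*(.*?)```", reply, re.DOTALL | re.IGNORECASE)
    if match:
        return match.group(1).strip()
    return reply.strip()
```

This tolerates replies that wrap the query in prose or markdown fences while passing bare SQL through unchanged.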

How it’s used in the app

  • sql_generator.py: Builds a prompt from schema/context and calls llm_manager.generate_sql() to get the SQL.
  • local_cli.py: Directly builds a prompt from the exported JSON + SchemaAI and calls generate_sql() for offline generation.
  • Optionally, analyze_query() can be used before generation to enforce structure and better constraints.

Possible future improvements:

  • A strict "SQL-only" generation guardrail (regex post-filtering).
  • A "SQL + explanation" mode.
  • Temperature/top_p controls via CLI/API flags.

What learning_manager.py does

  • Learns from usage and feedback to improve future SQL generation.

Key capabilities:

  • Similar query search: stores past queries with embeddings; finds nearest matches for new queries.
  • Pattern recognition: extracts generalized templates from successful queries (e.g., common joins/filters) to guide prompt building.
  • Feedback incorporation: records user feedback (correct/incorrect, corrected SQL) and updates patterns/templates accordingly.
  • Uncertain queries tracking: logs low-confidence or error-prone cases for review.
  • Domain terms evolution: updates internal domain terminology stats (usage counts, confidence) from observed queries and feedback.
  • Optimization hints: analyzes execution outcomes/metadata to suggest improvements (e.g., indexing opportunities, join order, pushdowns).
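Pattern recognition of the kind described above can be approximated by normalizing literals in successful queries into placeholders, so recurring query shapes collapse into one reusable template. A simplified sketch (the project's actual template logic may differ):

```python
import re

def generalize_sql(sql: str) -> str:
    """Replace string and numeric literals with placeholders to form a reusable template."""
    template = re.sub(r"'[^']*'", ":str", sql)                # string literals -> :str
    template = re.sub(r"\b\d+(\.\d+)?\b", ":num", template)   # numeric literals -> :num
    return template

def update_template_counts(counts: dict, sql: str) -> dict:
    """Track how often each generalized template has succeeded."""
    tpl = generalize_sql(sql)
    counts[tpl] = counts.get(tpl, 0) + 1
    return counts
```

Templates with high success counts are good candidates to feed back into prompts as few-shot examples.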

Typical flow:

  1. After generation/execution, you call analyze_feedback(...) with the outcome.
  2. It updates:
    • Successful pattern store
    • Query templates
    • Domain term stats
    • History/embeddings for similarity search
  3. Next queries benefit from:
    • Better similar examples
    • Stronger templates
    • More accurate term mapping
    • Optimization suggestions

Outputs used by others:

  • sql_generator.py fetches similar queries/templates to enrich prompts.
  • API endpoints expose uncertain queries, templates, and domain terms for monitoring.
