WARP.md

This file provides guidance to WARP (warp.dev) when working with code in this repository.

Project Overview

Six Degrees of Wikipedia finds the shortest path between any two Wikipedia pages through their hyperlinks. The project consists of:

Backend: Python Flask API server (sdow/) with SQLite database containing Wikipedia links
Frontend: React/TypeScript website (website/) built with Vite
Database: Scripts to download, process, and build Wikipedia dump databases (scripts/)
Infrastructure: Configuration for production deployment with Nginx, Gunicorn, Supervisord

Development Environment Setup

Initial Setup (One-time)

# From repo root - create Python virtual environment and install dependencies
virtualenv env
source env/bin/activate
pip install -r requirements.txt

# Create mock database for development
python scripts/create_mock_databases.py

# Setup frontend dependencies
cd website/
npm install

Development Server (Every Session)

Backend (Terminal 1):

# From repo root
source env/bin/activate
cd sdow/
export FLASK_APP=server.py FLASK_DEBUG=1
flask run
# Runs on http://localhost:5000

Frontend (Terminal 2):

# From website/ directory  
npm start
# Runs on http://localhost:3000

Core Development Commands

Backend (Python Flask API)

# Run server in debug mode
cd sdow/ && export FLASK_APP=server.py FLASK_DEBUG=1 && flask run

# Check Python code formatting 
pylint sdow/

# Format Python code (uses PEP 8 with 2-space indents, 100-char lines)
autopep8 --in-place --recursive sdow/

# Query mock database directly
litecli sdow/sdow.sqlite

Frontend (React/TypeScript)

# From website/ directory
npm start              # Development server (port 3000)
npm run build          # Production build
npm run preview        # Preview production build
npm run lint           # Run Prettier + ESLint  
npm run format         # Auto-format code
npm run analyze        # Analyze bundle size
npm run update-deps    # Update dependencies (excludes react-router-dom)

# Type checking
npx tsc --noEmit

Database Operations

# Create mock development databases
python scripts/create_mock_databases.py

# Build full production database from Wikipedia dumps (takes hours/days)
cd scripts/ && ./buildDatabase.sh

# Build for specific Wikipedia dump date
cd scripts/ && ./buildDatabase.sh 20231201

# Upload database to Google Cloud Storage
cd scripts/ && ./uploadToGcs.sh

# Backup searches database
cd scripts/ && ./backupSearchesDatabase.sh

Architecture Overview

Bi-Directional Breadth-First Search Algorithm

The core search algorithm (sdow/breadth_first_search.py) uses bi-directional BFS:

Dual Search: Searches simultaneously from source and target pages
Adaptive Direction: Chooses search direction based on link count (fewer outgoing/incoming links)
Optimal Strategy: Forward search follows outgoing links, backward search follows incoming links
Path Construction: When searches meet, reconstructs complete path through parent tracking

Database Schema

pages: id, title, is_redirect - All Wikipedia pages
links: id, outgoing_links_count, incoming_links_count, outgoing_links, incoming_links - Page link relationships stored as pipe-separated strings
redirects: source_id, target_id - Page redirects
searches: Search result logging with timing data

API Endpoints

POST /paths - Main search endpoint: {"source": "Page A", "target": "Page B"}
GET /ok - Health check

Data Flow

Frontend sends search request to /paths with source/target page titles
Backend resolves titles to page IDs, handling redirects
Bi-directional BFS algorithm finds shortest paths
Wikipedia API fetched for page metadata (titles, URLs, summaries)
Results returned as JSON with paths (page ID arrays) and page data
Frontend renders results as both list and D3.js graph visualization

Key Files and Patterns

Backend Structure

server.py - Flask app with CORS, compression, error handling
database.py - SQLite query abstraction and caching
breadth_first_search.py - Core pathfinding algorithm
helpers.py - Wikipedia API integration, error classes

Frontend Structure (see website/WARP.md)

The frontend is a separate React/TypeScript application with its own WARP.md file containing detailed frontend-specific guidance.

Database Build Pipeline

Wikipedia dump processing involves multiple stages:

Download: Gets latest Wikipedia dumps (pages, links, redirects) via wget/torrent
Trim: Filters to main namespace (0) articles only
Transform: Converts titles to IDs, resolves redirects
Sort/Dedupe: Optimizes link data for search performance
Import: Creates SQLite database with proper indexes

Production Architecture

Web Server: Nginx reverse proxy
App Server: Gunicorn WSGI server running Flask app
Process Management: Supervisord for service monitoring
Database: SQLite files (~50GB+ for full Wikipedia)
Deployment: Google Cloud Platform with Firebase hosting for frontend

Development Patterns

Python Code Style

2-space indentation (configured in .pylintrc, setup.cfg)
100-120 character line limits
PEP 8 compliance with custom indent rules

Error Handling

Custom InvalidRequest exception class for user-facing errors
Comprehensive logging with Google Cloud Logging integration
Graceful degradation when Wikipedia API unavailable

Performance Considerations

Link Storage: Pipe-separated strings in TEXT fields for space efficiency
Search Optimization: Chooses BFS direction based on link density
Database Indexes: Optimized for title lookups and link count queries
Caching: Database connection pooling and query result caching

Mock vs Production Data

Development: Uses create_mock_databases.py - creates ~35 mock Wikipedia pages with simple link relationships for testing the search algorithm.

Production: Uses buildDatabase.sh - downloads and processes full Wikipedia dumps (~6M+ pages, ~150M+ links). Takes significant time and disk space (50GB+).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WARP.md

Project Overview

Development Environment Setup

Initial Setup (One-time)

Development Server (Every Session)

Core Development Commands

Backend (Python Flask API)

Frontend (React/TypeScript)

Database Operations

Architecture Overview

Bi-Directional Breadth-First Search Algorithm

Database Schema

API Endpoints

Data Flow

Key Files and Patterns

Backend Structure

Frontend Structure (see website/WARP.md)

Database Build Pipeline

Production Architecture

Development Patterns

Python Code Style

Error Handling

Performance Considerations

Mock vs Production Data

FilesExpand file tree

WARP.md

Latest commit

History

WARP.md

File metadata and controls

WARP.md

Project Overview

Development Environment Setup

Initial Setup (One-time)

Development Server (Every Session)

Core Development Commands

Backend (Python Flask API)

Frontend (React/TypeScript)

Database Operations

Architecture Overview

Bi-Directional Breadth-First Search Algorithm

Database Schema

API Endpoints

Data Flow

Key Files and Patterns

Backend Structure

Frontend Structure (see website/WARP.md)

Database Build Pipeline

Production Architecture

Development Patterns

Python Code Style

Error Handling

Performance Considerations

Mock vs Production Data