This file provides guidance to WARP (warp.dev) when working with code in this repository.
Six Degrees of Wikipedia finds the shortest path between any two Wikipedia pages through their hyperlinks. The project consists of:
- Backend: Python Flask API server (
sdow/) with SQLite database containing Wikipedia links - Frontend: React/TypeScript website (
website/) built with Vite - Database: Scripts to download, process, and build Wikipedia dump databases (
scripts/) - Infrastructure: Configuration for production deployment with Nginx, Gunicorn, Supervisord
# From repo root - create Python virtual environment and install dependencies
virtualenv env
source env/bin/activate
pip install -r requirements.txt
# Create mock database for development
python scripts/create_mock_databases.py
# Setup frontend dependencies
cd website/
npm installBackend (Terminal 1):
# From repo root
source env/bin/activate
cd sdow/
export FLASK_APP=server.py FLASK_DEBUG=1
flask run
# Runs on http://localhost:5000Frontend (Terminal 2):
# From website/ directory
npm start
# Runs on http://localhost:3000# Run server in debug mode
cd sdow/ && export FLASK_APP=server.py FLASK_DEBUG=1 && flask run
# Check Python code formatting
pylint sdow/
# Format Python code (uses PEP 8 with 2-space indents, 100-char lines)
autopep8 --in-place --recursive sdow/
# Query mock database directly
litecli sdow/sdow.sqlite# From website/ directory
npm start # Development server (port 3000)
npm run build # Production build
npm run preview # Preview production build
npm run lint # Run Prettier + ESLint
npm run format # Auto-format code
npm run analyze # Analyze bundle size
npm run update-deps # Update dependencies (excludes react-router-dom)
# Type checking
npx tsc --noEmit# Create mock development databases
python scripts/create_mock_databases.py
# Build full production database from Wikipedia dumps (takes hours/days)
cd scripts/ && ./buildDatabase.sh
# Build for specific Wikipedia dump date
cd scripts/ && ./buildDatabase.sh 20231201
# Upload database to Google Cloud Storage
cd scripts/ && ./uploadToGcs.sh
# Backup searches database
cd scripts/ && ./backupSearchesDatabase.shThe core search algorithm (sdow/breadth_first_search.py) uses bi-directional BFS:
- Dual Search: Searches simultaneously from source and target pages
- Adaptive Direction: Chooses search direction based on link count (fewer outgoing/incoming links)
- Optimal Strategy: Forward search follows outgoing links, backward search follows incoming links
- Path Construction: When searches meet, reconstructs complete path through parent tracking
- pages:
id,title,is_redirect- All Wikipedia pages - links:
id,outgoing_links_count,incoming_links_count,outgoing_links,incoming_links- Page link relationships stored as pipe-separated strings - redirects:
source_id,target_id- Page redirects - searches: Search result logging with timing data
POST /paths- Main search endpoint:{"source": "Page A", "target": "Page B"}GET /ok- Health check
- Frontend sends search request to
/pathswith source/target page titles - Backend resolves titles to page IDs, handling redirects
- Bi-directional BFS algorithm finds shortest paths
- Wikipedia API fetched for page metadata (titles, URLs, summaries)
- Results returned as JSON with paths (page ID arrays) and page data
- Frontend renders results as both list and D3.js graph visualization
server.py- Flask app with CORS, compression, error handlingdatabase.py- SQLite query abstraction and cachingbreadth_first_search.py- Core pathfinding algorithmhelpers.py- Wikipedia API integration, error classes
The frontend is a separate React/TypeScript application with its own WARP.md file containing detailed frontend-specific guidance.
Wikipedia dump processing involves multiple stages:
- Download: Gets latest Wikipedia dumps (pages, links, redirects) via wget/torrent
- Trim: Filters to main namespace (0) articles only
- Transform: Converts titles to IDs, resolves redirects
- Sort/Dedupe: Optimizes link data for search performance
- Import: Creates SQLite database with proper indexes
- Web Server: Nginx reverse proxy
- App Server: Gunicorn WSGI server running Flask app
- Process Management: Supervisord for service monitoring
- Database: SQLite files (~50GB+ for full Wikipedia)
- Deployment: Google Cloud Platform with Firebase hosting for frontend
- 2-space indentation (configured in .pylintrc, setup.cfg)
- 100-120 character line limits
- PEP 8 compliance with custom indent rules
- Custom
InvalidRequestexception class for user-facing errors - Comprehensive logging with Google Cloud Logging integration
- Graceful degradation when Wikipedia API unavailable
- Link Storage: Pipe-separated strings in TEXT fields for space efficiency
- Search Optimization: Chooses BFS direction based on link density
- Database Indexes: Optimized for title lookups and link count queries
- Caching: Database connection pooling and query result caching
Development: Uses create_mock_databases.py - creates ~35 mock Wikipedia pages with simple link relationships for testing the search algorithm.
Production: Uses buildDatabase.sh - downloads and processes full Wikipedia dumps (~6M+ pages, ~150M+ links). Takes significant time and disk space (50GB+).