Skip to content

zinan92/intel

Repository files navigation

park-intel

Self-hosted market intelligence pipeline -- collect, enrich, and surface trading signals from 10+ sources

Python 3.11+ FastAPI React 18 Tests


What It Does

park-intel is a self-hosted market intelligence pipeline. It collects articles from 10+ source types (RSS, Hacker News, Reddit, GitHub, and more), enriches them with keyword tagging and optional LLM-based relevance scoring, clusters related articles into narrative events, and serves everything through a REST API with a feed-first frontend.

Core sources work out of the box with zero API keys. Optional sources (Xueqiu, LLM tagging) activate when you add their credentials.

Quick Start

git clone https://github.com/zinan92/intel.git park-intel
cd park-intel

# Backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env          # edit .env to add optional API keys
python main.py                # starts on http://localhost:8001

# Frontend
cd frontend && npm install && npm run dev
# open http://localhost:5174

The built-in scheduler starts collecting automatically. Check http://localhost:8001/health to see source status.

Core vs Optional Sources

Source Key Required Env Var Notes
RSS Feeds (50+) No -- Blogs, newsletters, tech and crypto media
Hacker News No -- Algolia API, score >= 20 filter
Reddit No -- 13 subreddits via RSS
GitHub Trending No -- Keyword-filtered trending repos
Yahoo Finance No -- Ticker news via yfinance
Google News No -- Query-driven news aggregation
GitHub Releases Optional GITHUB_TOKEN Increases API rate limit
Xueqiu (Chinese market) Yes XUEQIU_COOKIE Chinese market KOL commentary
Social KOL Optional -- Requires clawfeed CLI installed
LLM Tagging Optional ANTHROPIC_API_KEY AI relevance scoring + narrative tags

Without ANTHROPIC_API_KEY, articles still collect and get keyword tags -- they just won't have LLM-based relevance scores or narrative tags.

Architecture

Source Registry (DB)
       |
       v
   Adapters  -->  Collectors (fetch + dedup)  -->  SQLite
                                                      |
                        Keyword Tagger (13 categories) |
                        Ticker Extractor ($NVDA, etc.) |
                                                      v
                                              LLM Tagger (optional)
                                              relevance_score 1-5
                                              narrative_tags
                                                      |
                                                      v
                                          Event Aggregator (48h window)
                                          cross-source clustering
                                          signal scoring
                                                      |
                                                      v
                                              FastAPI REST API
                                             /api/* + /api/ui/*
                                                      |
                              +-----------+-----------+-----------+
                              |           |           |           |
                           React UI   Quant Bridge  User        Health
                           Feed +     price impact  profiles    Dashboard
                           Events     from ext.     topic       /health
                                      service       weights

Data flow: Sources are registered in a database table (not config files). The scheduler runs one job per source type. Collectors fetch, deduplicate, and auto-tag articles on ingest. An optional LLM tagger scores relevance and generates narrative labels. The event aggregator clusters articles sharing the same narrative tag within 48-hour windows, computing a signal score (source count x avg relevance).

Health Dashboard

The /health endpoint shows per-source status including last collection time, article counts, error rates, and volume anomalies. When running the frontend, navigate to the health page to see a visual overview.

Health Dashboard

Run as Background Service (macOS)

Optional: run park-intel as a persistent background service using launchd. The service auto-restarts on crash.

./scripts/install-service.sh    # installs LaunchAgent and starts the service
./scripts/service-status.sh     # check if the service is running
./scripts/uninstall-service.sh  # stop and remove the service

Logs go to the logs/ directory with automatic rotation.

API Endpoints

Core Data

Endpoint Description
GET /api/health Per-source health status (registry-driven)
GET /api/articles/latest Recent articles ?limit=20&source=rss&min_relevance=4
GET /api/articles/search Keyword search ?q=bitcoin&days=7
GET /api/articles/digest Articles grouped by source with top tags
GET /api/articles/signals Topic heat + narrative momentum ?hours=24
GET /api/articles/sources Historical source statistics

Frontend Read Model

Endpoint Description
GET /api/ui/feed Priority-scored feed ?user=myname&window=24h
GET /api/ui/items/{id} Article detail with related items
GET /api/ui/topics Topic list
GET /api/ui/sources Active source list
GET /api/ui/search Frontend search ?q=openai

Events

Endpoint Description
GET /api/events/active Active events ranked by signal score
GET /api/events/{id} Event detail with article timeline + price impacts
GET /api/events/history Closed events archive ?tag=btc&days=30

Users

Endpoint Description
POST /api/users Create user profile
GET /api/users/{username} Get user profile and topic weights
PUT /api/users/{username}/weights Update topic weights (0.0-3.0 per topic)

Development

# Run tests
pytest tests/

# Run in development mode (auto-reload on file changes)
PARK_INTEL_DEV=1 python main.py

# Run collectors manually
python scripts/run_collectors.py                # all sources
python scripts/run_collectors.py --source reddit # single source

# Run LLM tagger
python scripts/run_llm_tagger.py --limit 10     # score 10 unscored articles
python scripts/run_llm_tagger.py --backfill     # backfill historical articles

# Backfill ticker extraction
python scripts/backfill_tickers.py

Project Structure

main.py                  # FastAPI entry point (port 8001)
config.py                # Source seed data, collector config, env loading
scheduler.py             # Registry-driven APScheduler
sources/                 # Source registry, adapters, seeding
collectors/              # 10 source-type collectors (BaseCollector pattern)
events/                  # Event aggregation (48h clustering, narratives)
tagging/                 # Keyword tagger, LLM tagger, ticker extractor
users/                   # User profiles and topic weights
bridge/                  # Quant bridge (price impact from external service)
api/                     # REST API routes
db/                      # SQLAlchemy models, migrations, database init
frontend/                # React + TypeScript + Vite frontend
scripts/                 # Management and utility scripts
tests/                   # 290+ pytest tests

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/my-feature)
  3. Make your changes and add tests
  4. Run the test suite (pytest tests/)
  5. Commit and push (git push origin feature/my-feature)
  6. Open a Pull Request

License

MIT

Packages

 
 
 

Contributors