A real-time "Shadow API" for websites that do not provide public APIs.
This project aims to provide a fast REST interface (target: < 2s response time) on top of non-API web platforms by running an always-warm Apify Actor with resilient scraping and caching.
Many high-value platforms (B2B marketplaces, regional portals, and private dashboards) either have no official API or impose strict access constraints.
Shadow API exposes carefully designed API endpoints over these sources so developers can integrate quickly without building and maintaining their own scraping stack.
- Real-time API responses from non-API sources
- Standby mode (warm browser/session pool) for low-latency requests
- Reliable extraction with retries and anti-blocking tactics
- REST-first interface with consistent schemas
- Apify deployment for scalable execution and monetization
- Median response time: under 2 seconds (for cached/hot paths)
- Fast failure and clear error contracts
- Observability for latency and extraction quality
- Subscription rental model: $50–$100/month
- Alternative usage model: pay-per-result
- Define MVP endpoint set and input/output schema
- Build core Actor runtime and warm standby architecture
- Add caching + request deduplication
- Add anti-blocking and extraction fallbacks
- Publish to Apify Store with pricing tiers
Initial P0 product artifacts are documented in docs/product/:
- docs/product/icp-and-verticals.md
- docs/product/demand-scorecard.md
- docs/product/legal-risk-matrix.md
- docs/product/mvp-endpoint-catalog.md
- docs/product/response-schema-conventions.md
- docs/product/slos-and-reliability-baseline.md
- docs/product/pricing-hypotheses.md
- docs/product/north-star-metrics-events.md
- docs/product/demo-use-cases.md
- docs/product/prd-v1.md
This project must be used in compliance with each target site's terms of service, local laws, and privacy requirements.
Security and compliance references:
docs/security/README.md
The initial Apify Actor scaffold is now in place:
- Actor config: .actor/actor.json, .actor/input_schema.json
- Runtime source: src/main.ts, src/config.ts, src/server.ts
- Warm runtime managers: src/runtime/browser-pool.ts, src/runtime/standby-lifecycle.ts
- Session persistence manager: src/runtime/session-storage.ts
- Queue + error + nav utilities:
  - src/runtime/request-queue.ts
  - src/runtime/errors.ts
  - src/runtime/navigation.ts
- Extraction modules (M3):
  - src/extraction/types.ts
  - src/extraction/service.ts
  - src/extraction/adapters/
  - src/extraction/normalization.ts
  - src/extraction/selector-fallback.ts
  - src/extraction/pagination.ts
  - src/extraction/challenge-detection.ts
  - src/extraction/health-tracker.ts
- API contract modules (M4):
  - src/api/contracts.ts
  - src/api/envelope.ts
  - src/api/schema-validation.ts
- Performance modules (M5):
  - src/performance/cache-provider.ts
  - src/performance/response-cache.ts
  - src/performance/inflight-dedupe.ts
  - src/performance/fetch-pipeline.ts
  - src/performance/prewarm-scheduler.ts
  - src/performance/latency-metrics.ts
- Build/dev config: package.json, tsconfig.json, .env.example, Dockerfile
- Install dependencies: npm install
- Start in dev mode: npm run dev
- Verify endpoints:
  - GET http://127.0.0.1:3000/v1/health
  - GET http://127.0.0.1:3000/v1/ready
  - GET http://127.0.0.1:3000/v1/adapters/health
  - GET http://127.0.0.1:3000/v1/debug/performance
  - POST http://127.0.0.1:3000/v1/fetch
Startup now validates runtime config and fails fast with explicit errors when:
- PORT / port is non-integer or outside 1..65535
- HOST / host is empty
- LOG_LEVEL / logLevel is not one of DEBUG|INFO|WARNING|ERROR
- any variable listed in REQUIRED_ENV_VARS (or actor input requiredEnvVars) is missing
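The fail-fast checks above could be sketched roughly as follows. This is an illustrative sketch, not the contents of src/config.ts: the function names, the default values, and the assumption that REQUIRED_ENV_VARS is a comma-separated list are all hypothetical.

```typescript
// Hypothetical sketch of fail-fast startup validation.
type Env = Record<string, string | undefined>;

const LOG_LEVELS: string[] = ["DEBUG", "INFO", "WARNING", "ERROR"];

// Collects every problem instead of stopping at the first one,
// so a single failed start reports all misconfigured values.
function validateRuntimeConfig(env: Env): string[] {
  const errors: string[] = [];

  const rawPort = env.PORT ?? "3000"; // default value is an assumption
  const port = Number(rawPort);
  if (!/^\d+$/.test(rawPort) || port < 1 || port > 65535) {
    errors.push(`PORT must be an integer in 1..65535, got "${rawPort}"`);
  }

  const host = env.HOST ?? "0.0.0.0"; // default value is an assumption
  if (host.trim() === "") {
    errors.push("HOST must not be empty");
  }

  const logLevel = env.LOG_LEVEL ?? "INFO"; // default value is an assumption
  if (LOG_LEVELS.indexOf(logLevel) === -1) {
    errors.push(`LOG_LEVEL must be one of ${LOG_LEVELS.join("|")}, got "${logLevel}"`);
  }

  // Assumption: REQUIRED_ENV_VARS is a comma-separated list of variable names.
  const required = (env.REQUIRED_ENV_VARS ?? "")
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name !== "");
  for (const name of required) {
    const value = env[name];
    if (value === undefined || value === "") {
      errors.push(`Missing required environment variable ${name}`);
    }
  }

  return errors;
}

// Throws one explicit error listing every invalid value.
function assertValidConfig(env: Env): void {
  const errors = validateRuntimeConfig(env);
  if (errors.length > 0) {
    throw new Error(`Invalid runtime config:\n- ${errors.join("\n- ")}`);
  }
}
```

Calling assertValidConfig(process.env) before the server starts listening would stop the process with a readable message rather than failing later on the first request.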
Warm pool and standby controls:
- BROWSER_POOL_ENABLED (true|false)
- BROWSER_POOL_SIZE (warm session count)
- BROWSER_HEADLESS (true|false)
- BROWSER_LAUNCH_TIMEOUT_MS
- STANDBY_ENABLED (true|false)
- STANDBY_IDLE_TIMEOUT_MS
- STANDBY_TICK_INTERVAL_MS
- STANDBY_RECYCLE_AFTER_MS
- SESSION_STORAGE_ENABLED (true|false)
- SESSION_STORE_NAME
- SESSION_STORE_KEY_PREFIX
- REQUEST_QUEUE_CONCURRENCY
- REQUEST_QUEUE_MAX_SIZE
- REQUEST_QUEUE_TASK_TIMEOUT_MS
- FETCH_TIMEOUT_DEFAULT_MS
- FETCH_TIMEOUT_MIN_MS
- FETCH_TIMEOUT_MAX_MS
- REQUEST_BODY_MAX_BYTES
- API_KEY_ENABLED
- API_KEY
- CACHE_PROVIDER (memory|redis)
- CACHE_TTL_MS
- CACHE_STALE_TTL_MS
- CACHE_SWR_ENABLED
- REDIS_URL (required when CACHE_PROVIDER=redis)
- REDIS_KEY_PREFIX
- FAST_MODE_ENABLED
- FAST_MODE_MAX_FIELDS
- PREWARM_ENABLED
- PREWARM_INTERVAL_MS
- PREWARM_TARGETS (JSON array of request objects)
- BROWSER_OPTIMIZED_FLAGS_ENABLED
- BROWSER_BLOCK_RESOURCES
- SHUTDOWN_DRAIN_TIMEOUT_MS
- MOCK_FETCH_DELAY_MS
Note: when the browser pool is enabled, a Playwright-compatible browser must be available in the runtime environment. Session storage uses the Apify Key-Value Store and restores browser storage state per warm-session slot.
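Several of the variables listed above are either true|false flags or, in the case of PREWARM_TARGETS, a JSON array of request objects. A minimal sketch of how such values might be parsed is shown here. The helper names and fallback behavior are hypothetical, and the shape of each request object is left open because it is not specified in this document.

```typescript
// Hypothetical helpers for reading typed values from environment variables.
type Env = Record<string, string | undefined>;

// Accepts only the literal strings "true" and "false"; unset values use the fallback.
function parseBoolEnv(env: Env, name: string, fallback: boolean): boolean {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  if (raw === "true") return true;
  if (raw === "false") return false;
  throw new Error(`${name} must be "true" or "false", got "${raw}"`);
}

// Parses PREWARM_TARGETS as a JSON array whose items are plain objects.
// The fields inside each request object are not validated here.
function parsePrewarmTargets(env: Env): Record<string, unknown>[] {
  const raw = env.PREWARM_TARGETS;
  if (raw === undefined || raw.trim() === "") return [];

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (_err) {
    throw new Error("PREWARM_TARGETS must be valid JSON");
  }

  if (!Array.isArray(parsed)) {
    throw new Error("PREWARM_TARGETS must be a JSON array");
  }
  for (const item of parsed) {
    if (typeof item !== "object" || item === null || Array.isArray(item)) {
      throw new Error("PREWARM_TARGETS entries must be JSON objects");
    }
  }
  return parsed as Record<string, unknown>[];
}
```

Rejecting values such as "yes" or "1" for boolean flags keeps the behavior consistent with the fail-fast startup validation, at the cost of being stricter than some deployment tools expect.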
- npm run dev:runner: start local runtime and print health/ready/fetch sample output
- npm run smoke:local: run lightweight endpoint smoke check against a running service
- npm run debug:queue: fire concurrent fetch calls to observe queue/backpressure behavior
- npm run verify:fixtures: validate adapter extraction outputs against selector fixtures
- npm run verify:api-contract: validate auth + schema + envelope contract behavior
- npm run generate:api-artifacts: generate OpenAPI and Postman artifacts from source
- npm run benchmark:hot-path: run reproducible M5 latency benchmark and emit report
- Current prototype sources: linkedin, x, discord
- Supported operations:
  - linkedin:profile
  - x:profile
  - discord:server_metadata
- POST /v1/fetch accepts target.mockHtml (or target.html) for deterministic extraction tests.
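As an illustration of a deterministic extraction test, the sketch below builds a request body that carries inline HTML in target.mockHtml. Only target.mockHtml and the source:operation identifiers come from this README. The top-level field names source and operation, and the helper name, are assumptions, so check docs/api/openapi.json for the actual request schema.

```typescript
// Hypothetical request body shape; only target.mockHtml is named in this README.
interface MockFetchRequestBody {
  source: string;
  operation: string;
  target: { mockHtml: string };
}

// Splits an identifier such as "linkedin:profile" into source and operation
// and attaches the fixture HTML that the extractor should parse.
function buildMockFetchRequest(operationId: string, mockHtml: string): MockFetchRequestBody {
  const [source, operation] = operationId.split(":");
  if (!source || !operation) {
    throw new Error(`Expected "<source>:<operation>", got "${operationId}"`);
  }
  return { source, operation, target: { mockHtml } };
}

// Example: a body for the discord:server_metadata operation with a tiny fixture.
const exampleBody = buildMockFetchRequest(
  "discord:server_metadata",
  "<html><body><h1>Example Server</h1></body></html>",
);
const exampleJson = JSON.stringify(exampleBody);
```

The resulting JSON string can be sent as the body of POST http://127.0.0.1:3000/v1/fetch with a Content-Type of application/json. Because the HTML is supplied inline, the response should not depend on the live site.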
- Standard response envelope for all endpoints: ok, data, error, meta.
- API key middleware supports x-api-key and Authorization: Bearer.
- Public endpoints when auth is enabled: GET /v1/health, GET /v1/ready.
- Contract artifacts:
  - docs/api/openapi.json
  - docs/api/postman/shadow-api-mvp.postman_collection.json
  - docs/api/error-codes.md
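The envelope and API key rules above might look roughly like this in code. This is a sketch under stated assumptions, not the contents of src/api/envelope.ts: the error fields code and message, the helper names, and the use of null for absent data or error are hypothetical. Header names are assumed to be lowercased already, as Node's http module does for incoming requests.

```typescript
// Hypothetical error shape; see docs/api/error-codes.md for the real codes.
interface ApiError {
  code: string;
  message: string;
}

// The four envelope keys named in this README: ok, data, error, meta.
interface Envelope<T> {
  ok: boolean;
  data: T | null;
  error: ApiError | null;
  meta: Record<string, unknown>;
}

function success<T>(data: T, meta: Record<string, unknown> = {}): Envelope<T> {
  return { ok: true, data, error: null, meta };
}

function failure(code: string, message: string, meta: Record<string, unknown> = {}): Envelope<never> {
  return { ok: false, data: null, error: { code, message }, meta };
}

type HeaderMap = Record<string, string | undefined>;

// Reads the key from x-api-key first, then from "Authorization: Bearer <key>".
// The precedence between the two headers is an assumption.
function extractApiKey(headers: HeaderMap): string | null {
  const direct = headers["x-api-key"];
  if (direct && direct.trim() !== "") return direct.trim();

  const auth = headers["authorization"];
  const match = auth ? /^Bearer\s+(\S+)$/i.exec(auth.trim()) : null;
  return match?.[1] ?? null;
}

// Endpoints that stay reachable without a key when auth is enabled.
const PUBLIC_PATHS: string[] = ["/v1/health", "/v1/ready"];

function requiresApiKey(method: string, path: string): boolean {
  return !(method === "GET" && PUBLIC_PATHS.indexOf(path) !== -1);
}
```

Keeping every response in the same four-key shape means clients can branch on ok alone and read either data or error without inspecting HTTP status codes first.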
- Cache stack:
  - In-memory hot cache with TTL
  - Optional Redis cache provider with automatic fallback to memory
  - Stale-while-revalidate mode for expired entries
- Request deduplication:
  - Identical in-flight POST /v1/fetch calls collapse to a single extraction execution
- Fast mode:
  - POST /v1/fetch supports fast_mode for partial responses
  - FAST_MODE_MAX_FIELDS caps returned fields in fast mode
- Prewarming:
  - Scheduler can refresh configured targets using PREWARM_TARGETS
- Performance telemetry:
  - GET /v1/debug/performance returns cache, dedupe, latency, and prewarm metrics
- Benchmark artifact:
  - docs/performance/hot-path-benchmark.json (generated by npm run benchmark:hot-path)
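To make the cache and deduplication behavior above more concrete, here is a minimal in-memory sketch of a TTL cache with a stale-while-revalidate window, plus a helper that collapses identical in-flight calls. It is not the code in src/performance/. The class names are hypothetical, and it assumes the stale window (as in CACHE_STALE_TTL_MS) is counted after the fresh TTL (as in CACHE_TTL_MS) has elapsed.

```typescript
type CacheState = "fresh" | "stale" | "miss";

interface CacheEntry<T> {
  value: T;
  storedAt: number;
}

// Hypothetical TTL cache: entries are "fresh" up to ttlMs, then "stale"
// for a further staleTtlMs, then dropped.
class SwrCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  private ttlMs: number;
  private staleTtlMs: number;
  private now: () => number;

  constructor(ttlMs: number, staleTtlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.staleTtlMs = staleTtlMs;
    this.now = now;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, storedAt: this.now() });
  }

  get(key: string): { state: CacheState; value?: T } {
    const entry = this.entries.get(key);
    if (!entry) return { state: "miss" };
    const age = this.now() - entry.storedAt;
    if (age <= this.ttlMs) return { state: "fresh", value: entry.value };
    if (age <= this.ttlMs + this.staleTtlMs) return { state: "stale", value: entry.value };
    this.entries.delete(key);
    return { state: "miss" };
  }
}

// Hypothetical in-flight dedupe: concurrent calls with the same key share one promise.
class InflightDedupe<T> {
  private inflight = new Map<string, Promise<T>>();

  run(key: string, work: () => Promise<T>): Promise<T> {
    const existing = this.inflight.get(key);
    if (existing) return existing;
    // Remove the entry once the work settles so later calls start fresh.
    const promise = work().finally(() => {
      this.inflight.delete(key);
    });
    this.inflight.set(key, promise);
    return promise;
  }
}
```

In a request handler, a "fresh" hit would be returned directly, a "stale" hit would be returned while a background refresh runs through InflightDedupe.run, and a "miss" would await the deduplicated extraction. The key would typically be derived from the normalized request body, which is also an assumption here.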
- README.md: project overview and roadmap
- CONTRIBUTING.md: contribution guidelines
- CODE_OF_CONDUCT.md: community standards
- SECURITY.md: vulnerability reporting policy
- CHANGELOG.md: release history
- LICENSE: project license
- docs/product/: P0 product planning and API contract artifacts
- docs/api/: generated OpenAPI/Postman specs and API troubleshooting docs
- docs/performance/: benchmark results and performance implementation notes
- .actor/: Apify Actor metadata and input schema
- src/: Actor runtime source scaffold