Twitch User Scraper collects structured Twitch channel ranking data from TwitchTracker-style ranking pages, turning messy tables into clean, analysis-ready records. Use it to build reliable Twitch user rankings datasets for research, dashboards, and competitive insights without manual copying.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for top-twitch-user-scrape you've just found your team — Let’s Chat. 👆👆
This project scrapes ranked Twitch channel data across multiple listing pages and outputs a consistent dataset per channel. It helps teams avoid manual export work and enables repeatable collection runs for trend tracking. It’s built for developers, analysts, and growth teams who need Twitch user rankings data for reporting, monitoring, or discovery workflows.
- Supports configurable pagination (start/end page) for large ranking runs
- Optional language filter to focus on a specific audience segment
- Extracts performance and growth metrics per channel in a consistent schema
- Designed for stable, repeatable collection suitable for automation pipelines
- Produces dataset-ready output for BI tools, spreadsheets, or databases
| Feature | Description |
|---|---|
| Multi-page pagination | Scrape large ranking ranges by setting a start and end page. |
| Language filtering | Narrow results to a specific language segment when available. |
| Structured dataset output | Produces consistent objects per channel for easy analysis. |
| Resilient crawling | Retries and continues across pages to reduce partial runs. |
| Proxy support | Helps reduce blocking risk and improves crawl stability at scale. |
| Local & deploy-ready workflow | Run locally for development and ship the same code to production. |
| Field Name | Field Description |
|---|---|
| rank | The channel’s position in the ranking list for the page. |
| name | The channel display name as shown in rankings. |
| channellink | Direct link to the channel profile/page. |
| avgViewers | Average viewers over the measured period. |
| hoursStreamed | Total hours streamed in the measured period. |
| peakViewers | Highest concurrent viewers observed in the measured period. |
| hoursWatched | Total hours watched across the measured period. |
| activeRank | Activity-based ranking indicator (when provided). |
| followersGained | Followers gained during the measured period. |
| followers | Total followers count at time of collection. |
| viewsGained | Views gained during the measured period (when provided). |
| page | The ranking page number where the record was captured. |
| language | Language filter value used for the run (or empty if none). |
| collectedAt | ISO timestamp of when the record was collected. |
[
{
"rank": 1,
"name": "example_channel",
"channellink": "https://www.twitch.tv/example_channel",
"avgViewers": 48215,
"hoursStreamed": 96.5,
"peakViewers": 132440,
"hoursWatched": 4651279,
"activeRank": 1,
"followersGained": 184320,
"followers": 12450123,
"viewsGained": 9123401,
"page": 1,
"language": "english",
"collectedAt": "2025-12-12T14:22:11.412Z"
}
]
top-twitch-user-scrape/
├── .actor/
│ ├── actor.json
│ └── input_schema.json
├── src/
│ ├── main.js
│ ├── routes/
│ │ └── rankings.js
│ ├── extractors/
│ │ ├── parseChannelRow.js
│ │ └── normalizeMetrics.js
│ ├── utils/
│ │ ├── buildUrls.js
│ │ ├── validators.js
│ │ └── time.js
│ └── constants/
│ └── defaults.js
├── storage/
│ ├── datasets/
│ └── key_value_stores/
├── tests/
│ ├── parseChannelRow.test.js
│ └── buildUrls.test.js
├── .gitignore
├── package.json
├── package-lock.json
├── README.md
└── LICENSE
- [Stream analytics teams] use it to collect Twitch user rankings daily, so they can spot viewership shifts and emerging channels early.
- [Marketing teams] use it to identify top Twitch streamers by language, so they can prioritize outreach and sponsorship targets.
- [Researchers] use it to build longitudinal datasets of channel performance, so they can analyze growth patterns and audience dynamics.
- [Growth operators] use it to track followers and views gained over time, so they can validate campaign impact against ranking movement.
- [Data engineers] use it to pipe structured ranking data into dashboards, so they can monitor KPIs without manual exports.
How do I control how much data gets scraped?
Set startPage and endPage to define the range you want. Smaller ranges are ideal for quick checks, while larger ranges are better for full-market snapshots.
Can I scrape only a specific language segment?
Yes. Provide the language input (e.g., "english"). If you leave it empty, the scraper collects across all available languages in the ranking pages.
Why do some fields look missing or empty in certain results?
Ranking pages can vary by category and availability of metrics. The scraper normalizes what it finds and will return null/empty values when a metric isn’t present for a specific row.
What should I do if pages start failing or returning partial results? Enable proxy configuration, reduce concurrency, and narrow the page range to validate stability. Once stable, increase the range gradually for large-scale runs.
Primary Metric: ~40–80 channels/minute on typical ranking pages when scraping 50 pages sequentially with lightweight HTML parsing.
Reliability Metric: 97–99% page success rate on stable networks with proxy enabled and conservative retries.
Efficiency Metric: Low memory footprint (typically under 200–350 MB) due to streaming extraction and minimal in-memory aggregation.
Quality Metric: 95%+ field completeness for core metrics (rank, name, link, followers) with graceful fallbacks for optional metrics that may not appear on every page.
