📰 Build RSS 2.0 feeds from websites (and JSON APIs) automatically or with a few CSS selectors.

html2rss/html2rss

html2rss is a Ruby gem that generates RSS 2.0 feeds from websites by scraping HTML or JSON content with CSS selectors or auto-detection.

This gem is the core of the html2rss-web application.

🌐 Community & Resources

| Resource | Description | Link |
| --- | --- | --- |
| 📚 Documentation & Feed Directory | Complete guides, tutorials, and a directory of 100+ pre-built feeds | html2rss.github.io |
| 💬 Community Discussions | Get help, share ideas, and connect with other users | GitHub Discussions |
| 📋 Project Board | Track development progress and upcoming features | View Project Board |
| 💖 Support Development | Help fund ongoing development and maintenance | Sponsor on GitHub |

✨ Features

  • 🎯 CSS Selector Support - Extract content using familiar CSS selectors
  • 🤖 Auto-Detection - Automatically detect content using Schema.org, JSON state, and semantic HTML
  • 🔄 Multiple Request Strategies - Faraday for static sites, Browserless for JS-heavy sites
  • 🛠️ Post-Processing - Template rendering, HTML sanitization, time parsing, and more
  • 🧪 Comprehensive Testing - 95%+ test coverage with RSpec
  • 📚 Full Documentation - YARD documentation and comprehensive guides
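To make the post-processing idea more concrete, here is a minimal, dependency-free sketch of chaining post-processors over an extracted value. It is not html2rss's implementation or configuration syntax: the lambda names, the `template` helper, and the sample item are hypothetical, and the regex-based tag stripping is a naive stand-in for real HTML sanitization.

```ruby
require 'time'

# Hypothetical post-processors (illustration only, not html2rss's API).
# Each one takes the current value plus the whole item and returns a new value.
STRIP_TAGS = ->(value, _item) { value.gsub(/<[^>]*>/, '').strip } # naive, not a real sanitizer
PARSE_TIME = ->(value, _item) { Time.parse(value).utc.rfc2822 }   # RSS 2.0 expects RFC 822 style dates

# Builds a processor that renders a format string from the item's fields.
def template(pattern)
  ->(_value, item) { format(pattern, item) }
end

# Runs the value through each processor in order.
def post_process(value, item, processors)
  processors.reduce(value) { |current, processor| processor.call(current, item) }
end

item = { title: '<b>Release 1.0</b> ', author: 'Ada', published: '2024-05-01 12:00:00 UTC' }

title    = post_process(item[:title], item, [STRIP_TAGS])
byline   = post_process(title, item.merge(title: title), [template('%<title>s by %<author>s')])
pub_date = post_process(item[:published], item, [PARSE_TIME])

puts title
puts byline
puts pub_date
```

Modelling each step as a callable with the same signature keeps the chain easy to reorder or extend, which is the general appeal of a post-processing pipeline.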

🚀 Quick Start

For installation and usage instructions, please visit the project website.

💻 Try in Browser

You can develop html2rss directly in your browser using GitHub Codespaces.

The Codespace comes pre-configured with Ruby 3.4 (compatible with Ruby 4.0), all dependencies, and VS Code extensions ready to go!

📚 Documentation

The full documentation for the html2rss gem is available on the project website.

🤝 Contributing

Please see the contributing guide for details on how to contribute.

🏗️ Architecture

Core Components

  1. Config - Loads and validates configuration (YAML/hash)
  2. RequestService - Fetches pages using Faraday or Browserless
  3. Selectors - Extracts content via CSS selectors with extractors/post-processors
  4. AutoSource - Auto-detects content using Schema.org, JSON state blobs, semantic HTML, and structural patterns
  5. RssBuilder - Assembles Article objects and renders RSS 2.0
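As a rough illustration of the first component, the sketch below loads a configuration from either YAML or a hash and checks one required field. The key names (`channel.url`, `selectors.items.selector`) and the `load_config` helper are assumptions made for this example; consult the project website for the actual schema and validation rules.

```ruby
require 'yaml'

# Assumed config shape, for illustration only.
CONFIG_YAML = <<~YAML
  channel:
    url: https://example.com/blog
  selectors:
    items:
      selector: article
    title:
      selector: h2
YAML

# Hypothetical loader: accepts a YAML string or a Hash with symbol keys
# and fails early when the channel URL is missing.
def load_config(source)
  config = source.is_a?(Hash) ? source : YAML.safe_load(source, symbolize_names: true)
  url = config.dig(:channel, :url)
  raise ArgumentError, 'channel.url is required' if url.to_s.empty?

  config
end

config = load_config(CONFIG_YAML)
puts config.dig(:channel, :url)
puts config.dig(:selectors, :items, :selector)
```

Validating up front means a typo in the configuration surfaces before any page is requested.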

Data Flow

Config -> Request -> Extraction -> Processing -> Building -> Output
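The flow above can be sketched end to end with plain Ruby. This is a toy model under stated assumptions, not the gem's code: the `request`, `extract`, `process`, and `build` methods, the `Article` struct, and the canned HTML page are all hypothetical, and a regular expression stands in for CSS-selector extraction so the example needs no external gems.

```ruby
require 'uri'

Article = Struct.new(:title, :link, keyword_init: true)

# Canned HTML stands in for a real HTTP response.
PAGE = <<~HTML
  <h2><a href="/posts/1">First post</a></h2>
  <h2><a href="/posts/2">Tips and tricks</a></h2>
HTML

# Request: a real implementation would perform an HTTP call here.
def request(_url)
  PAGE
end

# Extraction: pull raw link/title pairs out of the page.
def extract(html, pattern)
  html.scan(pattern).map { |href, title| { link: href, title: title } }
end

# Processing: clean up values and resolve relative links.
def process(raw, base_url)
  Article.new(title: raw[:title].strip, link: URI.join(base_url, raw[:link]).to_s)
end

# Building: render a minimal RSS 2.0 document, escaping text for XML.
def build(config, articles)
  esc = ->(text) { text.encode(xml: :text) }
  items = articles.map do |article|
    "<item><title>#{esc.call(article.title)}</title><link>#{esc.call(article.link)}</link></item>"
  end

  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<rss version="2.0"><channel>',
    "<title>#{esc.call(config[:title])}</title>",
    "<link>#{esc.call(config[:url])}</link>",
    "<description>#{esc.call(config[:title])}</description>",
    *items,
    '</channel></rss>'
  ].join("\n")
end

# Config: hypothetical settings for this sketch.
config = {
  url: 'https://example.com/news',
  title: 'Example News & Notes',
  item_pattern: %r{<h2><a href="([^"]+)">([^<]+)</a></h2>}
}

html     = request(config[:url])
articles = extract(html, config[:item_pattern]).map { |raw| process(raw, config[:url]) }
feed     = build(config, articles)

# Output
puts feed
```

Keeping each stage a small function with a single input and output mirrors the left-to-right flow and makes it straightforward to swap one stage, such as the request strategy, without touching the others.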

🧪 Testing

  • RSpec for comprehensive testing
  • 95%+ code coverage with SimpleCov
  • VCR for HTTP interaction testing
  • RuboCop for code style enforcement
  • Reek for code smell detection

🔧 Development Tools

  • Ruby LSP for IntelliSense and language features
  • Debug for modern debugging and exploration
  • YARD for documentation generation
  • GitHub Actions for CI/CD

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

💖 Sponsoring

If you find html2rss useful, please consider sponsoring the project.