Welcome to the Spice.ai OSS Cookbook—a comprehensive collection of recipes for building and deploying data & AI applications using Spice.ai. Each recipe is a self-contained example that demonstrates a specific use case, integration, or feature of Spice.ai, helping you accelerate your data and AI projects.
- Real-time Data Access Pattern Analysis - Use AI to analyze query patterns and detect potential security risks.
- Federated SQL Query - Query data from S3, PostgreSQL, and Dremio in a single query.
- Cayenne Data Accelerator
- Async Queries - Submit long-running SQL queries and retrieve results asynchronously.
- Hybrid-Search - Combine keyword and vector search for improved retrieval.
- AI SQL Function - Use the `ai()` SQL function to invoke LLMs directly in SQL queries for text generation, sentiment analysis, and data enrichment.
- Command Query Responsibility Segregation (CQRS) - Sample application implementing the CQRS pattern with Spice.
- AI SQL Function - Invoke LLMs directly in SQL queries for text generation and data enrichment.
- Azure OpenAI Models - Use Azure OpenAI for search and chat.
- Generative Visualizations - Generate SQL queries and visualizations from natural language.
- Running Llama3 Locally - Run Llama models locally from Hugging Face.
- OpenAI Models - Use OpenAI LLM and embedding models.
- OpenAI SDK - Use the OpenAI SDK to connect to models hosted on Spice.
- LLM Memory - Persistent memory for language models.
- Text to SQL (Tools) - Query data with natural language.
- Nvidia NIM on Kubernetes - Deploy Nvidia NIM on Kubernetes with GPUs.
- Nvidia NIM on AWS EC2 - Deploy Nvidia NIM on AWS GPU-optimized EC2 instances.
- Searching GitHub Files - Search GitHub files with embeddings and vector search.
- xAI Models - Use xAI models such as Grok.
- DeepSeek Model - Use a DeepSeek model through Spice.
- Filesystem Hosted Model - Use models hosted directly on filesystems.
- Web Search Tools using Perplexity - Give LLMs web search access via Perplexity.
- Language Model Evaluations - Use Spice to evaluate language models.
- LLM as a Judge - Define LLM judge models to evaluate other models.
- OpenAI Responses API - Use OpenAI's Responses API with Spice.
- Model Context Protocol (MCP) - Connect to MCP servers and use MCP tools with Spice.
- Cayenne Data Accelerator - Accelerate data using Cayenne.
- DuckDB Data Accelerator - Accelerate data using DuckDB.
- Hashed Partitioning with DuckDB - Prune data with hashed partitioning on categorical columns.
- PostgreSQL Data Accelerator - Materialize data into an attached PostgreSQL instance.
- SQLite Data Accelerator - Accelerate data using SQLite.
- Database Snapshots - Bootstrap accelerations from object storage to skip cold starts.
- Apache Arrow Data Accelerator - Accelerate data using in-memory Arrow.
- Accelerated Views - Pre-calculate and materialize derived data for faster queries.
- Dataset Partitioning - Partition accelerated datasets to improve query performance.
- Sales BI (Apache Superset) - Visualize data in Spice with Apache Superset.
- Grafana Datasource - Add Spice as a Grafana datasource.
- Python ADBC Client - Query Spice using ADBC with Python.
- Java JDBC Client - Query Spice using JDBC with Java.
- Scala JDBC Client - Query Spice using JDBC with Scala.
- Postgres Data Connector
- MySQL Data Connector
- ClickHouse Data Connector - Connect to ClickHouse as a data source.
- Databricks Connector - Query Databricks using Delta Lake or Spark Connect.
- Delta Lake Connector - Query data from Delta Lake tables.
- Debezium CDC Data Connector - Stream changes from Postgres to Spice.
- Debezium CDC SASL/SCRAM from MySQL - Stream changes from MySQL using SASL/SCRAM.
- DynamoDB Data Connector - Query data from an AWS-hosted DynamoDB table.
- DynamoDB Streams - Stream real-time changes from DynamoDB tables.
- Dremio Data Connector - Connect to a Dremio instance.
- DuckDB Data Connector - Use a DuckDB database with sample TPC-H data.
- File Data Connector - Query data from local files.
- FTP Data Connector - Query data from an FTP server.
- Glue Data Connector - Query tables in an AWS Glue Data Catalog.
- GitHub Data Connector - Query GitHub repository data.
- GraphQL Data Connector - Connect to GraphQL endpoints.
- HTTP Data Connector - Query data from HTTP(S) endpoints such as REST APIs.
- MongoDB Data Connector - Connect to MongoDB as a data source.
- MSSQL (Microsoft SQL Server) Data Connector - Query across multiple SQL Server instances.
- ODBC Data Connector - Connect to databases via ODBC.
- Amazon Redshift - Read and write TPC-H data with Amazon Redshift.
- Oracle Data Connector - Connect to and accelerate data from Oracle.
- S3 Data Connector - Query data from an S3 bucket.
- ScyllaDB Data Connector - Query data from ScyllaDB clusters using federated SQL.
- SharePoint/OneDrive for Business Data Connector - Query documents in SharePoint.
- SMB Data Connector - Query data files from SMB/CIFS network shares.
- Snowflake Data Connector - Access a Snowflake database.
- Spice.ai Cloud Platform Data Connector - Connect to Spice.ai Cloud Platform datasets.
- Apache Spark Data Connector - Read data from an Apache Spark instance.
- Apache Kafka Data Connector - Stream data from Kafka with federated queries.
- IMAP Data Connector - Connect to an IMAP email server.
- Spice.ai Cloud Platform Catalog Connector - Query datasets in Spice.ai Cloud Platform.
- Databricks Unity Catalog Connector - Query Databricks Unity Catalog tables.
- Unity Catalog Connector - Query an open-source Unity Catalog instance.
- Iceberg Catalog Connector - Query and write to Iceberg tables.
- Iceberg Hadoop Catalog Connector - Connect to Hadoop catalogs on S3-compatible storage.
- Glue Catalog Connector - Query tables in an AWS Glue Data Catalog.
- Amazon S3 Vectors - Use S3 as a vector engine for embeddings and similarity search.
- Full-Text Search - Retrieve records matching keywords using BM25 scoring.
- Deploying to Kubernetes
- Running in Docker
- Sidecar Deployment Architecture
- Microservice Deployment Architecture
- TPC-H Benchmarking - Run TPC-H benchmark queries.
- SQL Results Caching - Cache query results in memory for faster repeated queries.
- Caching Accelerator - HTTP response caching with stale-while-revalidate (SWR) support.
- Indexes on Accelerated Data - Create indexes to improve query performance.
- Data Retention Policy - Evict data older than a specified duration.
- Refresh Data Window - Restrict data refreshes to recent data only.
- Advanced Data Refresh - Configure and tune data refresh for accelerated datasets.
- Data Quality with Constraints - Enforce data quality constraints on accelerated datasets.
- Rust SDK
- Python SDK
- Go SDK
- JavaScript SDK (Node.js) - Query data using the `@spiceai/spice` npm package.
- Java SDK
- Local dataset replication - Link datasets in a parent/child relationship.
- Distributed Query - Run queries distributed across multiple nodes.
- JSON Strings - Work with JSON strings using JSON functions.
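
The Hybrid-Search recipe in the list above combines keyword and vector search. One common way to merge two ranked result lists is reciprocal rank fusion (RRF), where each document scores the sum of 1 / (k + rank) over every list it appears in. The sketch below assumes RRF with the conventional constant k = 60; it illustrates the general technique and is not necessarily how Spice merges results internally. The document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Assumption: plain RRF, where each document scores sum(1 / (k + rank))
    across the lists it appears in, with 1-based ranks.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID so the
    # output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword (full-text) search and a vector search.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# -> ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

RRF only uses ranks, not raw scores, so it avoids having to normalize BM25 scores and vector similarities onto a common scale.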
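
The Full-Text Search recipe in the list above retrieves records using BM25 scoring. As a rough illustration of what BM25 computes, the sketch below scores each document against a keyword query using the standard BM25 formula with common defaults k1 = 1.2 and b = 0.75 and a Lucene-style smoothed IDF. The whitespace tokenizer and the sample documents are simplifying assumptions; this is not Spice's search implementation.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Return one BM25 score per document for a whitespace-tokenized query.

    Assumptions: lowercase whitespace tokenization and the smoothed IDF
    log((N - df + 0.5) / (df + 0.5) + 1), which is always non-negative.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Length normalization: longer-than-average documents are penalized.
        length_norm = k1 * (1 - b + b * len(tokens) / avgdl)
        score = 0.0
        for term in query.lower().split():
            freq = tf[term]
            if freq == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            score += idf * freq * (k1 + 1) / (freq + length_norm)
        scores.append(score)
    return scores


docs = [
    "spice accelerates data",
    "vector search and keyword search",
    "keyword search with bm25",
]
print(bm25_scores("keyword search", docs))
```

Documents that contain none of the query terms score zero, and repeated query terms raise a document's score with diminishing returns controlled by k1.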