Spice.ai OSS
What is Spice?
Spice is a portable runtime written in Rust that offers developers a unified SQL interface to materialize, accelerate, and query data from any database, data warehouse, or data lake.
Read the Spice.ai OSS announcement blog post.
Spice connects, fuses, and delivers data to applications, machine-learning models, and AI backends, functioning as an application-specific, tier-optimized Database CDN.
Spice is built with industry-leading technologies such as Apache DataFusion, Apache Arrow, Apache Arrow Flight, SQLite, and DuckDB.
Why Spice?
Spice makes it fast and easy to query data from one or more sources using SQL. You can co-locate a managed dataset with your application or machine-learning model and accelerate it in memory with Arrow, with SQLite or DuckDB, or with an attached PostgreSQL database for fast, high-concurrency, low-latency queries. Choosing the acceleration engine gives you flexibility and control over query cost and performance.
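To make the co-location idea concrete, here is a minimal sketch using only Python's built-in `sqlite3` module. It is not Spice's API or configuration: the `source` connection stands in for a remote database, `local` stands in for a SQLite accelerator running next to the application, and the hypothetical `materialize` helper copies a filtered working set from one to the other so that application queries are answered locally.

```python
import sqlite3

# Hypothetical "source" database standing in for a remote warehouse or OLTP store.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("us", 10.0), ("eu", 20.0), ("us", 5.5), ("apac", 7.25)],
)

# Local accelerator co-located with the application (in-memory SQLite here).
local = sqlite3.connect(":memory:")
local.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")


def materialize(region: str) -> int:
    """Hypothetical helper: copy the filtered working set from source to local."""
    rows = source.execute(
        "SELECT id, region, amount FROM orders WHERE region = ?", (region,)
    ).fetchall()
    with local:  # one transaction, so readers never see a half-loaded working set
        local.execute("DELETE FROM orders")
        local.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return len(rows)


loaded = materialize("us")
# Application queries now hit the local copy instead of the source.
total = local.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(loaded, total)  # 2 15.5
```

The filter (`region = ?`) is what keeps the local copy a small working set rather than a full replica; how Spice itself selects and refreshes datasets is configured in Spice, not shown here.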
How is Spice different?
- Application-focused: Spice is designed to integrate at the application level, with a 1:1 or 1:N application-to-Spice mapping, whereas most other data systems are designed for multiple applications to share a single database or data warehouse. It's not uncommon to have many Spice instances, even one per tenant or customer.
- Dual-Engine Acceleration: Spice supports both OLAP (Arrow/DuckDB) and OLTP (SQLite/PostgreSQL) databases at the dataset level, unlike other systems that support only one type.
- Separation of Materialization and Storage/Compute: Spice separates storage and compute, allowing you to keep data close to its source and bring a materialized working set next to your application, dashboard, or data/ML pipeline.
- Edge to Cloud Native: Spice is designed to be deployed anywhere, from a standalone instance to a Kubernetes container sidecar, microservice, or cluster at the Edge/POP, on-prem, or in public clouds. You can also chain Spice instances and deploy them across multiple infrastructure tiers.
How does Spice compare?
| | Spice | Trino/Presto | Dremio | ClickHouse |
|---|---|---|---|---|
| Primary Use-Case | Data & AI Applications | Big Data Analytics | Interactive Analytics | Real-Time Analytics |
| Typical Deployment | Colocated with application | Cloud Cluster | Cloud Cluster | On-Prem/Cloud Cluster |
| Application-to-Data System | One-to-One/Many | Many-to-One | Many-to-One | Many-to-One |
| Query Federation | Native with query push-down | Supported with push-down | Supported with limited push-down | Limited |
| Materialization | Arrow/SQLite/DuckDB/PostgreSQL | Intermediate Storage | Reflections (Iceberg) | Views & MergeTree |
| Query Result Caching | Supported | Supported | Supported | Supported |
| Typical Configuration | Single-Binary/Sidecar/Microservice | Coordinator+Executor w/ ZooKeeper | Coordinator+Executor w/ ZooKeeper | ClickHouse Keeper+Nodes |
Example Use-Cases
1. Faster applications and frontends. Accelerate and co-locate datasets with applications and frontends to serve more concurrent queries and users with faster page loads and data updates. Try the CQRS Cookbook Recipe.
2. Faster dashboards, analytics, and BI. Build faster, more responsive dashboards without massive compute costs. Watch the Apache Superset demo.
3. Faster data pipelines, machine-learning training, and inferencing. Co-locate datasets in pipelines where the data is needed to minimize data movement and improve query performance. Predict hard drive failure with the SMART data demo.
4. Easily query many data sources. Run federated SQL queries across databases, data warehouses, and data lakes using Data Connectors.
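Federation means a single SQL statement can join data that lives in separate systems. As a rough stand-in for Spice's Data Connectors (which this sketch does not use), SQLite's `ATTACH DATABASE` lets one connection query two independent database files in one statement. The file names, tables, rows, and the `create_source` helper are all hypothetical.

```python
import os
import sqlite3
import tempfile


def create_source(path: str, ddl: str, insert_sql: str, rows: list) -> None:
    """Hypothetical helper: create one independent 'source' database file."""
    conn = sqlite3.connect(path)
    with conn:  # commit the inserted rows
        conn.execute(ddl)
        conn.executemany(insert_sql, rows)
    conn.close()


tmp_dir = tempfile.mkdtemp()
crm_path = os.path.join(tmp_dir, "crm.db")
billing_path = os.path.join(tmp_dir, "billing.db")

create_source(crm_path, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
              "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
create_source(billing_path, "CREATE TABLE invoices (customer_id INTEGER, amount REAL)",
              "INSERT INTO invoices VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 75.0)])

# One "query engine" connection that attaches both sources under schema names.
engine = sqlite3.connect(":memory:")
engine.execute("ATTACH DATABASE ? AS crm", (crm_path,))
engine.execute("ATTACH DATABASE ? AS billing", (billing_path,))

# A single SQL statement joins data from both sources.
rows = engine.execute(
    """
    SELECT c.name, SUM(i.amount) AS total
    FROM crm.customers AS c
    JOIN billing.invoices AS i ON i.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(rows)  # [('Acme', 150.0), ('Globex', 75.0)]
```

Both sources here are SQLite files; the point is only the single-statement join. Spice federates across different kinds of systems, which this sketch does not attempt.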
FAQ
- Is Spice a cache? No; however, you can think of Spice data materialization as an active cache or data prefetcher. A cache fetches data on a cache miss, whereas Spice prefetches and materializes filtered data on an interval or as new data becomes available. In addition to materialization, Spice supports results caching.
- Is Spice a CDN for databases? Yes, you can think of Spice as a CDN for different data sources. Using CDN concepts, Spice enables you to ship (load) a working set of your database (or data lake, or data warehouse) to where it's most frequently accessed, such as a data application or AI inference.
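The difference between a cache and prefetching materialization can be sketched in a few lines of Python. This is an illustrative model, not Spice's implementation: `ReadThroughCache` contacts the source only on a miss, while the hypothetical `Prefetcher` reloads a filtered working set on a fixed interval, so reads never wait on the source. Time is passed in explicitly (`now`) to keep the example deterministic, and only interval-based refresh is modeled, not refresh triggered by newly arriving data.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical source table, plus a log of every round trip made to it.
SOURCE: Dict[str, int] = {"a": 1, "b": 2, "c": 3}
source_reads: List[Tuple[str, Optional[str]]] = []


def fetch_one(key: str) -> Optional[int]:
    """Point lookup against the source: one round trip per call."""
    source_reads.append(("one", key))
    return SOURCE.get(key)


def fetch_working_set() -> Dict[str, int]:
    """Bulk load of a filtered working set (here: rows with value >= 2)."""
    source_reads.append(("all", None))
    return {k: v for k, v in SOURCE.items() if v >= 2}


class ReadThroughCache:
    """Classic cache: contacts the source only when a key is missing."""

    def __init__(self, fetch: Callable[[str], Optional[int]]) -> None:
        self.fetch = fetch
        self.store: Dict[str, Optional[int]] = {}

    def get(self, key: str) -> Optional[int]:
        if key not in self.store:  # cache miss: the caller waits on the source
            self.store[key] = self.fetch(key)
        return self.store[key]


class Prefetcher:
    """Hypothetical materializer: reloads the working set on an interval."""

    def __init__(self, load: Callable[[], Dict[str, int]], interval_s: float) -> None:
        self.load = load
        self.interval_s = interval_s
        self.store: Dict[str, int] = {}
        self.last_refresh: Optional[float] = None

    def refresh_if_due(self, now: float) -> bool:
        """Reload the whole working set if the interval has elapsed."""
        if self.last_refresh is None or now - self.last_refresh >= self.interval_s:
            self.store = self.load()
            self.last_refresh = now
            return True
        return False

    def get(self, key: str) -> Optional[int]:
        return self.store.get(key)  # never triggers a source read


cache = ReadThroughCache(fetch_one)
cache.get("b")  # miss: one point lookup at the source
cache.get("b")  # hit: served from the cache

pre = Prefetcher(fetch_working_set, interval_s=10.0)
pre.refresh_if_due(now=0.0)   # initial materialization
SOURCE["d"] = 4               # new data lands in the source
pre.refresh_if_due(now=5.0)   # not due yet
before_due = pre.get("d")     # None: not materialized yet
pre.refresh_if_due(now=10.0)  # due: the working set now includes "d"
print(before_due, pre.get("d"), source_reads)
# None 4 [('one', 'b'), ('all', None), ('all', None)]
```

The cache's first read of `"b"` waits on the source, while `Prefetcher.get` never does; the trade-off is that prefetched data can be up to one refresh interval stale.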
Intelligent Applications
Spice enables developers to build both data and AI-driven applications by co-locating data and ML models with applications. Read more about the vision to enable the development of intelligent AI-driven applications.
Connect with us
We greatly appreciate and value your support! You can help Spice in a number of ways:
- Star this repo.
- Build an app with Spice and send us feedback and suggestions at hey@spice.ai or on Discord, X, or LinkedIn.
- File an issue if you see something not quite working correctly.
- Join our team (we're hiring!).
- Contribute code or documentation to the project (see CONTRIBUTING.md).
We're also starting a community call series soon!
Thank you for sharing this journey with us.