21 projects
The Presto Foundation Fund
Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.
5,381
749
$2B
Trino
Trino is a distributed SQL query engine designed to query large data sets distributed across multiple heterogeneous data sources. It enables fast, interactive analytics across diverse data sources including Hadoop, object stores, relational databases, and other systems.
5,117
724
$68M
Apache DataFusion
Apache DataFusion is a fast, extensible query execution framework written in Rust that enables efficient processing of large-scale data using SQL. It provides a modular architecture for building high-performance data processing systems and analytics applications, with support for various data sources and formats.
2,376
557
$21M
Apache Pinot
Apache Pinot is a real-time distributed OLAP datastore designed to deliver scalable real-time analytics with low latency. It can ingest data from batch and streaming sources and provides a SQL interface for querying. The system is built to handle high throughput analytics workloads and supports rich indexing capabilities for optimized query performance.
1,717
230
$47M
Apache Hive
Apache Hive is a data warehouse software project built on top of Apache Hadoop that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. It provides a mechanism to project structure onto data and query it using HQL (Hive Query Language), a SQL-like language.
1,500
149
$96M
Apache Cloudberry
An advanced and mature open-source MPP (Massively Parallel Processing) database, and an open-source alternative to Greenplum Database.
1,466
79
$126M
ByConity
ByConity is a cloud-native data warehouse system that provides real-time analytics capabilities with high concurrency and low latency. It features separation of storage and compute, elastic scaling, and ACID transaction support.
1,364
120
$66M
Apache Kyuubi
Apache Kyuubi is a distributed multi-tenant service that provides high-performance SQL query capabilities and resource management for big data workloads. It offers a unified gateway for accessing data lakes through various engines like Apache Spark, enabling secure, scalable, and highly available data processing.
972
108
$8.2M
Apache Gluten
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
667
68
$22M
Snuba
Snuba is a time-series storage and analytics database service designed to power Sentry's event storage and analytics features. It provides a low-latency query service built on top of ClickHouse, optimized for high-volume event data and real-time querying.
613
116
$7.2M
Apache Phoenix
Apache Phoenix
552
56
$19M
Apache Doris
Apache Doris is an easy-to-use, high-performance, unified analytics database.
Calcite Avatica
Apache Calcite Avatica
Cube.js
Cube: Universal semantic layer platform for AI, BI, spreadsheets, and embedded analytics
Databend
Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
Deephaven Community Core
Deephaven Community Core
Drill
Apache Drill is a distributed MPP query layer for self-describing data.
Linkis
Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
ProtonSQL
High-performance, low-footprint SQL database written in C++. Process millions of rows per second from Kafka, Pulsar, or ClickHouse, and seamlessly write results back. Supports powerful features like JOIN, CDC, UPSERT, and LOOKUP, enabling real-time analytics and ETL at scale.
Spark
Apache Spark - A unified analytics engine for large-scale data processing