13 projects
Delta Lake Project
Delta Lake is an open-source storage layer that brings reliability to data lakes.
4,002
591
$34M
Alluxio
Alluxio is a distributed system that enables data orchestration across different storage systems and computation frameworks. It provides a unified namespace and data access layer, improving performance through a memory-centric architecture and intelligent caching while maintaining compatibility with existing applications.
3,353
258
$19M
TiDB
TiDB is an open-source, distributed SQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It features horizontal scalability, strong consistency, and MySQL compatibility.
3,255
555
$84M
Apache Hudi
Apache Hudi is a data lake platform that provides streaming data ingestion and bulk data management capabilities. It enables atomic updates, record-level change streams, and incremental data processing on large analytical datasets stored in data lakes. The platform supports ACID transactions, efficient upserts, and real-time analytics while maintaining data quality and consistency.
3,058
275
$24M
YTsaurus
YTsaurus is a distributed storage and processing platform designed for managing large-scale data. It provides a comprehensive suite of tools for data organization, processing, and analysis, supporting features like distributed execution, data replication, and resource management across clusters.
1,551
32
$1.6B
Apache Hive
Apache Hive is a data warehouse software project built on top of Apache Hadoop that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. It provides a mechanism to project structure onto data and query it using HQL (Hive Query Language), a SQL-like language.
1,501
148
$96M
ByConity
ByConity is a cloud-native data warehouse system that provides real-time analytics capabilities with high concurrency and low latency. It features separation of storage and compute, elastic scaling, and ACID transaction support.
1,364
132
$66M
Apache Ozone
Apache Ozone is a scalable, reliable, distributed storage system optimized for data analytics and object store workloads.
691
70
$29M
Databend
Databend is a modern cloud data warehouse that features a vectorized execution engine, cloud-native architecture, and Snowflake-compatible interface. It provides high-performance analytics capabilities while maintaining cost efficiency through separation of storage and compute.
677
177
$30M
lakeFS
lakeFS is a data version control system for data lakes, offering Git-like operations on data.
580
122
$14M
LakeSoul
LakeSoul aims to be an end-to-end, real-time, cloud-native lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.
71
6
$6.3M
StarRocks
StarRocks is an open query engine for sub-second analytics both on and off the data lakehouse. It targets multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.