LFX Insights

Data Lakes & Lakehouses

Unified systems that bridge raw data storage with ACID transactions, merges, and advanced analytics, sitting at the intersection of data lakes and traditional warehouses.

13 projects

20,103 contributors

$2B software value

Delta Lake Project

Delta Lake is an open source storage layer that brings reliability to data lakes.

Contributors

4,002

Organizations

591

Software value

$34M

Alluxio

Alluxio is a distributed system that enables data orchestration across different storage systems and computation frameworks. It provides a unified namespace and data access layer, improving performance through memory-centric architecture and intelligent caching while maintaining compatibility with existing applications.

Contributors

3,353

Organizations

258

Software value

$19M

TiDB

TiDB is an open-source, distributed SQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It features horizontal scalability, strong consistency, and MySQL compatibility.

Contributors

3,255

Organizations

555

Software value

$84M

Apache Hudi

Apache Hudi is a data lake platform that provides streaming data ingestion and bulk data management capabilities. It enables atomic updates, record-level change streams, and incremental data processing on large analytical datasets stored in data lakes. The platform supports ACID transactions, efficient upserts, and real-time analytics while maintaining data quality and consistency.

Contributors

3,058

Organizations

275

Software value

$24M

YTsaurus

YTsaurus is a distributed storage and processing platform designed for managing large-scale data. It provides a comprehensive suite of tools for data organization, processing, and analysis, supporting features like distributed execution, data replication, and resource management across clusters.

Contributors

1,551

Organizations

32

Software value

$1.6B

Apache Hive

Apache Hive is a data warehouse software project built on top of Apache Hadoop that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. It provides a mechanism to project structure onto data and query it using HQL (Hive Query Language), a SQL-like language.

Contributors

1,501

Organizations

148

Software value

$96M

ByConity

ByConity is a cloud-native data warehouse system that provides real-time analytics capabilities with high concurrency and low latency. It features separation of storage and compute, elastic scaling, and ACID transaction support.

Contributors

1,364

Organizations

132

Software value

$66M

Apache Ozone

Apache Ozone is a scalable, reliable, distributed storage system optimized for data analytics and object store workloads.

Contributors

691

Organizations

70

Software value

$29M

Databend

Databend is a modern cloud data warehouse that features a vectorized execution engine, cloud-native architecture, and Snowflake-compatible interface. It provides high performance analytics capabilities while maintaining cost efficiency through separation of storage and compute.

Contributors

677

Organizations

177

Software value

$30M

lakeFS

lakeFS provides data version control for your data lake, in the style of Git for data.

Contributors

580

Organizations

122

Software value

$14M

LakeSoul

LakeSoul aims to develop an end-to-end, real-time, cloud-native lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.

Contributors

71

Organizations

6

Software value

$6.3M

StarRocks

StarRocks is an open query engine for sub-second analytics both on and off the data lakehouse. It targets multi-dimensional analytics, real-time analytics, and ad-hoc queries. StarRocks is a Linux Foundation project.

This project hasn't been onboarded to LFX Insights.