LFX Insights

Big Data Integration Platforms

Comprehensive frameworks that package, deploy, test, and maintain integrated big data ecosystems with multiple components working together. These platforms simplify the deployment and management of distributed big data technologies across clusters.

8 projects

25,199 contributors

$2.5B software value

OpenSearch

The purpose of the OpenSearch Software Foundation is to raise, budget and spend funds in support of various open source, open data and/or open standards projects relating to open source search and analysis solutions.

Contributors: 14,614
Organizations: 1,871
Software value: $571M

Apache Hudi

Apache Hudi is a data lake platform that provides streaming data ingestion and bulk data management capabilities. It enables atomic updates, record-level change streams, and incremental data processing on large analytical datasets stored in data lakes. The platform supports ACID transactions, efficient upserts, and real-time analytics while maintaining data quality and consistency.

Contributors: 3,034
Organizations: 270
Software value: $23M

Apache SeaTunnel

Apache SeaTunnel is a distributed data integration platform that enables high-performance data synchronization between various data sources and destinations. It provides a unified pipeline for real-time and batch data transfer, supporting multiple data systems like databases, messaging systems, and file storage, with features for data transformation and processing.

Contributors: 2,465
Organizations: 170
Software value: $20M

Apache Hadoop

Apache Hadoop is a distributed computing framework that enables processing and storage of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, with each offering local computation and storage.

Contributors: 2,314
Organizations: 274
Software value: $190M

YTsaurus

YTsaurus is a distributed storage and processing platform designed for managing large-scale data. It provides a comprehensive suite of tools for data organization, processing, and analysis, supporting features like distributed execution, data replication, and resource management across clusters.

Contributors: 1,508
Organizations: 29
Software value: $1.6B

Apache Kyuubi

Apache Kyuubi is a distributed multi-tenant service that provides high-performance SQL query capabilities and resource management for big data workloads. It offers a unified gateway for accessing data lakes through various engines like Apache Spark, enabling secure, scalable, and highly available data processing.

Contributors: 974
Organizations: 108
Software value: $8.2M

HPCC Systems Platform

HPCC Systems Platform is an open-source, enterprise-grade big data analytics computing platform that allows processing and analysis of massive data sets across parallel computing clusters. It provides a complete end-to-end data lake management solution with built-in ETL capabilities, high-performance distributed computing, and a declarative programming language called ECL.

Contributors: 290
Organizations: 9
Software value: $89M

Bigtop

Bigtop is an Apache Software Foundation project for infrastructure engineers and data scientists looking for comprehensive packaging, testing, and configuration of the leading open source big data components.

This project hasn't been onboarded to LFX Insights.