LFX Insights

Stream Processing Frameworks

Frameworks for real-time processing of data streams.

17 projects

30,706 contributors

$506M

Apache Beam

Apache Beam is a unified programming model and framework for building and executing batch and streaming data processing pipelines. It provides a portable API that enables developers to write data processing code once and run it on various execution engines like Apache Spark, Apache Flink, and Google Cloud Dataflow.

Contributors: 4,734
Organizations: 622
Software value: $94M

Apache Flink

Apache Flink is a distributed stream processing and batch computation framework. It provides a high-throughput, low-latency streaming engine and supports event-driven applications and batch processing. The framework enables stateful computations over data streams and features automatic memory management, fault tolerance, and exactly-once processing semantics.

Contributors: 4,645
Organizations: 557
Software value: $98M
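
As a rough illustration of what "stateful computations over data streams" in the Flink description means, the sketch below keeps a running total per key as events arrive. It is plain Python and does not use the Flink API; the function name `running_counts` and the sample events are hypothetical, chosen only for this example, and the state lives in memory with none of Flink's fault tolerance or exactly-once guarantees.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def running_counts(events: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Yield (key, running_total) after each event, keeping per-key state."""
    # Per-key state held in memory; a real engine would checkpoint this.
    state: Dict[str, int] = defaultdict(int)
    for key, value in events:
        state[key] += value
        yield key, state[key]


if __name__ == "__main__":
    # Hypothetical click events: (user, click count).
    clicks = [("alice", 1), ("bob", 2), ("alice", 3)]
    for key, total in running_counts(clicks):
        print(key, total)
```

Because the function is a generator, it emits an updated total after every event instead of waiting for the input to end, which is the property that distinguishes stream processing from a batch aggregation.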

Apache Kafka

Apache Kafka is a distributed event streaming platform designed for high-throughput, fault-tolerant handling of real-time data feeds. It enables building real-time streaming data pipelines and applications that can process, transform, and react to streams of events.

Contributors: 4,490
Organizations: 738
Software value: $44M

TDengine

TDengine is an open-source time-series database management system designed for Internet of Things (IoT), Industrial IoT, and Connected Cars scenarios. It features high-performance data ingestion, storage, and querying capabilities, with built-in caching, stream processing, and data subscription functions.

Contributors: 3,891
Organizations: 249
Software value: $71M

Apache Hudi

Apache Hudi is a data lake platform that provides streaming data ingestion and bulk data management capabilities. It enables atomic updates, record-level change streams, and incremental data processing on large analytical datasets stored in data lakes. The platform supports ACID transactions, efficient upserts, and real-time analytics while maintaining data quality and consistency.

Contributors: 3,033
Organizations: 270
Software value: $23M

Hazelcast

Hazelcast is an open-source distributed computing platform that provides in-memory data storage and processing capabilities. It offers features like distributed caching, distributed data structures, distributed computing, and clustering for building scalable applications.

Contributors: 2,976
Organizations: 464
Software value: $63M

Apache SeaTunnel

Apache SeaTunnel is a distributed data integration platform that enables high-performance data synchronization between various data sources and destinations. It provides a unified pipeline for real-time and batch data transfer, supporting multiple data systems like databases, messaging systems, and file storage, with features for data transformation and processing.

Contributors: 2,420
Organizations: 169
Software value: $20M

Apache NiFi

Apache NiFi is an enterprise data flow management and automation platform that enables organizations to reliably process, route, transform and distribute data between diverse systems. It provides a web-based interface for designing, controlling and monitoring data flows, with features for data provenance, security, extensibility and real-time control.

Contributors: 1,863
Organizations: 221
Software value: $44M

Apache Pinot

Apache Pinot is a real-time distributed OLAP datastore designed to deliver scalable real-time analytics with low latency. It can ingest data from batch and streaming sources and provides a SQL interface for querying. The system is built to handle high throughput analytics workloads and supports rich indexing capabilities for optimized query performance.

Contributors: 1,717
Organizations: 230
Software value: $47M

FS2: Functional Streams for Scala

FS2 is a streaming library for Scala that provides pure functional, effectful, and compositional streaming abstractions. It enables processing of infinite streams of data in a memory-efficient way with support for concurrent and resource-safe operations.

Contributors: 937
Organizations: 245
Software value: $1.6M

Cloud Dataflow Templates

Google-provided Cloud Dataflow templates for solving in-cloud data tasks.

This project hasn't been onboarded to LFX Insights.

Hop

Hop Orchestration Platform

This project hasn't been onboarded to LFX Insights.

ProtonSQL

High-performance, low-footprint SQL database written in C++. Process millions of rows per second from Kafka, Pulsar, or ClickHouse, and seamlessly write results back. Supports powerful features like JOIN, CDC, UPSERT, and LOOKUP, enabling real-time analytics and ETL at scale.

This project hasn't been onboarded to LFX Insights.

Redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!

This project hasn't been onboarded to LFX Insights.

Spark

Apache Spark - A unified analytics engine for large-scale data processing

This project hasn't been onboarded to LFX Insights.

Storm

Apache Storm

This project hasn't been onboarded to LFX Insights.

StreamPipes

Apache StreamPipes - A self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams.

This project hasn't been onboarded to LFX Insights.