17 projects
Apache Beam
Apache Beam is a unified programming model and framework for building and executing batch and streaming data processing pipelines. It provides a portable API that enables developers to write data processing code once and run it on various execution engines like Apache Spark, Apache Flink, and Google Cloud Dataflow.
4,734
622
$94M
Apache Flink
Apache Flink is a distributed stream processing and batch computation framework. It provides a high-throughput, low-latency streaming engine and supports event-driven applications and batch processing. The framework enables stateful computations over data streams and features automatic memory management, fault tolerance, and exactly-once processing semantics.
4,645
557
$98M
Apache Kafka
Apache Kafka is a distributed event streaming platform designed for high-throughput, fault-tolerant handling of real-time data feeds. It enables building real-time streaming data pipelines and applications that can process, transform, and react to streams of events.
4,490
738
$44M
TDengine
TDengine is an open-source time-series database management system designed for Internet of Things (IoT), Industrial IoT, and Connected Cars scenarios. It features high-performance data ingestion, storage, and querying capabilities, with built-in caching, stream processing, and data subscription functions.
3,891
249
$71M
Apache Hudi
Apache Hudi is a data lake platform that provides streaming data ingestion and bulk data management capabilities. It enables atomic updates, record-level change streams, and incremental data processing on large analytical datasets stored in data lakes. The platform supports ACID transactions, efficient upserts, and real-time analytics while maintaining data quality and consistency.
3,033
270
$23M
Hazelcast
Hazelcast is an open-source distributed computing platform that provides in-memory data storage and processing capabilities. It offers features like distributed caching, distributed data structures, distributed computing, and clustering for building scalable applications.
2,976
464
$63M
Apache SeaTunnel
Apache SeaTunnel is a distributed data integration platform that enables high-performance data synchronization between various data sources and destinations. It provides a unified pipeline for real-time and batch data transfer, supporting multiple data systems like databases, messaging systems, and file storage, with features for data transformation and processing.
2,420
169
$20M
Apache NiFi
Apache NiFi is an enterprise data flow management and automation platform that enables organizations to reliably process, route, transform and distribute data between diverse systems. It provides a web-based interface for designing, controlling and monitoring data flows, with features for data provenance, security, extensibility and real-time control.
1,863
221
$44M
Apache Pinot
Apache Pinot is a real-time distributed OLAP datastore designed to deliver scalable real-time analytics with low latency. It can ingest data from batch and streaming sources and provides a SQL interface for querying. The system is built to handle high throughput analytics workloads and supports rich indexing capabilities for optimized query performance.
1,717
230
$47M
FS2: Functional Streams for Scala
FS2 is a streaming library for Scala that provides purely functional, effectful, and compositional streaming abstractions. It processes potentially infinite streams of data in a memory-efficient way, with support for concurrent and resource-safe operations.
937
245
$1.6M
Cloud Dataflow Templates
Google-provided Cloud Dataflow templates for solving in-cloud data tasks
Hop
Hop Orchestration Platform
ProtonSQL
High-performance, low-footprint SQL database written in C++. Process millions of rows per second from Kafka, Pulsar, or ClickHouse, and seamlessly write results back. Supports powerful features like JOIN, CDC, UPSERT, and LOOKUP, enabling real-time analytics and ETL at scale.
Redpanda
Redpanda is a streaming data platform for developers. It is Kafka API compatible and runs without ZooKeeper or a JVM; the project describes itself as 10x faster.
Spark
Apache Spark - A unified analytics engine for large-scale data processing
Storm
Apache Storm
StreamPipes
Apache StreamPipes - A self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams.