LFX Insights

Extract-Transform-Load (ETL) Tools

Software for extracting, transforming, and loading data between systems.

25 projects

43,311 contributors

$637M software value

Apache Spark

Apache Spark - A unified analytics engine for large-scale data processing

Contributors

9,131

Organizations

1,283

Software value

$82M

Airbyte

Airbyte is an open-source data integration platform that helps users replicate data from applications, APIs, and databases to data warehouses, lakes, and other destinations. It provides a large collection of pre-built connectors and allows users to build custom ones, enabling automated data synchronization and ETL workflows.

Contributors

8,620

Organizations

1,251

Software value

$113M

Logstash

Logstash is a server-side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to a destination of choice. It is commonly used to collect logs and other time-series data for search, analysis and visualization in Elasticsearch.

Contributors

5,696

Organizations

1,297

Software value

$5.4M

dbt Core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

Contributors

4,004

Organizations

614

Software value

$8.6M

Dagster

Dagster is an open-source data orchestration framework that lets you define, test, and orchestrate data pipelines using Python code. It provides tools for building, testing, and monitoring data workflows while emphasizing software engineering best practices like modularity, testability, and gradual typing.

Contributors

3,962

Organizations

711

Software value

$79M

Apache SeaTunnel

Apache SeaTunnel is a distributed data integration platform that enables high-performance data synchronization between various data sources and destinations. It provides a unified pipeline for real-time and batch data transfer, supporting multiple data systems like databases, messaging systems, and file storage, with features for data transformation and processing.

Contributors

2,521

Organizations

173

Software value

$20M

Apache NiFi

Apache NiFi is an enterprise data flow management and automation platform that enables organizations to reliably process, route, transform and distribute data between diverse systems. It provides a web-based interface for designing, controlling and monitoring data flows, with features for data provenance, security, extensibility and real-time control.

Contributors

1,876

Organizations

229

Software value

$44M

Debezium

Debezium is an open source distributed platform for change data capture (CDC). It captures row-level changes in databases like MySQL, PostgreSQL, MongoDB, and others, and streams them to applications in real time. This enables event-driven architectures, data replication, and microservices integration.

Contributors

1,656

Organizations

270

Software value

$14M

Apache DevLake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.

Contributors

1,046

Organizations

200

Software value

$9.4M

Mage AI

🧙 Build, run, and manage data pipelines for integrating and transforming data.

Contributors

978

Organizations

132

Software value

$24M

Pentaho Data Integration

Pentaho Data Integration (PDI), also known as Kettle, is an open source ETL (Extract, Transform, Load) tool that enables users to design and implement data integration workflows. It provides a graphical interface for creating data pipelines, transforming data between different formats and systems, and automating data movement processes.

Contributors

913

Organizations

51

Software value

$52M

Apache Hop

Hop Orchestration Platform

Contributors

720

Organizations

53

Software value

$44M

CDAP

An open source framework for building data analytic applications.

Contributors

660

Organizations

33

Software value

$24M

Meltano

Meltano is an open source ELT (Extract, Load, Transform) platform that helps organizations integrate and manage their data pipelines. It provides a command-line interface and web UI for orchestrating data workflows, managing configurations, and connecting various data tools and services.

Contributors

419

Organizations

83

Software value

$3.5M

BigQuery ETL

BigQuery ETL

Contributors

366

Organizations

31

Software value

$19M

Cumulus Framework

Cumulus Framework + Cumulus API

Contributors

252

Organizations

33

Software value

$17M

Tapdata

Tapdata Live Data Platform Project

Contributors

151

Organizations

9

Software value

$18M

Instill Core

Instill Core is an open-source MLOps platform that provides infrastructure for building and deploying AI applications. It enables integration of various AI models and data sources through a unified API and pipeline system.

Contributors

138

Organizations

26

Software value

$2.1M

Tenzir

Tenzir is a high-performance data processing engine that enables real-time analysis and transformation of large-scale network and security data. It provides a unified platform for collecting, enriching, and analyzing diverse data sources with a focus on network security and observability.

Contributors

134

Organizations

35

Software value

$7.7M

Stroom

Stroom is a highly scalable data storage, processing and analysis platform.

Contributors

68

Organizations

8

Software value

$50M

Cloud Dataflow Templates

Google-provided Cloud Dataflow templates for solving in-Cloud data tasks.

This project hasn't been onboarded to LFX Insights.

FIWARE Cygnus

A connector in charge of persisting context data sources into other third-party databases and storage systems, creating a historical view of the context.

This project hasn't been onboarded to LFX Insights.

stream-reactor

A collection of open source, Apache 2.0-licensed Kafka connectors maintained by Lenses.io.

This project hasn't been onboarded to LFX Insights.