44 projects
Argo
Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes.
22,849
5,389
$138M
Apache Airflow
Apache Airflow is an open-source platform for programmatically authoring, scheduling, and monitoring workflows. It allows users to create data pipelines as directed acyclic graphs (DAGs) of tasks, enabling complex orchestration of batch processes and data processing workflows.
17,666
2,480
$50M
Kubeflow
Kubeflow is an open source machine learning platform built on Kubernetes that makes deploying and managing ML workflows on Kubernetes simple, portable and scalable. It provides end-to-end orchestration of machine learning pipelines, model training, serving, and experiment tracking.
10,290
2,271
$412M
Dagster
Dagster is an open-source data orchestration framework that lets you define, test, and orchestrate data pipelines using Python code. It provides tools for building, testing, and monitoring data workflows while emphasizing software engineering best practices like modularity, testability, and gradual typing.
3,900
699
$78M
Prefect
Prefect is a workflow orchestration platform that enables users to build, schedule, and monitor data pipelines and machine learning workflows. It provides a Python-based framework for creating resilient, distributed workflows with features like automatic retries, caching, and real-time monitoring.
3,767
744
$29M
Apache NiFi
Apache NiFi is an enterprise data flow management and automation platform that enables organizations to reliably process, route, transform and distribute data between diverse systems. It provides a web-based interface for designing, controlling and monitoring data flows, with features for data provenance, security, extensibility and real-time control.
1,863
221
$44M
Flyte
Flyte is a container-native, type-safe workflow and pipeline platform, written in Go and optimized for large-scale processing and machine learning.
1,850
350
$80M
Kedro Project
The mission of the Project is to design and implement an open source framework for creating reproducible, maintainable and modular data science code.
1,693
198
$52M
NIPYPE
NIPYPE is a Python-based neuroimaging data processing framework that provides a uniform interface to existing neuroimaging software and facilitates interaction between these packages within a single workflow. It enables reproducible, distributed analysis of neuroimaging data through workflows and interfaces to commonly used neuroimaging tools.
1,039
193
$7.2M
Windmill
Windmill is an open-source developer platform for building internal tools and workflows. It provides a low-code solution for creating backend scripts, APIs, and UIs with features like resource management, scheduling, and version control. The platform enables developers to write scripts in multiple languages and automate business processes.
893
235
$23M
nf-core/rnaseq
A bioinformatics pipeline for RNA sequencing analysis that performs alignment, quantification, and extensive quality control on RNA sequencing data.
652
105
$1.1M
Taipy
Turns Data and AI algorithms into production-ready web applications in no time.
592
67
$4.9M
Meltano
Meltano is an open source ELT (Extract, Load, Transform) platform that helps organizations integrate and manage their data pipelines. It provides a command-line interface and web UI for orchestrating data workflows, managing configurations, and connecting various data tools and services.
420
82
$3.5M
Global Workflow
A comprehensive workflow system for NOAA's global numerical weather prediction models, providing end-to-end support for model initialization, execution, post-processing, and product generation for operational forecasting
348
9
$3.1M
Tremor
Tremor is an early stage event processing system for unstructured data with rich support for structural pattern matching, filtering and transformation.
178
86
$13M
Pegasus WMS
Pegasus WMS is a workflow management system that automates the execution of complex computational workflows across distributed computing resources. It transforms abstract workflow descriptions into concrete execution plans, handles data management, job scheduling, and fault tolerance for scientific applications.
80
11
$24M
OpenFIDO
Open Framework for Integrated Data Operations (OpenFIDO) is a data and model processing framework funded by the California Energy Commission (EPC 17-047).
34
13
$3.5M
Astronomer Dbt-Airflow Integration
Run your dbt Core projects as Apache Airflow DAGs and Task Groups with a few lines of code
Astronomer Helm Charts
Helm Charts for the Astronomer Platform, Apache Airflow as a Service on Kubernetes
Brooklyn
experiment orchestration and data acquisition
CDAP
An open source framework for building data analytic applications.
Cumulus
Cumulus Framework + Cumulus API
DolphinScheduler
Apache DolphinScheduler is a modern data orchestration platform for quickly building high-performance workflows with low-code tooling.
GalaxyProject: Data Science for Everyone
Data intensive science for everyone.
Hop
Hop Orchestration Platform
Instill Core
🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications
Kestra
Workflow Automation Platform. Orchestrate & Schedule code in any language, run anywhere, 500+ plugins. Alternative to Zapier, Rundeck, Camunda, Airflow...
Linkis
Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
Mage-AI Data Pipeline Platform by Mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Metaflow
Open Source AI/ML Platform
Nextflow
A DSL for data-driven computational pipelines
QUACC
quacc is a flexible platform for computational materials science and quantum chemistry that is built for the big data era.
Single-cell RNA-seq Pipeline
Single-cell RNA-Seq pipeline for barcode-based protocols such as 10x, DropSeq or SmartSeq, offering a variety of aligners and empty-droplet detection
Snakemake
This is the development home of the workflow management system Snakemake.
Spring Cloud Data Flow
Microservices-based streaming and batch data processing for Cloud Foundry and Kubernetes
Texera
Collaborative Machine-Learning-Centric Data Analytics Using Workflows
VAST
Tenzir is the data pipeline engine for security teams.
ZenML
ZenML 🙏: The bridge between ML and Ops. https://zenml.io.
ampliseq
Amplicon sequencing analysis workflow using DADA2 and QIIME2
warp
WDL Analysis Research Pipelines