73 projects
Rendered at: 2025-07-16T06:06:38.648Z
LF AI & Data
LF AI & Data is a foundation under the Linux Foundation dedicated to advancing open-source artificial intelligence (AI), machine learning (ML), and data projects. It fosters collaboration between industry leaders, researchers, and developers to create scalable, trustworthy, and interoperable AI and data solutions.
78,088 contributors
$2.3B
vLLM
The mission of the Project is to develop an open-source library for fast LLM inference and serving.
15,228
1,878
$20M
ONNX
ONNX is an open format built to represent machine learning models. ONNX defines a common set of operators - the building blocks of machine learning and deep learning models - and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.
8,146
955
$46M
DeepSpeed
The mission of the Project is to develop a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
5,181
199
$9M
Milvus
As an open source vector similarity search engine, Milvus is easy-to-use, highly reliable, scalable, robust, and blazing fast. Adopted by over 100 organizations and institutions worldwide, Milvus empowers applications in a variety of fields, including image processing, computer vision, natural language processing, voice recognition, recommender systems, drug discovery, etc.
4,641
411
$39M
Delta Lake Project
Delta Lake is an open source storage layer that brings reliability to data lakes.
3,877
529
$38M
DELTA
DELTA is a deep-learning-based, end-to-end natural language and speech processing platform. DELTA aims to provide easy and fast experiences for using, deploying, and developing natural language processing and speech models for both academic and industry use cases. DELTA is mainly implemented using TensorFlow and Python 3.
2,917
41
$2M
DeepRec
The mission of the Project is to develop a high-performance recommendation deep learning framework.
2,721
228
$149M
FATE Project
FATE is an open-source project initiated by WeBank’s AI Department to provide a secure computing framework to support the federated AI ecosystem.
2,611
119
$36M
Horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
2,284
330
$2.8M
CLAIMED
CLAIMED (Component Library for AI, Machine Learning, ETL and Data Science) is a runtime and programming language agnostic Data & AI component framework abstracting away all complexity for advanced MLOps and TrustedAI.
2,068
23
$3.6M
KServe
The mission of the Project is to develop a highly scalable and standards based model inference platform on Kubernetes for Trusted AI.
1,938
402
$61M
Flyte
Flyte is a container-native, type-safe workflow and pipelines platform, written in Golang and optimized for large-scale processing and machine learning.
1,805
337
$69M
Kedro Project
The mission of the Project is to design and implement an open source framework for creating reproducible, maintainable and modular data science code.
1,710
194
$18M
Feast
Feast is the bridge between your data and your machine learning models, allowing teams to register, ingest, serve, and monitor features in production.
1,618
346
$12M
TonY Project
The mission of the Project is to design and implement an open source framework to run distributed deep learning jobs reliably on computing infrastructures.
1,601
33
$1M
Angel
A Flexible and Powerful Parameter Server for large-scale machine learning.
1,483
60
$23M
JanusGraph
JanusGraph is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster.
1,477
215
$35M
Pyro
Deep universal probabilistic programming with Python and PyTorch.
1,388
238
$19M
sparklyr
R interface for Apache Spark.
1,304
131
$2.1M
Ludwig
Ludwig is an open-source, declarative machine learning framework that makes it easy to define deep learning pipelines with a simple and flexible data-driven configuration system. Ludwig is a low-code framework for building custom AI models like LLMs and other deep neural networks.
1,270
152
$8.7M
Amundsen
Amundsen is a metadata-driven data discovery and metadata engine for improving the productivity of data analysts, data scientists, and engineers when interacting with data.
1,076
255
$8.8M
IREE
IREE (Intermediate Representation Execution Environment) is an MLIR-based end-to-end compiler and runtime that lowers Machine Learning (ML) models to a unified IR that scales up to meet the needs of the datacenter and down to satisfy the constraints and special considerations of mobile and edge deployments.
1,011
107
$25M

Docling
Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.
878
122
$100M
Marquez
Marquez is an open source metadata service for the collection, aggregation, and visualization of a data ecosystem’s metadata. It maintains the provenance of how datasets are consumed and produced, provides global visibility into job runtime and frequency of dataset access, centralizes dataset lifecycle management, and much more. Marquez was released and open sourced by WeWork.
813
56
$3M
Adversarial Robustness Toolbox
Adversarial Robustness Toolbox (ART) provides tools that enable developers and researchers to evaluate, defend, certify, and verify machine learning models and applications against adversarial threats.
717
61
$5.3M
OpenLineage
The mission of the Project is to enable the industry at-large to collect lineage metadata consistently and comprehensively across complex pipelines, creating a deeper understanding of data.
595
87
$11M
OPEA
The mission of the Project is to develop an ecosystem orchestration framework to efficiently integrate performant GenAI technologies and workflows leading to quicker GenAI adoption and business value.
594
56
$221M
Egeria
Egeria provides the Apache 2.0 licensed open metadata and governance type system, frameworks, APIs, event payloads and interchange protocols to enable tools, engines and platforms to exchange metadata in order to get the best value from data whilst ensuring it is properly governed.
588
54
$55M
Recommenders
The mission of the Project is to develop examples and best practices for building recommendation systems, provided as Jupyter notebooks.
576
112
$2.9M
RWKV
The mission of the Project is to develop a recurrent neural net language model with GPT-level LLM performance, which can also be directly trained like a GPT transformer.
563
35
$7.7M
Elyra
The mission of the Project is to create and maintain an open-source development workspace that simplifies the creation and orchestration of the AI model development lifecycle tasks.
478
92
$23M
Monocle
The mission of the Project is to develop a domain specific tracing framework for monitoring code used to build Generative AI applications.
464
8
$764K
Neural Network (NN) Streamer
Neural Network (NN) Streamer: a stream processing paradigm for neural network apps and devices.
437
46
$24M
OpenFL
The mission of the OpenFL project is to build a flexible, secure, scalable, and easily learnable Federated Learning tool for data scientists and data owners.
428
43
$2.5M
AI Fairness 360
A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
387
44
$1.6M
FlagAI
The mission of the Project is to develop a fast, easy-to-use and extensible toolkit for large-scale AI modeling, with the goal of supporting training, fine-tuning, and deployment of large-scale models on various downstream tasks with multi-modality.
332
28
$3.3M
DocArray
The mission of the DocArray project is to develop a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh.
327
46
$1.6M
LF AI & Data
LF AI & Data is an umbrella foundation of the Linux Foundation that supports open source innovation in artificial intelligence (AI) and data. LF AI & Data was created to support open source AI and data, and to create a sustainable open source AI ecosystem that makes it easy to create AI and data products and services using open source technologies. We foster collaboration under a neutral environment with an open governance in support of the harmonization and acceleration of open source technical projects.
213
36
Kompute
The mission of the Project is to advance the GPU Acceleration ecosystem in scientific and industry applications through cross-vendor graphics card tooling, and further capabilities for GPGPU computing across advanced data processing use-cases.
201
27
$588K
Xtreme1
The mission of the Project is to build an accessible open-source data-centric MLOps infrastructure to connect people, models and data.
176
23
$7.4M
Feathr
The mission of the Project is to develop an enterprise-grade, high performance feature store.
173
29
$4.6M
Substra
The mission of the Project is to design and implement an open source framework for traceable ML orchestration on decentralized sensitive data.
172
19
$5.6M
Bee-AI
The mission of BeeAI is building an open-source ecosystem that empowers developers to discover, run, and compose AI agents from any framework. We’re creating the infrastructure to make agents truly interoperable, regardless of their underlying implementation.
159
16
$7.1M
Datashim
The mission of the Project is to design and implement an Open Source framework that provides seamless access to Data in Kubernetes environments.
144
48
$714K
AI Explainability 360
AI Explainability 360 is an open source toolkit for the interpretability and explainability of data and machine learning models. It can help users better understand the ways that machine learning models predict labels, using a wide variety of techniques throughout the AI application lifecycle.
141
18
$2M

Data Prep Kit
Data Prep Kit accelerates unstructured data preparation for LLM app developers. Developers can use Data Prep Kit to cleanse, transform, and enrich use case-specific unstructured data to pre-train LLMs, fine-tune LLMs, instruct-tune LLMs, or build Retrieval Augmented Generation (RAG) applications for LLMs.
137
8
$4.7M
Open Voice Network Interoperability Initiative
The Open Voice Network Interoperability Initiative is developing the “Message Envelope,” a universal, open API for voice/chatbot and language model interoperability, analogous to HTTP and HTML.
129
32
$3.2M
Adlik
Adlik offers an end-to-end optimizing framework for deep learning models whose goal is to accelerate the deep learning inference process in both cloud and embedded environments.
107
8
$2.6M
Vortex
Vortex is an extensible, state-of-the-art format for columnar data. It includes specifications & tools for manipulating possibly-compressed arrays in-memory, on-disk (file format), and over-the-wire (IPC format). Vortex is built around the latest research from the database community.
74
18
$4.8M
LakeSoul
The mission of the Project is to develop an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
72
5
$4.5M
Data Practices
DataPractices.org was pioneered by data.world as a “Manifesto for Data Practices” of four values and 12 principles that illustrate the most effective, ethical, and modern approach to data teamwork. As a member of the foundation, datapractices.org will expand to offer open courseware and establish a collaborative approach to defining and refining data best practices.
67
7
$1.7M
Bitol
The primary objective of the BITOL project is to tackle multiple challenges, such as data normalization, ensuring the relevance of documentation, establishing service-level expectations, simplifying data and tool integration, and promoting a data product-oriented approach. These efforts offer several advantages, including stimulating innovation and streamlining integration processes. BITOL is a sandbox-stage project of the LF AI & Data Foundation, contributed by the AIDA User Group in September 2023.
61
19
$749K
Elastic Deep Learning (EDL)
Elastic Deep Learning using PaddlePaddle and Kubernetes.
59
11
$692K
Machine Learning eXchange (MLX)
The mission of the Project is to design and implement an open source Data and AI Assets Catalog and Execution Engine that allows the uploading, registration, execution, and deployment of AI pipelines and pipeline components, models, datasets and notebooks.
59
7
$321K
SapientML
The mission of the Project is to help data scientists rapidly create and amend AI models.
57
9
$1.5M
OpenDS4All
OpenDS4All is a project created to accelerate the creation of data science curricula at academic institutions. While a great deal of online material is available for data science, including online courses, we recognize that the best way for many students to learn (and for many institutions to deliver) content is through a combination of lectures, recitation or flipped classroom activities, and hands-on assignments.
54
14
$3.7M
RosaeNLG Project
An open source natural language generation library.
38
10
$5.5M
Open Model Initiative
The mission of the Project is to support open community development of openly licensed baseline AI models for image, video and audio generation that individuals and organizations can use and augment in their own solutions.
36
7
$814K
SOAJS
SOAJS provides a complete enterprise open source microservice management platform.
32
9
$17M
1chipML
The mission of the 1chipML open source project is to design and implement a library for basic numerical crunching and machine learning for microcontrollers offering a highly reliable open framework to use on limited and low-power hardware.
30
8
$225K