LFX Insights

LF AI & Data

LF AI & Data is a foundation under the Linux Foundation dedicated to advancing open-source artificial intelligence (AI), machine learning (ML), and data projects. It fosters collaboration between industry leaders, researchers, and developers to create scalable, trustworthy, and interoperable AI and data solutions.

72 projects

53,383 contributors

$5.1B software value

ONNX

ONNX is an open format built to represent machine learning models. ONNX defines a common set of operators - the building blocks of machine learning and deep learning models - and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.

Contributors

7,986

Organizations

996

Software value

$51M

Milvus

As an open source vector similarity search engine, Milvus is easy-to-use, highly reliable, scalable, robust, and blazing fast. Adopted by over 100 organizations and institutions worldwide, Milvus empowers applications in a variety of fields, including image processing, computer vision, natural language processing, voice recognition, recommender systems, drug discovery, etc.

Contributors

4,864

Organizations

442

Software value

$43M

Delta Lake Project

Delta Lake is an open source storage layer that brings reliability to data lakes.

Contributors

3,908

Organizations

571

Software value

$41M

1chipML

1chipML is an open-source project focused on developing machine learning solutions optimized for microcontrollers and resource-constrained devices, enabling efficient AI/ML deployment on embedded systems.

Contributors

3,740

Organizations

523

Software value

$225K

DeepRec

The mission of the Project is to develop a high-performance recommendation deep learning framework.

Contributors

2,723

Organizations

250

Software value

$160M

docling

Docling is an open-source toolkit for parsing documents in formats such as PDF and DOCX and converting them into structured representations for use in generative AI applications.

Contributors

2,311

Organizations

344

Software value

$635M

Horovod

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Contributors

2,266

Organizations

334

Software value

$2.8M

KServe

The mission of the Project is to develop a highly scalable, standards-based model inference platform on Kubernetes for Trusted AI.

Contributors

2,056

Organizations

423

Software value

$216M

Flyte

Flyte is a container-native, type-safe workflow and pipelines platform, written in Golang and optimized for large-scale processing and machine learning.

Contributors

1,839

Organizations

349

Software value

$80M

Kedro Project

The mission of the Project is to design and implement an open source framework for creating reproducible, maintainable and modular data science code.

Contributors

1,687

Organizations

198

Software value

$52M

FATE Project

FATE is an open-source project initiated by WeBank's AI Department to provide a secure computing framework to support the federated AI ecosystem.

Contributors

1,643

Organizations

109

Software value

$103M

Feast

Feast is the bridge between your data and your machine learning models, allowing teams to register, ingest, serve, and monitor features in production.

Contributors

1,400

Organizations

344

Software value

$13M

JanusGraph

JanusGraph is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster.

Contributors

1,380

Organizations

209

Software value

$35M

sparklyr

R interface for Apache Spark.

Contributors

1,300

Organizations

141

Software value

$2.1M

Pyro

Deep universal probabilistic programming with Python and PyTorch.

Contributors

1,215

Organizations

246

Software value

$24M

IREE

IREE (Intermediate Representation Execution Environment) is an MLIR-based end-to-end compiler and runtime that lowers Machine Learning (ML) models to a unified IR that scales up to meet the needs of the datacenter and down to satisfy the constraints and special considerations of mobile and edge deployments.

Contributors

1,116

Organizations

129

Software value

$27M

Amundsen

Amundsen is a metadata-driven data discovery and metadata engine for improving the productivity of data analysts, data scientists, and engineers when interacting with data.

Contributors

1,035

Organizations

255

Software value

$8.8M

Ludwig

Ludwig is an open-source, declarative machine learning framework that makes it easy to define deep learning pipelines with a simple and flexible data-driven configuration system. Ludwig is a low-code framework for building custom AI models like LLMs and other deep neural networks.

Contributors

942

Organizations

154

Software value

$14M

Adversarial Robustness Toolbox

Adversarial Robustness Toolbox (ART) provides tools that enable developers and researchers to evaluate, defend, certify, and verify machine learning models and applications against adversarial threats.

Contributors

708

Organizations

65

Software value

$6.9M

OpenLineage

The mission of the Project is to enable the industry at large to collect lineage metadata consistently and comprehensively across complex pipelines, creating a deeper understanding of data.

Contributors

653

Organizations

101

Software value

$12M

OPEA

The mission of the Project is to develop an ecosystem orchestration framework to efficiently integrate performant GenAI technologies and workflows leading to quicker GenAI adoption and business value.

Contributors

652

Organizations

59

Software value

$249M

Recommenders

The mission of the Project is to develop examples and best practices for building recommendation systems, provided as Jupyter notebooks.

Contributors

514

Organizations

111

Software value

$3M

Marquez

Marquez is an open source metadata service for the collection, aggregation, and visualization of a data ecosystem’s metadata. It maintains the provenance of how datasets are consumed and produced, provides global visibility into job runtime and frequency of dataset access, centralization of dataset lifecycle management, and much more. Marquez was released and open sourced by WeWork.

Contributors

493

Organizations

65

Software value

$3M

Elyra

The mission of the Project is to create and maintain an open-source development workspace that simplifies the creation and orchestration of the AI model development lifecycle tasks.

Contributors

488

Organizations

92

Software value

$23M

Angel

A flexible and powerful parameter server for large-scale machine learning.

Contributors

481

Organizations

57

Software value

$23M

Egeria

Egeria provides the Apache 2.0 licensed open metadata and governance type system, frameworks, APIs, event payloads and interchange protocols to enable tools, engines and platforms to exchange metadata in order to get the best value from data whilst ensuring it is properly governed.

Contributors

460

Organizations

52

Software value

$47M

Neural Network (NN) Streamer

Neural Network (NN) Streamer, a stream processing paradigm for neural network apps and devices.

Contributors

448

Organizations

51

Software value

$34M

AI Fairness 360

A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.

Contributors

388

Organizations

47

Software value

$2.9M

FlagAI

The mission of the Project is to develop a fast, easy-to-use and extensible toolkit for large-scale AI modeling, with the goal of supporting training, fine-tuning, and deployment of large-scale models on various downstream tasks with multi-modality.

Contributors

360

Organizations

34

Software value

$47M

CLAIMED

CLAIMED (Component Library for AI, Machine Learning, ETL and Data Science) is a runtime and programming language agnostic Data & AI component framework abstracting away all complexity for advanced MLOps and TrustedAI.

Contributors

334

Organizations

16

Software value

$4.6M

DocArray

The mission of the DocArray project is to develop a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh.

Contributors

308

Organizations

47

Software value

$439M

OpenFL

The mission of the OpenFL project is to build a flexible, secure, scalable, and easily learnable Federated Learning tool for data scientists and data owners.

Contributors

308

Organizations

37

Software value

$5M

RWKV

The mission of the Project is to develop a recurrent neural net language model with GPT-level LLM performance, which can also be directly trained like a GPT transformer.

Contributors

283

Organizations

43

Software value

$71M

AGNTCY

AGNTCY is an open-source project building infrastructure for interoperable, multi-agent AI systems (the "Internet of Agents").

Contributors

238

Organizations

34

Software value

$29M

DLRover

DLRover is an autonomous distributed deep learning training system that provides elastic training, fault recovery, and performance optimization for large-scale deep learning models. It helps manage and scale training jobs across distributed infrastructure while handling failures and resource constraints.

Contributors

236

Organizations

35

Software value

$4.5M

BeeAI

The mission of BeeAI is to build an open-source ecosystem that empowers developers to discover, run, and compose AI agents from any framework. The project is creating the infrastructure to make agents truly interoperable, regardless of their underlying implementation.

Contributors

235

Organizations

32

Software value

$9.7M

LF AI & Data

LF AI & Data is an umbrella foundation of the Linux Foundation that supports open source innovation in artificial intelligence (AI) and data. It was created to build a sustainable open source AI ecosystem that makes it easy to create AI and data products and services using open source technologies. The foundation fosters collaboration in a neutral environment with open governance, in support of the harmonization and acceleration of open source technical projects.

Contributors

205

Organizations

40

Substra

The mission of the Project is to design and implement an open source framework for traceable ML orchestration on decentralized sensitive data.

Contributors

169

Organizations

20

Software value

$5.6M

Feathr

The mission of the Project is to develop an enterprise-grade, high performance feature store.

Contributors

167

Organizations

25

Software value

$6.2M

Kompute

The mission of the Project is to advance the GPU Acceleration ecosystem in scientific and industry applications through cross-vendor graphics card tooling, and further capabilities for GPGPU computing across advanced data processing use-cases.

Contributors

164

Organizations

30

Software value

$588K

DELTA

DELTA is a deep-learning-based, end-to-end natural language and speech processing platform. DELTA aims to provide easy and fast experiences for using, deploying, and developing natural language processing and speech models for both academic and industry use cases. DELTA is mainly implemented using TensorFlow and Python 3.

Contributors

160

Organizations

19

Software value

$2M

Datashim

The mission of the Project is to design and implement an open source framework that provides seamless access to data in Kubernetes environments.

Contributors

145

Organizations

49

Software value

$714K

AI Explainability 360

AI Explainability 360 is an open source toolkit that supports the interpretability and explainability of data and machine learning models. It helps users better understand the ways that machine learning models predict labels, using a wide variety of techniques throughout the AI application lifecycle.

Contributors

139

Organizations

20

Software value

$2.4M

Data Prep Kit

Data Prep Kit accelerates unstructured data preparation for LLM app developers. Developers can use Data Prep Kit to cleanse, transform, and enrich use-case-specific unstructured data to pre-train LLMs, fine-tune LLMs, instruct-tune LLMs, or build Retrieval Augmented Generation (RAG) applications for LLMs.

Contributors

138

Organizations

9

Software value

$13M

Open Voice Network Interoperability Initiative

The Open Voice Network Interoperability Initiative is developing the "Message Envelope," a universal, open API for voice/chatbot and language model interoperability, analogous to HTTP and HTML.

Contributors

132

Organizations

32

Software value

$4.2M

Vortex

Vortex is an extensible, state-of-the-art format for columnar data. It includes specifications & tools for manipulating possibly-compressed arrays in-memory, on-disk (file format), and over-the-wire (IPC format). Vortex is built around the latest research from the database community.

Contributors

121

Organizations

34

Software value

$8M

Adlik

Adlik offers an end-to-end optimizing framework for deep learning models, whose goal is to accelerate the deep learning inference process in both cloud and embedded environments.

Contributors

106

Organizations

7

Software value

$2.7M

Bitol

The primary objective of the BITOL project is to tackle multiple challenges, such as data normalization, ensuring the relevance of documentation, establishing service-level expectations, simplifying data and tool integration, and promoting a data product-oriented approach. These efforts offer several advantages, including stimulating innovation and streamlining integration processes. BITOL is a sandbox-stage project of the LF AI & Data Foundation, contributed by the AIDA User Group in September 2023.

Contributors

78

Organizations

22

Software value

$980K

LakeSoul

The mission of the Project is to develop an end-to-end, real-time, cloud-native Lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.

Contributors

71

Organizations

5

Software value

$6.1M

Monocle

The mission of the Project is to develop a domain-specific tracing framework for monitoring code used to build Generative AI applications.

Contributors

67

Organizations

5

Software value

$2M

Elastic Deep Learning (EDL)

Elastic Deep Learning using PaddlePaddle and Kubernetes.

Contributors

56

Organizations

10

Software value

$692K

Machine Learning eXchange (MLX)

The mission of the Project is to design and implement an open source Data and AI Assets Catalog and Execution Engine that allows the uploading, registration, execution, and deployment of AI pipelines and pipeline components, models, datasets and notebooks.

Contributors

55

Organizations

7

Software value

$376K

OpenDS4All

OpenDS4All is a project created to accelerate the creation of data science curricula at academic institutions. While a great deal of online material is available for data science, including online courses, we recognize that the best way for many students to learn (and for many institutions to deliver) content is through a combination of lectures, recitation or flipped classroom activities, and hands-on assignments.

Contributors

53

Organizations

13

Software value

$8.5M

SapientML

The mission of the Project is to help data scientists rapidly create and amend AI models.

Contributors

51

Organizations

9

Software value

$1.5M

Open Model Initiative

The mission of the Project is to support open community development of openly licensed baseline AI models for image, video and audio generation that individuals and organizations can use and augment in their own solutions.

Contributors

45

Organizations

8

Software value

$2.9M

RosaeNLG Project

An open source natural language generation library.

Contributors

38

Organizations

10

Software value

$89M

SOAJS

SOAJS provides a complete enterprise open source microservice management platform.

Contributors

31

Organizations

13

Software value

$38M

OpenDataology

The mission of the OpenDataology project is to provide a crowd-sourced platform that provides approaches to analyze and document the license compliance risks of publicly available datasets used for Artificial Intelligence (AI) software. In addition, the project endeavors to develop and promote open standards that capture the metadata required for performing license compliance analysis, dataset license compliance analysis processes and supporting tools.

Contributors

27

Organizations

7

Software value

$513K

TonY Project

The mission of the Project is to design and implement an open source framework to run distributed deep learning jobs reliably on computing infrastructures.

Contributors

24

Organizations

1

OpenBytes

The mission of the Project is to facilitate wider sharing of, and collaboration with, data in the AI community through the creation of data standards and formats and enabling contributions of data.

Contributors

20

Organizations

4

Software value

$305K
