LFX Insights

Distributed ML Systems

Platforms and frameworks for large-scale distributed machine learning that enable efficient execution of ML algorithms across computing clusters, optimizing for performance and scalability with big data.

16 projects

106,904 contributors

$1.2B software value

TensorFlow

TensorFlow is an open-source machine learning framework developed by Google that enables numerical computation and large-scale machine learning. It provides a flexible system for defining and executing computations involving tensors, which are multi-dimensional arrays. The framework supports deep learning and neural networks across multiple platforms and devices.

Contributors: 47,170
Organizations: 6,145
Software value: $198M

Kubeflow

Kubeflow is an open-source machine learning platform built on Kubernetes that makes deploying and managing ML workflows simple, portable, and scalable. It provides end-to-end orchestration of machine learning pipelines, model training, serving, and experiment tracking.

Contributors: 10,319
Organizations: 2,276
Software value: $413M

PaddlePaddle

PaddlePaddle is an open-source deep learning platform developed by Baidu that provides a comprehensive suite of tools for AI model development, training, and deployment. It features an easy-to-use API, high-performance distributed training, and extensive support for deep learning applications including computer vision, natural language processing, and speech recognition.

Contributors: 9,221
Organizations: 478
Software value: $107M

Ray

Ray is an open-source unified framework for scaling AI and Python applications. It provides a simple, universal API for building distributed applications and includes libraries for machine learning, serving, streaming, and more. Ray enables developers to parallelize single-machine code with minimal code changes and scale applications from a laptop to a cluster.

Contributors: 8,676
Organizations: 1,474
Software value: $49M

PyTorch Lightning

PyTorch Lightning is a lightweight PyTorch wrapper that helps researchers and engineers train deep learning models with high performance at scale. It provides a high-level interface for organizing PyTorch code, automating complex training features like distributed training, mixed precision, and model checkpointing while removing boilerplate code.

Contributors: 7,410
Organizations: 1,478
Software value: $4.5M

XGBoost

XGBoost is a scalable, distributed gradient boosting library that provides parallel tree boosting for machine learning tasks. It implements machine learning algorithms under the gradient boosting framework, offering high performance, flexibility and portability across multiple programming languages and platforms.

Contributors: 5,916
Organizations: 829
Software value: $6.3M

Unsloth

Unsloth is an open-source project focused on optimizing and accelerating Large Language Models (LLMs) through efficient fine-tuning techniques. It provides tools and methods for faster LLM training and inference while reducing memory usage and computational requirements.

Contributors: 3,683
Organizations: 494
Software value: $1.4M

CatBoost

CatBoost is a high-performance, open-source gradient boosting library developed by Yandex that implements gradient boosting on decision trees. It provides fast, scalable, and accurate machine learning algorithms for classification, regression, and ranking tasks, with built-in support for categorical features.

Contributors: 3,536
Organizations: 343
Software value: $242M

LightGBM

LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient, offering faster training, lower memory usage, better accuracy, support for parallel and GPU learning, and the ability to handle large-scale data.

Contributors: 3,150
Organizations: 480
Software value: $3.4M

Hugging Face Accelerate

Hugging Face Accelerate is a library that enables training and inference of machine learning models on multiple devices (CPU, GPU, TPU) with minimal code changes. It provides seamless distributed training capabilities, mixed precision support, and optimization features for PyTorch models.

Contributors: 2,778
Organizations: 506
Software value: $1.9M

Horovod

Horovod is a distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Contributors: 2,264
Organizations: 335
Software value: $2.8M

MindSpore

MindSpore is an open-source deep learning framework that provides a unified training and inference experience across different devices and platforms. It features automatic differentiation, dynamic debugging capabilities, and hardware optimization for AI model development.

Contributors: 1,726
Organizations: 71
Software value: $122M

Angel

Angel is a flexible and powerful parameter server for large-scale machine learning.

Contributors: 482
Organizations: 55
Software value: $23M

CLAIMED

CLAIMED (Component Library for AI, Machine Learning, ETL and Data Science) is a runtime- and programming-language-agnostic data and AI component framework designed to abstract away complexity for advanced MLOps and TrustedAI.

Contributors: 335
Organizations: 16
Software value: $4.6M

DLRover

DLRover is an autonomous distributed deep learning training system that provides elastic training, fault recovery, and performance optimization for large-scale deep learning models. It helps manage and scale training jobs across distributed infrastructure while handling failures and resource constraints.

Contributors: 238
Organizations: 34
Software value: $4.5M

Apache SystemDS

Apache SystemDS is an open-source ML system for the end-to-end data science lifecycle.

This project hasn't been onboarded to LFX Insights.