21 projects
llama.cpp
A port of Meta's LLaMA model in C/C++, focused on efficient inference and deployment of large language models on consumer hardware. The project provides optimized implementations for running LLaMA models with minimal dependencies and memory requirements.
11,890
1,799
$29M
YOLOv5
YOLOv5 is a computer vision model and framework for real-time object detection, offering state-of-the-art performance, easy training and deployment capabilities, and extensive documentation. It implements the YOLO (You Only Look Once) architecture with improvements for speed and accuracy.
8,826
705
$804K
ONNX
ONNX is an open format built to represent machine learning models. ONNX defines a common set of operators - the building blocks of machine learning and deep learning models - and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.
8,035
995
$51M
OpenVINO
OpenVINO (Open Visual Inference and Neural Network Optimization) is an open-source toolkit that enables fast development of computer vision and deep learning applications, optimizing neural network models for Intel hardware. It provides tools for model optimization, inference engine deployment, and cross-platform performance acceleration.
4,340
389
$83M
ncnn
ncnn is a high-performance neural network inference framework optimized for mobile platforms.
3,513
310
$28M
Triton Inference Server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
3,341
504
$5.3M
Apache TVM
Apache TVM is an open source machine learning compiler framework for CPUs, GPUs, and machine learning accelerators. It aims to enable machine learning engineers to optimize and run computations efficiently on any hardware backend.
3,270
388
$24M
MLC LLM
Universal LLM Deployment Engine with ML Compilation
1,709
237
$2.9M
XLA
XLA (Accelerated Linear Algebra) is a compiler and runtime system for machine learning that optimizes and executes computational graphs across different hardware platforms. It provides hardware abstraction and optimization capabilities for deep learning frameworks, enabling efficient execution of ML models on various accelerators like GPUs and TPUs.
1,698
203
$56M
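JAX uses XLA as its backend compiler, so a minimal sketch of XLA's graph compilation is a `jax.jit`-decorated function: on first call, the whole computation is traced and compiled by XLA into fused kernels for the available hardware. The function and argument names here are illustrative.

```python
import jax
import jax.numpy as jnp

# XLA compiles this entire computation into optimized, fused kernels.
@jax.jit
def affine(x, w, b):
    return jnp.dot(x, w) + b

x = jnp.ones((2, 3))
w = jnp.ones((3, 4))
b = jnp.zeros((4,))

y = affine(x, w, b)  # first call triggers XLA compilation; later calls reuse it
```

The same traced program can be lowered by XLA to CPUs, GPUs, or TPUs without changing the Python code.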
ExecuTorch
ExecuTorch is a runtime solution for deploying PyTorch models on mobile and edge devices, focusing on efficient model execution and optimization for resource-constrained environments. It provides tools for model compilation, portability across different hardware platforms, and streamlined deployment of AI models.
1,532
160
$30M
GGML
GGML is a tensor library for machine learning that enables efficient neural network inference on CPU. It provides low-level primitives for implementing deep learning models with a focus on performance and memory efficiency, particularly for running large language models on consumer hardware.
1,531
274
$7.9M
TT-Metal
TT-Metal is a software framework for programming Tenstorrent AI accelerator chips, providing low-level hardware access and control capabilities for machine learning workloads.
1,451
79
$112M
MLX
MLX: An array framework for Apple silicon
1,050
249
$5.6M
Neural Network (NN) Streamer
Neural Network (NN) Streamer: a stream-processing paradigm for neural network applications and devices.
458
49
$34M
XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
424
93
$105M
Optimum Intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
304
27
$1.8M
ShaderNN
ShaderNN is a lightweight deep learning inference framework optimized for convolutional neural networks.
14
3
$35M
LMDeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs, maintained under the InternLM organization.
MNN
MNN is a blazing-fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android app: [MNN-LLM-Android](./apps/Android/MnnLlmChat/README.md)
Neuro-One
On-device Neural Engine
Torch-TensorRT
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT