35 projects
Numba
Numba is a Just-In-Time (JIT) compiler that translates a subset of Python and NumPy code into fast machine code, specializing in numerical computing and scientific applications. It enables Python functions to be compiled to native machine instructions, significantly improving performance for computationally intensive operations.
3,615
871
$9.9M
CatBoost
CatBoost is a high-performance, open-source gradient boosting library developed by Yandex that implements gradient boosting on decision trees. It provides fast, scalable, and accurate machine learning algorithms for classification, regression, and ranking tasks, with built-in support for categorical features.
3,537
343
$242M
FlashAttention
FlashAttention is a fast, memory-efficient implementation of exact attention for training deep learning models. It uses an IO-aware tiling algorithm that reduces reads and writes to GPU high-bandwidth memory, improving hardware utilization compared to standard attention implementations.
3,054
487
$3.2M
Triton
Triton is a programming language and compiler framework designed for GPU programming, focusing on tensor computations and machine learning workloads. It enables developers to write high-performance code for GPU acceleration with Python-like syntax while providing low-level hardware control.
2,866
519
$7.3M
CUTLASS
CUDA Templates for Linear Algebra Subroutines
1,341
219
$60M
NVIDIA DALI
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
1,259
220
$14M
cuML
cuML - RAPIDS Machine Learning Library
1,092
166
$7M
CUDA.jl
CUDA.jl is a Julia programming language package that provides a comprehensive interface to NVIDIA's CUDA toolkit, enabling GPU computing capabilities within Julia. It allows developers to write high-performance GPU code using Julia's native syntax while abstracting away many low-level CUDA details.
918
233
$3.4M
SPIRV-Cross
SPIRV-Cross is a tool and library for performing reflection on SPIR-V and converting SPIR-V to other shader languages. It enables translation of SPIR-V shaders into GLSL, HLSL, MSL and other formats while preserving metadata and optimizations.
773
146
$10M
CUDA Core Compute Libraries (CCCL)
CUDA Core Compute Libraries
766
145
$27M
WarpX
WarpX is a community, performance-portable, and modular Particle-in-Cell code with advanced algorithms.
599
55
$12M
HOOMD-blue
Molecular dynamics and Monte Carlo soft matter simulation on GPUs.
497
37
$8.6M
QUDA
QUDA is a library for performing calculations in lattice QCD on GPUs.
252
28
$5.3M
Ginkgo
Numerical linear algebra software package
208
42
$9.7M
AMDGPU.jl
AMD GPU (ROCm) programming in Julia
Castro
Castro (Compressible Astrophysics): An adaptive mesh, astrophysical compressible (radiation-, magneto-) hydrodynamics simulation code for massively parallel CPU and GPU architectures.
CuPy
NumPy & SciPy for GPU
FlashInfer
FlashInfer: Kernel Library for LLM Serving
Futhark
A data-parallel functional programming language
GPUArrays
Reusable array functionality for Julia's various GPU backends.
HeAT
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
KernelAbstractions.jl
Heterogeneous programming in Julia
LinearSolve.jl
LinearSolve.jl: High-Performance Unified Interface for Linear Solvers in Julia. Easily switch between factorization and Krylov methods and add preconditioners, all in one interface.
MIOpen
AMD's Machine Intelligence Library
Megatron-LM
Ongoing research training transformer models at scale
NCCL
Optimized primitives for collective multi-GPU communication
PIConGPU
Performance-Portable Particle-in-Cell Simulations for the Exascale Era
RAPIDS AI
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Torch-TensorRT
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
VortexGPGPU-Vortex
VortexGPGPU-Vortex is a Linux Foundation project developing a high-performance, open-source General-Purpose GPU (GPGPU) architecture implementation for accelerating parallel workloads across a range of computing environments.
cuDF
cuDF - GPU DataFrame Library
nvFuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
rocBLAS
Next generation BLAS implementation for ROCm platform