14 projects
DataFrame Libraries
High-performance libraries providing in-memory dataframes for efficient data manipulation, analysis, and processing. This collection covers projects that offer a full API for working with tabular data with vectorized operations and out-of-core capabilities.
15,289 contributors
$54M
Polars
Polars is a high-performance DataFrame library implemented in Rust, offering fast data manipulation and analysis capabilities with a Python API. It features a query optimizer, parallel execution, and efficient memory usage through Arrow columnar format.
5,563
1,125
$22M
Dask
Dask is a flexible parallel computing library for analytics that provides dynamic task scheduling optimized for computation and integrates with Python data science libraries like NumPy, pandas, and scikit-learn. It enables parallel and distributed computing through intuitive APIs and scales Python code from multi-core machines to clusters.
3,564
900
$6.8M
xarray
Xarray is a Python library that introduces labeled arrays and datasets, extending NumPy's capabilities by adding coordinates, dimensions, and attributes to N-dimensional arrays. It enables working with multi-dimensional data by providing data structures and operations for labeled arrays, making it particularly useful for scientific computing and analysis of structured data like climate and weather data.
2,960
699
$6.5M
GeoPandas
Python tools for geographic data
1,732
396
$1.7M
sparklyr
R interface for Apache Spark.
1,297
139
$2.1M
Daft
Distributed data engine for Python/SQL designed for the cloud, powered by Rust
173
22
$14M
AnnData
Annotated data.
DataFrames.jl
In-memory tabular data in Julia
Deephaven Community Core
Deephaven Community Core
DimensionalData.jl
Named dimensions and indexing for Julia arrays and other data
Ibis
The portable Python dataframe library
cuDF
cuDF - GPU DataFrame Library
data.table
R's data.table package extends data.frame
dplyr
dplyr: A grammar of data manipulation