LFX Insights

Data Query & Transformation Tools

Libraries, frameworks, and documentation for querying, transforming, and manipulating data across various sources. These tools enable users to extract, clean, reshape, and analyze data without requiring complex programming.

15 projects

31,468 contributors

$417M software value

Airbyte

Airbyte is an open-source data integration platform that helps users replicate data from applications, APIs, and databases to data warehouses, lakes, and other destinations. It provides a large collection of pre-built connectors and allows users to build custom ones, enabling automated data synchronization and ETL workflows.

Contributors

8,550

Organizations

1,242

Software value

$109M

LlamaIndex

LlamaIndex is an open-source data framework for building LLM applications, providing tools to connect custom data sources to large language models, with features for data ingestion, structuring, retrieval, and natural language querying.

Contributors

8,409

Organizations

1,081

Software value

$61M

Geospatial Data Abstraction Library (GDAL)

GDAL (Geospatial Data Abstraction Library) is a translator library for raster and vector geospatial data formats that provides a unified abstract data model and API for accessing and manipulating geographic data. It supports reading, writing, and processing of a wide variety of geospatial file formats.

Contributors

3,108

Organizations

625

Software value

$87M

MapStruct

MapStruct is a Java annotation processor that automates the generation of type-safe bean mapping code, reducing the need to write manual object transformations between Java bean types. It generates readable and performant code for converting between different object models at compile time.

Contributors

2,608

Organizations

374

Software value

$4.6M

Apache SeaTunnel

Apache SeaTunnel is a distributed data integration platform that enables high-performance data synchronization between various data sources and destinations. It provides a unified pipeline for real-time and batch data transfer, supporting multiple data systems like databases, messaging systems, and file storage, with features for data transformation and processing.

Contributors

2,465

Organizations

170

Software value

$20M

OpenRefine

OpenRefine is an open-source tool for cleaning messy data, transforming it from one format into another, and extending it with web services and external data. It allows users to explore large data sets, fix inconsistencies, reconcile and match data against databases like Wikidata, and export data in different formats for further use.

Contributors

1,639

Organizations

311

Software value

$22M

Pentaho Data Integration

Pentaho Data Integration (PDI), also known as Kettle, is an open-source ETL (Extract, Transform, Load) tool that enables users to design and implement data integration workflows. It provides a graphical interface for creating data pipelines, transforming data between different formats and systems, and automating data movement processes.

Contributors

928

Organizations

49

Software value

$52M

Spring Data Commons

Spring Data Commons is a foundational library that provides shared infrastructure for Spring Data modules, offering core interfaces, annotations, and utilities for implementing data access patterns and object-relational mapping across different data stores. It standardizes basic CRUD operations, query derivation, and repository abstractions.

Contributors

835

Organizations

186

Software value

$2.8M

Node CSV

Node CSV is a comprehensive Node.js library for parsing, formatting, transforming and stringifying CSV data. It provides a collection of packages for working with CSV files, including modules for parsing CSV to arrays/objects, converting data to CSV format, and transforming CSV data streams.

Contributors

633

Organizations

219

Software value

$7.3M

Pentaho Platform

Pentaho Platform is an open-source business intelligence and data integration suite that provides data warehousing, reporting, analytics, data mining, and ETL (Extract, Transform, Load) capabilities. It offers a comprehensive platform for data-driven decision making, including tools for data visualization, dashboards, and enterprise reporting.

Contributors

624

Organizations

31

Software value

$19M

Keyv

Keyv is a simple key-value storage system with support for multiple backends, providing a consistent interface for caching and storing data across different storage adapters like Redis, MongoDB, MySQL, and others.

Contributors

477

Organizations

174

Software value

$531K

Apache Jena

Apache Jena is a free and open-source Java framework for building Semantic Web and Linked Data applications. It provides APIs for reading, processing, writing, and querying RDF data, along with support for OWL and SPARQL. The framework includes tools for working with RDF graphs, ontologies, and reasoning engines.

Contributors

474

Organizations

120

Software value

$26M

eemeli/yaml

eemeli/yaml is a JavaScript library for parsing and working with YAML data, providing a complete implementation of the YAML 1.2 specification with support for all common use cases and extensible APIs.

Contributors

459

Organizations

207

Software value

$670K

Comunica

Comunica is a modular JavaScript framework for querying Linked Data on the Web. It provides a flexible architecture for building SPARQL query engines that can operate over various data sources and interfaces, supporting both local and remote data access.

Contributors

259

Organizations

94

Software value

$5M

Microsoft Power Query Documentation

Public repository for Microsoft Power Query documentation. All content in this repository is published to learn.microsoft.com.

This project hasn't been onboarded to LFX Insights.