LFX Insights

SQL Query Engines

Open-source distributed SQL query engines that enable interactive, high-performance analytic queries across heterogeneous data sources.

21 projects

21,725 contributors

$2.5B software value

The Presto Foundation Fund

Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.

Contributors

5,381

Organizations

749

Software value

$2B

Trino

Trino is a distributed SQL query engine designed to query large data sets distributed across multiple heterogeneous data sources. It enables fast, interactive analytics across diverse data sources including Hadoop, object stores, relational databases, and other systems.

Contributors

5,117

Organizations

724

Software value

$68M

Apache DataFusion

Apache DataFusion is a fast, extensible query execution framework written in Rust that enables efficient processing of large-scale data using SQL. It provides a modular architecture for building high-performance data processing systems and analytics applications, with support for various data sources and formats.

Contributors

2,376

Organizations

557

Software value

$21M

Apache Pinot

Apache Pinot is a real-time distributed OLAP datastore designed to deliver scalable real-time analytics with low latency. It can ingest data from batch and streaming sources and provides a SQL interface for querying. The system is built to handle high throughput analytics workloads and supports rich indexing capabilities for optimized query performance.

Contributors

1,717

Organizations

230

Software value

$47M

Apache Hive

Apache Hive is a data warehouse software project built on top of Apache Hadoop that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. It provides a mechanism to project structure onto data and query it using HQL (Hive Query Language), a SQL-like language.

Contributors

1,500

Organizations

149

Software value

$96M

Apache Cloudberry

Apache Cloudberry is an advanced, mature open-source MPP (Massively Parallel Processing) database and an open-source alternative to Greenplum Database.

Contributors

1,466

Organizations

79

Software value

$126M

ByConity

ByConity is a cloud-native data warehouse system that provides real-time analytics capabilities with high concurrency and low latency. It features separation of storage and compute, elastic scaling, and ACID transaction support.

Contributors

1,364

Organizations

120

Software value

$66M

Apache Kyuubi

Apache Kyuubi is a distributed multi-tenant service that provides high-performance SQL query capabilities and resource management for big data workloads. It offers a unified gateway for accessing data lakes through various engines like Apache Spark, enabling secure, scalable, and highly available data processing.

Contributors

972

Organizations

108

Software value

$8.2M

Apache Gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.

Contributors

667

Organizations

68

Software value

$22M

Snuba

Snuba is a time-series storage and analytics database service designed to power Sentry's event storage and analytics features. It provides a low-latency query service built on top of ClickHouse, optimized for high-volume event data and real-time querying.

Contributors

613

Organizations

116

Software value

$7.2M

Apache Phoenix

Apache Phoenix

Contributors

552

Organizations

56

Software value

$19M

Apache Doris

Apache Doris is an easy-to-use, high-performance, unified analytics database.

This project hasn't been onboarded to LFX Insights.

Calcite Avatica

Apache Calcite Avatica

This project hasn't been onboarded to LFX Insights.

Cube.js

Cube: universal semantic layer platform for AI, BI, spreadsheets, and embedded analytics.

This project hasn't been onboarded to LFX Insights.

Databend

Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com

This project hasn't been onboarded to LFX Insights.

Deephaven Community Core

Deephaven Community Core

This project hasn't been onboarded to LFX Insights.

Drill

Apache Drill is a distributed MPP query layer for self-describing data.

This project hasn't been onboarded to LFX Insights.

Linkis

Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.

This project hasn't been onboarded to LFX Insights.

ProtonSQL

High-performance, low-footprint SQL database written in C++. Process millions of rows per second from Kafka, Pulsar, or ClickHouse, and seamlessly write results back. Supports powerful features like JOIN, CDC, UPSERT, and LOOKUP, enabling real-time analytics and ETL at scale.

This project hasn't been onboarded to LFX Insights.

Spark

Apache Spark - A unified analytics engine for large-scale data processing

This project hasn't been onboarded to LFX Insights.