Varada’s big data indexing solution serves as a smart acceleration layer on top of your data lake. Varada dramatically accelerates your queries by combining indexing technologies, which speed up queries with aggregations, filters, and joins, with smart cache management, which speeds up data access and improves performance.
Varada leverages the Trino open-source distributed SQL query engine, enabling you to run any SQL query without the need for modeling or optimizations. Varada also includes out-of-the-box native support for all community-supported Trino SQL connectors to access a wide array of data sources.
You can deploy Varada either as a data platform on one or more dedicated clusters in your VPC, as described in Varada as a Data Platform, or as a connector to your existing Trino clusters, as described in Varada as a Connector. In both scenarios, the Varada Control Center connects directly to your data lake, enabling it to efficiently synchronize changes and rapidly index and cache new data as it lands.
The Varada Control Center is a web-based user interface for cluster management and monitoring. The Varada Control Center connects to your cluster and continuously monitors your workloads. When you run a query from one of the supported clients, the integrated Query Acceleration Engine analyzes the query, detects which datasets to accelerate, and applies the optimal acceleration strategy. Learn more about this process in Query Acceleration with Varada.
From the Varada Control Center, you can add your own acceleration instructions, and monitor your cluster activity so that you have the insights required to proactively and effectively allocate resources.
Varada dramatically accelerates your queries using a combination of indexing technologies and smart cache management.
Indexing technologies, including Basic Index, Lucene, and Bloom Filter Index, help speed up queries with aggregations, filters, and joins.
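Varada’s internal index formats are not described on this page. As a rough illustration of why a Bloom filter index helps selective filters and joins, the following generic Python sketch builds one filter per hypothetical data block and skips any block whose filter proves the lookup value is absent. The `BloomFilter` class, the block layout, the filter size, and the hashing scheme are all assumptions made for illustration, not Varada’s implementation.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, value: str):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(value))


# Hypothetical data blocks, each holding values of a "user_id" column.
blocks = {
    "block_0": ["u1", "u2", "u3"],
    "block_1": ["u4", "u5"],
    "block_2": ["u6", "u7", "u8"],
}

# Build one filter per block (the "index").
filters = {}
for name, ids in blocks.items():
    bf = BloomFilter()
    for uid in ids:
        bf.add(uid)
    filters[name] = bf


def blocks_to_scan(target: str) -> list:
    # Only blocks whose filter might contain the value need to be read.
    return [name for name, bf in filters.items() if bf.might_contain(target)]


print(blocks_to_scan("u5"))  # block_1, plus any rare false positives
```

A filter like this answers "definitely not here" cheaply, so a query with `WHERE user_id = 'u5'` (or the probe side of a join) can avoid reading blocks that cannot match.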
Varada supports all of Trino’s built-in data types, including structural data types. While all structural data types (ARRAY, MAP, and ROW) are accessible with Varada, only fields inside ROW data types can be indexed (starting from version 360.11).
In parallel, Varada caches frequently accessed (hot) data to speed up data access and improve performance.
Varada distinguishes between three data layers: hot, warm, and cold.
Hot: Varada clusters leverage NVMe SSDs to process queries and to store indexes and a cache of hot data for optimal performance.
Warm: You can also optionally store indexes from your clusters in shared storage (the warm layer). This enables fast warmup when scaling out, adding new clusters, or adding worker nodes to a cluster, as described in Index and Cache Resiliency.
Cold: Your data lake remains the single source of truth.
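To make the division of labor between the three layers concrete, here is a conceptual Python model. Everything in it (`ColdDataLake`, `WarmIndexStore`, `Worker`, the LRU eviction policy, and the toy value-to-positions index) is hypothetical and only illustrates the flow described above, not how Varada works internally: a worker reads through its hot cache to the cold data lake, persists the indexes it builds to warm storage, and a newly added worker warms up from those snapshots instead of re-indexing from the lake.

```python
from collections import OrderedDict


class ColdDataLake:
    """Hypothetical stand-in for the data lake: the single source of truth."""

    def __init__(self, files):
        self.files = files
        self.reads = 0  # counts how often the cold layer is touched

    def read(self, path):
        self.reads += 1
        return self.files[path]


class WarmIndexStore:
    """Hypothetical stand-in for shared storage holding index snapshots."""

    def __init__(self):
        self.snapshots = {}


class Worker:
    """Hypothetical worker with a bounded hot cache on fast local storage."""

    def __init__(self, lake, warm, hot_capacity=2):
        self.lake = lake
        self.warm = warm
        self.hot = OrderedDict()  # path -> data, kept in LRU order
        self.hot_capacity = hot_capacity
        self.indexes = {}

    def read(self, path):
        if path in self.hot:
            self.hot.move_to_end(path)  # hit: mark as most recently used
            return self.hot[path]
        data = self.lake.read(path)  # miss: fall back to the cold layer
        self.hot[path] = data
        if len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)  # evict the least recently used entry
        return data

    def build_index(self, path):
        # Toy index mapping each value to its row positions.
        index = {}
        for pos, value in enumerate(self.read(path)):
            index.setdefault(value, []).append(pos)
        self.indexes[path] = index
        self.warm.snapshots[path] = index  # persist to the warm layer
        return index

    def warm_up(self, path):
        # A new worker restores the index from warm storage if one exists.
        if path in self.warm.snapshots:
            self.indexes[path] = self.warm.snapshots[path]
            return True
        return False


lake = ColdDataLake({"t/part-0": ["a", "b", "a"], "t/part-1": ["c"]})
warm = WarmIndexStore()
w1 = Worker(lake, warm)
w1.build_index("t/part-0")  # reads the lake once, caches, persists the index
w1.read("t/part-0")         # served from the hot cache
w2 = Worker(lake, warm)     # scale-out: a new worker joins
w2.warm_up("t/part-0")      # index restored without reading the lake
print(lake.reads)           # 1
```

The LRU policy is only a placeholder; the text above states only that Varada caches frequently accessed (hot) data, not which eviction policy it uses.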