Varada Query Acceleration Engine

The query acceleration engine in Varada's Data Platform dynamically defines instructions to accelerate different workloads.

When a query hits a column for which no acceleration instructions are defined, the query acceleration engine performs data and index materialization to warm up the data in the column. This default acceleration improves performance while you are still setting up acceleration instructions for the system, and helps you make the best use of your cluster storage capacity. It runs as long as space is available in your cluster storage.
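
The default behavior described above can be pictured as a simple decision rule. The following is a minimal sketch only, not Varada's implementation: the ClusterStorage class, the on_query_column function, and the returned labels are all hypothetical names introduced for illustration, and byte sizes stand in for whatever capacity accounting the engine actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterStorage:
    """Hypothetical model of cluster storage (illustration only, not Varada's API)."""
    capacity_bytes: int
    used_bytes: int = 0
    warmed: set = field(default_factory=set)

    def has_room_for(self, size_bytes: int) -> bool:
        # True if materializing size_bytes more would still fit in storage.
        return self.used_bytes + size_bytes <= self.capacity_bytes


def on_query_column(column, size_bytes, instructions, storage):
    """Decide what happens when a query touches `column`.

    `instructions` maps column names to user-defined acceleration
    instructions (an assumed representation).
    """
    if column in instructions:
        # Explicit instructions exist, so default acceleration does not apply.
        return "use-defined-instructions"
    if column in storage.warmed:
        return "already-warm"
    if storage.has_room_for(size_bytes):
        # Default acceleration: warm the column while space is available.
        storage.warmed.add(column)
        storage.used_bytes += size_bytes
        return "default-warm-up"
    return "skip-no-space"
```

In this sketch, a column is warmed at most once, and a column that would not fit in the remaining capacity is simply skipped.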

The query acceleration engine also uses a machine learning-based feedback loop that monitors which datasets and columns are frequently used, as well as which datasets are required to meet the performance requirements of high-priority workloads. It then dynamically and automatically configures acceleration strategies to deliver an optimal balance of performance and price.

The query acceleration engine includes two main components: the Collector and the Accelerator.

The Collector

The Collector continuously collects and summarizes query execution metadata from the query engine. The summarized model is optimized for insight extraction and is stored in columnar ORC format in an admin-defined S3 bucket. It includes, for example, query usage, column popularity and selectivity levels, the operators used on specific tables, and cross-table relations.
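
To make the idea of a summarized model more concrete, here is a rough sketch of how raw query events could be reduced to per-column popularity and selectivity. The event layout (a "filters" mapping from column name to the fraction of rows the filter matched) and the summarize function are assumptions made for this example; the actual Collector schema and its ORC output are not shown here.

```python
from collections import defaultdict


def summarize(query_events):
    """Reduce raw query events to per-column popularity and mean selectivity.

    Each event is assumed to look like {"filters": {"table.column": 0.25}},
    where the value is the fraction of rows matched by the filter.
    """
    popularity = defaultdict(int)
    selectivity_sum = defaultdict(float)
    for event in query_events:
        for column, selectivity in event["filters"].items():
            popularity[column] += 1
            selectivity_sum[column] += selectivity
    return {
        column: {
            "popularity": count,
            "mean_selectivity": selectivity_sum[column] / count,
        }
        for column, count in popularity.items()
    }
```

A summary of this shape is the kind of input the Accelerator could use when deciding which columns are worth accelerating.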

The Accelerator

The Accelerator creates actionable insights based on historical query and data usage patterns from the Collector output. These insights are continuously revised based on real-time usage and performance, and translated into two types of acceleration strategies: cache strategies and indexing strategies.

Cache strategies determine which data should be cached to improve performance, while indexing strategies are translated into acceleration instructions that define which data to index, and the most effective indexing technology to use.

Acceleration Strategies

Varada uses two types of acceleration strategies:

  • Cache Strategies: Based on the frequency of data usage and its business priority, Varada uses SSD columnar nanoblock caching to speed up data access.
  • Indexing Strategies: Varada uses different indexing technologies to speed up data searches, filters, and joins, according to the data type and level of selectivity.
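
As an illustration of how the two strategy types might be chosen from usage data, consider the sketch below. Everything in it is hypothetical: the choose_strategies function, the thresholds, and the index labels ("text-index", "range-index") are placeholders and do not name Varada's actual indexing technologies or decision logic, which are driven by the machine learning-based feedback loop.

```python
def choose_strategies(column_type, mean_selectivity, popularity, high_priority):
    """Pick acceleration strategies for one column (illustrative rules only).

    mean_selectivity is the average fraction of rows matched by filters on
    the column; a small value means filters on it are highly selective.
    """
    strategies = []

    # Cache strategy: driven by frequency of use and business priority.
    # The popularity threshold of 10 is an arbitrary placeholder.
    if high_priority or popularity >= 10:
        strategies.append("cache")

    # Indexing strategy: driven by data type and selectivity.
    # The 0.1 cutoff and the index labels are placeholders.
    if mean_selectivity <= 0.1:
        if column_type == "string":
            strategies.append("text-index")
        else:
            strategies.append("range-index")

    return strategies
```

For example, a frequently filtered string column with highly selective predicates would receive both a cache and an index in this sketch, while a rarely used column with unselective filters would receive neither.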