Index and Cache Resiliency

Varada provides index and cache resiliency, which makes index and data cache creation 5x to 10x faster. When a new index is created or data is cached, it is stored on the cluster's NVMe SSD drives as well as in a dedicated, shared bucket in your data lake. When you scale in clusters, the indexes and data cache remain available in the shared bucket.

This enables fast warm-up when scaling back out, adding new clusters, or adding worker nodes to a cluster, and eliminates the need to keep instances idling when a cluster is not in use. Instead, when a cluster needs to warm an index or data, it first checks the designated bucket on the data lake to see whether the index or cache is already available, and if so, loads it.

If the indexes or cache are not available in the shared storage or cannot be loaded for any reason, the data is warmed as usual.
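The warm-up flow described above can be sketched roughly as follows. This is only an illustration of the check-then-fall-back logic, not Varada's implementation: `SharedBucket`, `warm`, `build_from_lake`, and the key name `orders.idx` are hypothetical names, and a plain dictionary stands in for the local NVMe SSD.

```python
from typing import Callable, Dict, Optional


class SharedBucket:
    """In-memory stand-in for the shared object storage bucket (hypothetical)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        # Returns None when the object does not exist.
        return self._objects.get(key)

    def put(self, key: str, blob: bytes) -> None:
        self._objects[key] = blob


def warm(
    key: str,
    bucket: SharedBucket,
    local_ssd: Dict[str, bytes],
    build_from_lake: Callable[[str], bytes],
) -> str:
    """Warm one index or cache entry and report where it came from."""
    try:
        blob = bucket.get(key)
    except Exception:
        # Any load failure is treated like a miss: fall back to regular warming.
        blob = None

    if blob is not None:
        local_ssd[key] = blob
        return "loaded_from_bucket"

    # Not available in shared storage: warm as usual from the data lake,
    # then store the result both locally and in the shared bucket.
    blob = build_from_lake(key)
    local_ssd[key] = blob
    bucket.put(key, blob)
    return "warmed_from_lake"


if __name__ == "__main__":
    bucket = SharedBucket()
    first_cluster: Dict[str, bytes] = {}
    second_cluster: Dict[str, bytes] = {}
    build = lambda k: b"index:" + k.encode()
    print(warm("orders.idx", bucket, first_cluster, build))
    print(warm("orders.idx", bucket, second_cluster, build))
```

In this sketch the first cluster pays the full indexing cost, and any cluster or node that warms the same key later only needs to load it from the shared bucket.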

Varada includes three data and index layers:

  • Hot Data and Index: Varada uses nodes with attached NVMe SSDs in your VPC to process queries and to store hot data and cache for optimal performance. This layer is enabled by default.

  • Warm Data and Index: Indexes and cached data are also stored in a designated object storage bucket on your data lake. This layer is shared among all your clusters to ensure minimal resources are allocated to indexing when scaling out or adding new clusters. When a cluster is scaled in or eliminated and some (or all) nodes are shut down, the indexes remain available as warm data.

  • Cold Data: Your data lake remains the single source of truth.
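As a rough mental model of the three layers, the sketch below resolves which layer can serve a given index and shows what changes after a scale-in. The names `Layer`, `resolve_layer`, and `orders.idx` are hypothetical, and dictionaries stand in for the node SSDs and the shared bucket; it is a simplification, not a description of Varada internals.

```python
from enum import Enum
from typing import Dict


class Layer(Enum):
    HOT = "hot"    # NVMe SSDs attached to the cluster nodes
    WARM = "warm"  # shared object storage bucket on the data lake
    COLD = "cold"  # the data lake itself, the single source of truth


def resolve_layer(key: str, hot: Dict[str, bytes], warm: Dict[str, bytes]) -> Layer:
    """Return the fastest layer that currently holds the index or cached data."""
    if key in hot:
        return Layer.HOT
    if key in warm:
        return Layer.WARM
    # Nothing prepared yet: the data must be read and indexed from the data lake.
    return Layer.COLD


if __name__ == "__main__":
    shared_warm = {"orders.idx": b"..."}
    cluster_hot = {"orders.idx": b"..."}

    print(resolve_layer("orders.idx", cluster_hot, shared_warm))

    # Scale-in: the nodes are shut down and their local SSD contents are gone,
    # but the copy in the shared bucket is unaffected.
    cluster_hot.clear()
    print(resolve_layer("orders.idx", cluster_hot, shared_warm))
```

After the scale-in step, the index is no longer hot but is still served from the warm layer, which is why a cluster that scales back out does not need to rebuild it from cold data.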

With this approach, you can continue using your existing scaling and auto-scaling policies, scaling groups, and tools. Any new use case that is added, even if it’s set up on a separate cluster, will benefit from any indexes that were already created.

Tip: You can define retention policies to limit the amount of data stored in S3.

Note: Index and cache resiliency is disabled by default. For details about enabling this feature, see Enable Index and Cache Resiliency.

