Varada as a Connector

In addition to deploying Varada as a data platform, you can deploy Varada as a connector to bring its performance improvements to your existing Trino clusters.

Varada can integrate into clusters running open-source Trino (formerly known as PrestoSQL), as well as the commercial offerings of Amazon EMR, GCP Dataproc, and Starburst Enterprise. In this deployment model, Varada seamlessly applies its dynamic and adaptive indexing-based acceleration technology to your existing clusters.

Varada integrates into your existing clusters

The following diagram shows how Varada integrates into your existing clusters. Varada deploys its own connectors in addition to your existing connectors, and leaves your existing connectors in place. No query rewrites are required, as your workloads continue to query the existing cluster as before.
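
For a rough sense of what this looks like on disk (a sketch assuming the standard Trino etc/catalog layout and the paths used in the installation example below), the existing catalog properties files stay in place and the installer simply adds a Varada catalog alongside them:

# existing catalog files remain untouched; the installer only adds a new one
ls /opt/trino/trino-server-362/etc/catalog/
# hive.properties      <- your existing catalog, unchanged
# varada.properties    <- generated by the Varada installer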

Deploying the Varada Connector

Deploying Varada as a connector is simple:

  1. Copy the connector tar.gz file to the /tmp/ directory on each of the cluster's nodes, for example using scp as shown below.
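
For example, the archive can be distributed with scp (the hostnames, key, and local path below are placeholders for your own environment):

# copy the connector archive to /tmp/ on every node in the cluster
for node in coordinator-host worker-host-1 worker-host-2; do
  scp -i ~/.ssh/my-key.pem ./presto-connector.tar.gz "${node}":/tmp/
done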

  2. Run the following script:

# create the varada-install directory and extract the connector archive
mkdir -p /tmp/varada-install
tar -zxf /tmp/presto-connector.tar.gz -C /tmp/varada-install

# add access permissions to the Trino directory
sudo chmod -R 777 /opt/trino-362
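
As a quick check (the version number in the directory name follows the example later in this section and may differ for your release), the extracted connector should now be visible under /tmp/varada-install:

# confirm the archive was extracted where the installer expects it
ls /tmp/varada-install/
# varada-connector-362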
  3. Install the connector by running the following script on each of the cluster's nodes:
installer.py [-e VARADA_ROOT_FOLDER] [-o ORIGINAL_CATALOG] [-c TARGET_CATALOG_NAME]
             [-w WORKER_TYPE] [-m METADATA_STORAGE_PATH] [-p PLUGIN_DIR_PATH]
             [-d CONFIG_DIR_PATH] [-b BACKEND_PORT_NUMBER] [-u CLUSTER-UNIQUE-ID]

where:

Parameter                | Optional | Description
------------------------ | -------- | -----------
-e VARADA_ROOT_FOLDER    |          | The path to the connector directory.
-o ORIGINAL_CATALOG      |          | The catalog properties file from which to copy the properties.
-c TARGET_CATALOG_NAME   |          | The name of the newly generated properties file for the Varada catalog.
-w WORKER_TYPE           | Yes      | The worker machine type. Specify this parameter only if the coordinator machine type differs from the worker machine type.
-m METADATA_STORAGE_PATH |          | The path to the metadata persistency location on S3.
-p PLUGIN_DIR_PATH       |          | The path to the plugin directory.
-d CONFIG_DIR_PATH       |          | The path to the configuration directory.
-b BACKEND_PORT_NUMBER   |          | The Varada port number. Note: the coordinator node and the worker nodes must be able to communicate via this port.
-u CLUSTER-UNIQUE-ID     |          | A unique ID to set for the cluster.

For example:

sudo python3 /tmp/varada-install/varada-connector-362/varada/installer.py \
  -e /tmp/varada-install/varada-connector-362/ \
  -o hive \
  -c varada \
  -m s3://my-s3-bucket/user-2020-12-14-10-53-45/ \
  -p /opt/trino/trino-server-362/plugin/ \
  -d /opt/trino/trino-server-362/etc/ \
  -b 8088 -u my-cluster
  4. Restart the cluster.
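
The exact restart mechanism depends on how your cluster is deployed (EMR, Dataproc, Starburst, and self-managed clusters each have their own); on a self-managed Trino installation, a sketch using the bundled launcher script might look like this:

# restart the Trino server on every node so the Varada plugin and catalog are loaded
sudo /opt/trino/trino-server-362/bin/launcher restart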

That's it! Once the cluster restarts, Varada's solution is integrated and is available through the Varada catalog, while the existing Hive or Iceberg catalog remains untouched. All queries running on tables under the Varada catalog will leverage Varada's acceleration.
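
As a sanity check (a sketch assuming the Trino CLI is available on the coordinator and the server listens on the usual HTTP port 8080; the schema and table names are placeholders), you can confirm that the catalog is registered and start querying it:

# the varada catalog should now appear alongside your existing catalogs
trino --server localhost:8080 --execute "SHOW CATALOGS"

# queries on tables under the varada catalog are accelerated by Varada
trino --server localhost:8080 --execute "SELECT count(*) FROM varada.my_schema.my_table"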

🚧

The Varada Connector requires the coordinator node and the worker nodes to communicate via port 8088, which is the default http-rest-port defined in the varada.properties file.
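
A minimal connectivity check between nodes might look like the following (the worker hostname is a placeholder; the port must match the http-rest-port value in varada.properties):

# from the coordinator, verify that a worker is reachable on the Varada port
nc -z worker-host-1 8088 && echo "port 8088 is reachable"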

🚧

The worker node instance type must be from the r5d or i3 instance families, which include local SSD disks.
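
To confirm that a worker actually exposes the expected local SSDs (device names are typical for NVMe-backed instance storage and may differ), you can list the block devices on the node:

# list block devices on a worker; r5d and i3 instances expose local NVMe SSDs (e.g. nvme1n1)
lsblk -d -o NAME,SIZE,TYPE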

Varada supports various Trino and Presto cluster deployment methods.

For specific information related to your deployment method, please contact [email protected].

Presto® is a trademark of the Linux Foundation.