Deploying the Varada Connector

Varada Connector Deployment Process Overview

Deploying Varada as a connector includes the following main steps:

  1. Copy the connector tar.gz file to each of the nodes in the cluster.

  2. Run a script on each node to configure the setup.

  3. Run a script on each node to install the connector.

👍

If you'd prefer, you can first run the script in dry run mode to view the operations that will be performed as part of the installation without making any changes to your system.

  4. Restart the cluster.

Requirements

  • The connector can be deployed on machines from the i3, r5d, or r5dn families with SSD disks attached.
  • An S3 bucket is required for metadata.
  • The coordinator node and the worker nodes must be able to communicate via port 8088, which is the default http-rest-port defined in the varada.properties file.
  • Port 8088 must be open to the IP addresses from which API calls are made.
  • Varada utilizes the cluster SSDs. To avoid potential data corruption, ensure that your cluster does not write to these disks.
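
If your cluster runs on AWS, the S3 bucket and port requirements above might be satisfied with commands like the following. This is only a sketch; the bucket name, security group ID, and CIDR range are placeholders you must replace with your own values.

# create an S3 bucket for Varada metadata (bucket name is a placeholder)
aws s3 mb s3://my-varada-metadata

# allow the coordinator and workers to reach each other on port 8088,
# and open it to the IP range that will issue API calls
# (security group ID and CIDR are placeholders)
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 8088 \
    --cidr 10.0.0.0/16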

📘

The community edition of the connector is available for Trino 362, 364, 367, 370, 374, and 375, and is free for clusters with up to 4 workers of type i3.4xlarge, r5d.4xlarge, or r5dn.4xlarge.

Deploy the Varada Connector

Deploying Varada as a connector is simple. On each of the cluster nodes, do the following:

👍

Instead of running these commands manually, you can save time by adding the commands to a user data script you pass to the machines.

  1. Copy the connector file varada-trino-connector.tar.gz to the /tmp/ directory:
wget <connector_link> -O /tmp/varada-trino-connector.tar.gz
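
Optionally, before unpacking, you can verify that the file downloaded correctly. This is a generic sanity check and is not required by the installer.

# confirm the archive exists and is a valid gzip tarball
ls -lh /tmp/varada-trino-connector.tar.gz
tar -tzf /tmp/varada-trino-connector.tar.gz > /dev/null && echo "archive OK"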
  2. Run the following script to configure the setup:
# create the varada-install directory and unpack the tarball
mkdir /tmp/varada-install
tar -zxf /tmp/varada-trino-connector.tar.gz -C /tmp/varada-install
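
To confirm the archive unpacked as expected, you can list the install directory and locate the installer script referenced in the next step. The versioned directory name will vary with your connector version.

# list the unpacked connector directory and find the installer script
ls /tmp/varada-install/
find /tmp/varada-install -name installer.py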
  3. Run the following script to install the connector:
sudo python3 /tmp/varada-install/varada-connector-<connector-version>-varada-con/varada/installer.py
    [-e VARADA_ROOT_FOLDER] [-o ORIGINAL_CATALOG] [-c TARGET_CATALOG_NAME]
    [-w WORKER_TYPE] [-m METADATA_STORAGE_PATH]
    [-p PLUGIN_DIR_PATH] [-d CONFIG_DIR_PATH] [-b BACKEND_PORT_NUMBER]
    [-u CLUSTER-UNIQUE-ID] [-op OVERRIDE-PLUGIN] [-dr DRY-RUN] [-sj SKIP-JMX]

For example:

sudo python3 /tmp/varada-install/varada-connector-*-varada-con/varada/installer.py -e /tmp/varada-install/varada-connector-*-varada-con/ -o hive -c varada -w r5d.4xlarge -m s3://<YOUR-S3-BUCKET> -p /opt/trino-375/trino-server-375/plugin/ -d /opt/trino-375/trino-server-375/etc/ -b 8088 -u my-cluster

📘

Replace the parameter values in this example as required to match your Trino deployment characteristics. In particular, replace <YOUR-S3-BUCKET> with your S3 bucket location.
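
The plugin and configuration directory paths in the example assume a Trino 375 installation under /opt/trino-375/. If you're unsure where your deployment keeps these directories, a generic search such as the following may help locate them (this is just one way to find them; paths vary by installation):

# locate the Trino plugin and etc directories (paths vary by installation)
sudo find / -maxdepth 5 -type d -path "*trino-server*/plugin" 2>/dev/null
sudo find / -maxdepth 5 -type d -path "*trino-server*/etc" 2>/dev/null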

where:

  • -e VARADA_ROOT_FOLDER: The path to the connector directory.
  • -o ORIGINAL_CATALOG: The catalog name from which to copy the properties. The supported catalogs are Hive and Iceberg.
  • -c TARGET_CATALOG_NAME: The name of the new Varada catalog.
  • -w WORKER_TYPE (optional): The worker machine type. Specify this parameter only if the coordinator machine type differs from the worker machine type.
  • -m METADATA_STORAGE_PATH: The path to the location on your S3 bucket in which the metadata is stored.
  • -p PLUGIN_DIR_PATH: The path to the plugin directory.
  • -d CONFIG_DIR_PATH: The path to the configuration directory. The full directory path usually ends with /etc/.
  • -b BACKEND_PORT_NUMBER: The Varada port number. Note: The coordinator node and the worker nodes must be able to communicate via this port.
  • -u CLUSTER-UNIQUE-ID: A unique ID of your choice for the cluster.
  • -op OVERRIDE-PLUGIN (optional): Overrides the content of the Varada plugin directory at <PLUGIN_DIR_PATH>/<TARGET_CATALOG_NAME>.
  • -dr DRY-RUN (optional): Runs the script in dry run mode, which shows the operations that would be performed by the script without actually running them.
  • -sj SKIP-JMX (optional): Determines whether the jmx.properties catalog properties file is added at <CONFIG_DIR_PATH>/catalog/jmx.properties (if the file does not exist).
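
As noted above, you can preview the installation first by running the script in dry run mode. The sketch below simply appends the -dr flag to the earlier example; if your installer version expects an explicit value for the flag, supply it accordingly.

# dry run: print the operations the installer would perform without making changes
sudo python3 /tmp/varada-install/varada-connector-*-varada-con/varada/installer.py -e /tmp/varada-install/varada-connector-*-varada-con/ -o hive -c varada -w r5d.4xlarge -m s3://<YOUR-S3-BUCKET> -p /opt/trino-375/trino-server-375/plugin/ -d /opt/trino-375/trino-server-375/etc/ -b 8088 -u my-cluster -dr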

  4. When you've completed these steps on each of the cluster nodes, run the following command to restart the cluster, and wait for it to start:
sudo /opt/trino-375/trino-server-375/bin/launcher restart
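
If you want to confirm that a node came back up, the standard Trino launcher also provides a status command (the path below matches the example above and may differ in your deployment):

# check that the Trino server process is running on this node
sudo /opt/trino-375/trino-server-375/bin/launcher status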
  5. Run the following commands to verify that the cluster nodes are up and that the Varada catalog was added:
# start trino cli
trino

# see the cluster nodes are up
select * from system.runtime.nodes;

# see existing catalogs
show catalogs;

That’s it! You can now start working with the Varada catalog. The first run will be cold, but the connector will instantly start to accelerate queries so that on your next run you’ll see dramatic improvements.
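
As a quick smoke test, you might run a simple query through the new catalog. The schema and table names below are placeholders for objects that already exist in your original catalog.

# inside the trino CLI, query a table through the Varada catalog
show schemas from varada;
select * from varada.<your_schema>.<your_table> limit 10;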

Deploy the Varada Connector with a User Data Script

You can leverage user data to install the connector on each of the cluster machines after they start.

The user-data commands are the same as described above. Note that you will need to change the parameters according to your deployment characteristics.

wget <connector_link> -O /tmp/varada-trino-connector.tar.gz
mkdir /tmp/varada-install
tar -zxf /tmp/varada-trino-connector.tar.gz -C /tmp/varada-install
sudo python3 /tmp/varada-install/varada-connector-*-varada-con/varada/installer.py -e /tmp/varada-install/varada-connector-*-varada-con/ -o hive -c varada -m s3://<YOUR-S3-BUCKET> -p /opt/trino-375/trino-server-375/plugin/ -d /opt/trino-375/trino-server-375/etc/ -b 8088 -u my-cluster

📘

Replace the parameter values in this example as required to match your Trino deployment characteristics. In particular, replace <YOUR-S3-BUCKET> with your S3 bucket location.
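
For example, if you save the commands above (with a #!/bin/bash shebang) to a file such as install-varada.sh, you could pass it as user data when launching each node. This is only a sketch; the AMI ID, key name, subnet, and security group below are placeholders.

# launch a node with the install commands as user data (all IDs are placeholders)
aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type r5d.4xlarge \
    --key-name my-key \
    --security-group-ids sg-0123456789abcdef0 \
    --subnet-id subnet-0123456789abcdef0 \
    --user-data file://install-varada.sh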

FAQ

See Varada Connector FAQ.