Getting Started with Varada AMI

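This topic outlines the tasks required to get started with the Varada Query Acceleration Platform using an Amazon Machine Image (AMI) on EC2 instances: creating policies and roles, launching an instance, connecting to the instance, and optional additional configuration.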

Create Policies and Roles

To configure an EC2 virtual machine, you first need to create a policy, and then create a role that the policy is attached to.

Create a Policy

  1. Log in to the AWS Console and select IAM.
  1. Select Policies, then select Create policy.
  1. Create the policy by copying and pasting the snippet below. This snippet sets permissions for S3 and Glue.
{
 "Version": "2012-10-17",
 "Statement": [
   {
     "Sid": "S3Permissions",
     "Effect": "Allow",
     "Action": [
       "s3:Get*",
       "s3:List*"
     ],
     "Resource": "*"
   },
   {
     "Sid": "GlueGlobalPermissions",
     "Effect": "Allow",
     "Action": [
       "glue:*"
     ],
     "Resource": "*"
   }
 ]
}
  1. Select Review Policy.

  2. Give the policy a meaningful name.
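
The same policy can also be created from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured with credentials that are allowed to manage IAM. The policy name varada-s3-glue-access is a placeholder chosen for illustration, and the run helper is a hypothetical wrapper that only prints the command unless DRY_RUN=0 is set.

```shell
set -eu

# Hypothetical policy name; choose any meaningful name.
POLICY_NAME="varada-s3-glue-access"
POLICY_FILE="$(mktemp)"

# Same S3 and Glue permissions as the policy document above.
cat > "$POLICY_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3Permissions",
      "Effect": "Allow",
      "Action": ["s3:Get*", "s3:List*"],
      "Resource": "*"
    },
    {
      "Sid": "GlueGlobalPermissions",
      "Effect": "Allow",
      "Action": ["glue:*"],
      "Resource": "*"
    }
  ]
}
EOF

# Dry run by default: print the command instead of calling AWS.
# Set DRY_RUN=0 to actually create the policy.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

run aws iam create-policy \
  --policy-name "$POLICY_NAME" \
  --policy-document "file://$POLICY_FILE"
```

When run with DRY_RUN=0, the command output includes the ARN of the new policy, which is needed when attaching the policy to a role.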

Create a Role

  1. Create a role that will be attached to an EC2 instance.
  1. Associate the policy created earlier with the role.
  1. Provide a meaningful name, and create the role.
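
For reference, the role steps above could look roughly like this with the AWS CLI. This is a sketch under assumptions: the role name varada-ec2-role, the account ID 123456789012, and the policy name in the ARN are placeholders, and the run helper is a hypothetical wrapper that only prints each command unless DRY_RUN=0 is set. Unlike the console, the CLI does not create an instance profile automatically, so the sketch creates one with the same name as the role.

```shell
set -eu

# Hypothetical names and a placeholder account ID; substitute your own.
ROLE_NAME="varada-ec2-role"
POLICY_ARN="arn:aws:iam::123456789012:policy/varada-s3-glue-access"
TRUST_FILE="$(mktemp)"

# Trust policy that lets EC2 instances assume the role.
cat > "$TRUST_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Dry run by default: print the command instead of calling AWS.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Create the role and attach the policy created earlier.
run aws iam create-role --role-name "$ROLE_NAME" \
  --assume-role-policy-document "file://$TRUST_FILE"
run aws iam attach-role-policy --role-name "$ROLE_NAME" \
  --policy-arn "$POLICY_ARN"

# Wrap the role in an instance profile so it can be selected at launch.
run aws iam create-instance-profile --instance-profile-name "$ROLE_NAME"
run aws iam add-role-to-instance-profile \
  --instance-profile-name "$ROLE_NAME" --role-name "$ROLE_NAME"
```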

Launch an Instance

  1. In the AWS Console, select Launch Instance.
  1. From the AWS Marketplace, select the VARADA AMI.

  2. From the r5d family, select an instance type. We recommend that you get started with the r5d.8xlarge instance type.

  3. Select the IAM Role you created earlier.

📘

The Varada Data Virtualization Platform using AMI supports single-machine clusters only. If you select a Number of instances value that is greater than 1, it will result in separate, disconnected clusters.

  1. Configure the associated Security Group so that ports 8080 and 22 are open. You can open the ports for a single IP address, or for a range of IP addresses. For the available options, see here.

📘

If you wish to use a port other than 8080, edit the http-server.http.port property and the discovery.uri property in the /etc/presto/config.properties file, and ensure that the same port number is entered in the associated Security Group.

  1. Select a pair of SSH keys to ensure a secure connection to the instance.
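
The Security Group rules for ports 8080 and 22 can also be added with the AWS CLI. The sketch below assumes a configured AWS CLI; the security group ID and the CIDR range are placeholders, and the run helper is a hypothetical wrapper that only prints each command unless DRY_RUN=0 is set.

```shell
set -eu

# Placeholders: your security group ID and the address range allowed to connect.
SG_ID="sg-0123456789abcdef0"
ALLOWED_CIDR="203.0.113.10/32"

# 8080 is the default port; use your own value if you changed
# http-server.http.port in /etc/presto/config.properties.
PORTS="8080 22"

# Dry run by default: print the command instead of calling AWS.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Open each port for inbound TCP traffic from the allowed range only.
for port in $PORTS; do
  run aws ec2 authorize-security-group-ingress \
    --group-id "$SG_ID" --protocol tcp --port "$port" --cidr "$ALLOWED_CIDR"
done
```

Restricting the rule to a single address (a /32 range) rather than 0.0.0.0/0 keeps the SSH and Web UI ports closed to the rest of the internet.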

Connect to an Instance

  1. To connect to the cluster coordinator, use SSH with the user name ec2-user:
ssh -i <private_key_path> ec2-user@<presto_coordinator_ip>
  1. After connecting to the cluster coordinator, use the Presto CLI to run queries.
    Start the Presto CLI:
pcli

If you are using the Hive Metastore or the AWS Glue Data Catalog, switch to the Varada catalog to see the tables stored in Amazon S3:

use varada;

To access the Varada Web UI, use a browser to go to http://<presto_coordinator_ip>:8080.

If you changed the default port, use the port number that is configured in the /etc/presto/config.properties file and opened in the Security Group.

👍

You can start running queries on the public Ride Sharing dataset to get hands-on experience working with Varada.

Optional Additional Configuration

Connect to the Hive Metastore

By default, AWS Glue is used as the metastore. To use the Hive Metastore instead, change hive.metastore=glue to hive.metastore.uri=thrift://<HMS_IP_ADDR>:<HMS_PORT> in both /etc/presto/catalog/varada.properties and /etc/presto/catalog/hive.properties.

Alternatively, run the following commands to apply the change, replacing ip-10-0-0-1.ec2.internal:9083 with the address and port of your Hive Metastore:

sudo sed -i '/hive.metastore=glue/c\hive.metastore.uri=thrift://ip-10-0-0-1.ec2.internal:9083' /etc/presto/catalog/varada.properties
sudo sed -i '/hive.metastore=glue/c\hive.metastore.uri=thrift://ip-10-0-0-1.ec2.internal:9083' /etc/presto/catalog/hive.properties
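
To preview what the sed expression does before touching the live files, it can be tried on a scratch file first. This is a minimal sketch, assuming GNU sed (as found on Amazon Linux); the scratch file contents and the metastore address are illustrative stand-ins, not the contents of the shipped properties files.

```shell
set -eu

# Scratch file standing in for /etc/presto/catalog/hive.properties.
# The contents are an assumed minimal example.
SCRATCH="$(mktemp)"
printf '%s\n' '# scratch copy for testing' 'hive.metastore=glue' > "$SCRATCH"

# Placeholder Hive Metastore endpoint; replace with your own host and port.
HMS_URI="thrift://ip-10-0-0-1.ec2.internal:9083"

# GNU sed "c" command: replace every line matching the pattern
# with the new hive.metastore.uri line.
sed -i "/hive.metastore=glue/c\\hive.metastore.uri=${HMS_URI}" "$SCRATCH"

# Show the result so it can be inspected.
cat "$SCRATCH"
```

Because the c command replaces the whole matching line, the hive.metastore=glue setting is removed rather than left alongside the new URI.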

After making these changes, restart the Varada service by running:

sudo service varada restart

Configure Additional Data Sources

To configure additional data sources, configure or create the appropriate files in /etc/presto/catalog/.

For example, to use a PostgreSQL database, make the changes described here.
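
As an illustration, a PostgreSQL catalog file might be generated as follows. This sketch assumes the standard Presto PostgreSQL connector properties; the host, port, database name, and credentials are placeholders, and the linked instructions take precedence if they differ. By default the file is written to a temporary directory so that it can be reviewed before it is copied to /etc/presto/catalog/ with sudo.

```shell
set -eu

# Placeholder connection details; replace with your own.
PG_HOST="example.net"
PG_PORT="5432"
PG_DB="database"

# Written to a temporary directory unless CATALOG_DIR is set.
# On the instance, the final location is /etc/presto/catalog/.
CATALOG_DIR="${CATALOG_DIR:-$(mktemp -d)}"

# The file name (without .properties) becomes the catalog name in Presto.
cat > "$CATALOG_DIR/postgresql.properties" <<EOF
connector.name=postgresql
connection-url=jdbc:postgresql://${PG_HOST}:${PG_PORT}/${PG_DB}
connection-user=<postgres_user>
connection-password=<postgres_password>
EOF

cat "$CATALOG_DIR/postgresql.properties"
```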

After making any configuration changes, restart the Presto service by running:

sudo service presto restart