This is the second blog in Cloud Thoughts, a series delving into the details of deploying ThoughtSpot in the Cloud.
Here at ThoughtSpot, our goal is to make analytics accessible for every business person through the power of search and AI. While thinking about these end users, however, we’ve not forgotten the IT warriors who are tasked with developing, building, and deploying the infrastructure to support these business users.
And for these IT pros, nothing looms larger than the cloud.
Fret not, IT ninjas! Deploying a ThoughtSpot multi-node cluster in the cloud is very easy, thanks to the ability to use infrastructure orchestration tools like Terraform in combination with application provisioning frameworks like Ansible.
IT operators who want to automate both the deployment of their infrastructure and the provisioning of ThoughtSpot software can use this combination of Terraform and Ansible to automate the entire workflow.
In this article, we’ll show you a step-by-step process that uses Terraform and Ansible to create a fully functional multi-node cluster, including deploying the cloud infrastructure needed to host ThoughtSpot and provisioning ThoughtSpot software on these nodes.
If you’re unfamiliar, Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. Terraform can be used to manage ThoughtSpot infrastructure deployment in all the cloud provider environments that we currently support (AWS, GCP, Azure). Terraform is cloud-agnostic and allows a single configuration to be used to manage multiple providers, and to even handle cross-cloud dependencies.
While Terraform is a great infrastructure provisioning tool, it does not come with a configuration management system. That's where Ansible comes in. We will use Terraform to stand up virtual machines or cloud instances, and then Ansible will be used to provision ThoughtSpot software and create a fully functional cluster.
See this article on how Terraform and Ansible are great companions in the infrastructure and application provisioning lifecycle.
In a prior blog post, we covered how ThoughtSpot cloud deployments make use of cloud compute (VM) instances and an underlying persistent storage layer to form a cluster of nodes.
Deployment use case: In this article, we will show you a suggested recipe to automate the deployment of ThoughtSpot in AWS by creating a four node cluster. For the purpose of this example, we will assume that certain resources have already been created in your AWS environment prior to trying this automation recipe.
The AWS resources you will need to create in advance are:
- The AWS VPC where you will deploy ThoughtSpot
- The relevant security groups required for the EC2 instances (Please see the documentation on relevant ports that are needed for external and intra-cluster operation)
In addition, you will need to also have the following information ready to start deploying:
- The relevant AMI of the ThoughtSpot image (Please check the latest release documentation to obtain the AMI ID). This AMI needs to be copied to the region where you want to deploy ThoughtSpot if it is not already there.
- Access to the ThoughtSpot release binary and the MD5 checksum file. ThoughtSpot support should be able to provide you with these artifacts. In this workflow, we copy these artifacts to the deployment server that is used for the automation workflow.
- Terraform also needs the AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID in order to deploy resources in your AWS environment. You can set these as environment variables if you prefer.
- The cluster ID that you will use for the deployment (contact ThoughtSpot support to obtain this ID)
- For this four-node cluster deployment, we will use username/password authentication to provision the software. Contact ThoughtSpot to get this password. The automation workflow uses it, and you will also use it to log in to the instances once they are up. (Note: when the ThoughtSpot AMI starts supporting keypair authentication, that method can be used in the automation workflow instead.)
High Level Overview
To deploy the ThoughtSpot cluster using Terraform and Ansible:
Use Terraform to declare the infrastructure resources in AWS with the desired number and types of instances and the AMI
- In this example we are launching four (4) ThoughtSpot VM instances, each with one 200GB EBS boot volume and two 1TB EBS data volumes. The AMI that ThoughtSpot provides as of release 5.2 comes with all these EBS volumes set up.
- Define AWS resources that need to be provisioned in the main file and the variables in a separate file.
- Invoke Ansible from within the Terraform code and run the Ansible playbook to set up the ThoughtSpot application software and create the cluster.
In the Ansible playbook
- Set up (prepare) the underlying disks.
- Fetch the release tarball and the MD5 checksum file.
- Install the cluster.
Step 1: Prepare Your Deployment Server
Before starting this step, you will need a deployment server that can access the AWS environment where you are planning to provision the ThoughtSpot infrastructure. This deployment server could also be an existing system used for other centralized orchestration within your environment.
Note: Items (a) and (b) below have been automated via an Ansible playbook described in item (c) of this section.
Set up Terraform and Ansible on the server on which you will run this automation workflow (the deployment server). This server needs Internet access to download provider binaries for Terraform.
(a) Set up Terraform.
Download and set up Terraform on the deployment server.
Download the latest release binary from the Terraform downloads page.
Extract the binary and add the binary location to the system path.
$ unzip terraform_0.12.2_linux_amd64.zip
$ chmod 775 terraform
$ sudo cp -rp terraform /usr/bin/
(b) Set up Ansible.
Install Ansible on the same server using a package installer like ‘yum’.
$ sudo yum install ansible
Configure /etc/ansible/ansible.cfg with the following settings, which control SSH behavior (we are relying on password-based authentication for this example):
enable_plugins = host_list, script, yaml, ini, auto
host_key_checking = False
ssh_args = -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
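In an Ansible configuration file, each of these three settings belongs under its own section header. Here is a minimal sketch, assuming the standard Ansible sections ([inventory], [defaults], [ssh_connection]). The function name and the output path are placeholders chosen for illustration; the sketch writes to a scratch file so that you can review it before merging it into /etc/ansible/ansible.cfg.

```shell
# Sketch: write the three settings under the section headers Ansible expects.
# write_ansible_cfg and the sample path are hypothetical names for this example.
write_ansible_cfg() {
  target="$1"
  cat > "$target" <<'EOF'
[inventory]
enable_plugins = host_list, script, yaml, ini, auto

[defaults]
host_key_checking = False

[ssh_connection]
ssh_args = -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
EOF
}

# Write a scratch copy in the current directory for review.
write_ansible_cfg "./ansible.cfg.sample"
```

Disabling host key checking is convenient for freshly created instances whose keys are not yet known, but it is a trade-off you may want to revisit after the cluster is up.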
(c) Automated setup of Terraform and Ansible (skip this if you have completed (a) and (b) above)
To automatically set up the deployment server with the configuration needed to run this Terraform and Ansible workflow, you can use the Ansible playbooks in the ThoughtSpot community repo as shown below:
$ sudo yum -y install unzip ansible git
$ git clone https://github.com/thoughtspot/community-tools.git
$ cd community-tools/ThoughtSpot_Cloud_deployments/template_Homogeneous_cluster_ebs/deployment_host
Then execute the following commands:
$ ansible-playbook terraform.yaml
$ ansible-playbook ansible.yaml
Your environment is now set up and ready to go.
Step 2: Download the ThoughtSpot release binary and the MD5 checksum
The software binaries for ThoughtSpot need to be downloaded to the deployment server (make a note of the location where you downloaded these artifacts since that will be used later on in the workflow). Contact the ThoughtSpot support team so that they can share the release binaries with you (a tar.gz file and a checksum file).
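Before moving on, it can be worth confirming that the tarball matches its checksum file. The following is a minimal sketch, not part of the ThoughtSpot tooling. It assumes a Linux deployment server with md5sum available and assumes that the first whitespace-separated field of the checksum file is the hex digest; the function name verify_md5 is hypothetical.

```shell
# Sketch: compare a release tarball against its MD5 checksum file.
# Assumption: the checksum file's first field on its first line is the hex digest.
verify_md5() {
  tarball="$1"
  checksum_file="$2"
  expected="$(awk 'NR==1 {print tolower($1)}' "$checksum_file")"
  actual="$(md5sum "$tarball" | awk '{print tolower($1)}')"
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "OK: $tarball"
  else
    echo "MISMATCH: $tarball" >&2
    return 1
  fi
}
```

Call it with the two paths you noted when downloading the artifacts, tarball first and checksum file second. A non-zero exit status means the download should be repeated.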
Step 3: Deploy the ThoughtSpot cluster
Assuming the deployment server is set up as described in the prior steps, we can kick off the provisioning of ThoughtSpot.
Note: You should set AWS access and secret key credentials as environment variables if you have not already done so.
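A small pre-flight check can catch missing credentials before Terraform starts. This is a minimal sketch; the function name check_aws_env is hypothetical, and the values in the comments are placeholders for your own credentials.

```shell
# Sketch: fail fast if the AWS credentials Terraform needs are not exported.
# Export real values in your shell first, for example:
#   export AWS_ACCESS_KEY_ID="<your access key id>"
#   export AWS_SECRET_ACCESS_KEY="<your secret access key>"
check_aws_env() {
  status=0
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ]; then
    echo "missing environment variable: AWS_ACCESS_KEY_ID" >&2
    status=1
  fi
  if [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "missing environment variable: AWS_SECRET_ACCESS_KEY" >&2
    status=1
  fi
  return "$status"
}
```

The check only confirms that both variables are non-empty. It does not validate them against AWS.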
(a) Clone the ThoughtSpot community github repo if you did not do this as part of Step 1 (same repo as used for deployment server).
$ cd community-tools/ThoughtSpot_Cloud_deployments/AWS/template_Homogeneous_cluster_ebs/oneTB4nodeCluster
(b) Review and update the Terraform files to match your environment
Terraform and Ansible artifacts and details
- main.tf - This file specifies the resources that will be deployed as part of this execution, using Terraform’s configuration language, HCL. You do not need to modify this file to execute this deployment.
- variables.tf - This file contains parameters that need to be specified based on your cloud environment. This is the only file you will need to update in order to deploy the cluster. This includes variables such as number of instances, cluster_id, release version, ami_id, instance_type and other relevant attributes. You can check the description of these variables provided in the repository.
- ts-provision.yaml - Ansible playbook for provisioning ThoughtSpot. This file shows the details of how the ThoughtSpot software is provisioned on the AWS nodes. All the steps needed for a fully functional cluster are executed through this playbook: disk setup, copying the release tarball to all the nodes, and then the cluster installation.
(c) Execute the Terraform/Ansible workflow
(See the Terraform documentation for details of the CLI commands)
(i) Initialize terraform providers
$ terraform init
Downloads the providers and modules in preparation for the execution.
(ii) Dry-run of terraform
$ terraform plan
Provides a preview of the infrastructure that terraform will create
(iii) Provision the infrastructure and the software
$ terraform apply
Creates the AWS resources, provisions the software and creates the cluster
The sample outputs of these commands can be found in the sample_output.log file of the ThoughtSpot Cloud deployments repository.
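If you prefer to run the three commands as one unit, the sketch below chains them so that a failure in any step stops the rest. It is a variation on the steps above, not the article's exact invocation: it saves the plan with the -out option and applies that saved file, so apply executes what was previewed and does not prompt for confirmation. The function name deploy_cluster and the file name tfplan are placeholders.

```shell
# Sketch: run init, plan, and apply in order, stopping at the first failure.
# "tfplan" is a placeholder name for the saved plan file.
deploy_cluster() {
  terraform init &&
  terraform plan -out=tfplan &&
  terraform apply tfplan
}
```

Run it from the directory that contains main.tf and variables.tf, after reviewing variables.tf for your environment.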
(d) Check the health of the cluster
You can log in to any of the cluster IPs to verify that the cluster is ready. You will need to use the same username/password identified in the prerequisites section above.
Ensure that the cluster is ready for use by checking the status as shown below:
$ tscli cluster status
Search Engine: READY..
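Services may take some time to come up after provisioning, so you may want to poll rather than check once. The sketch below is meant to run on a cluster node where tscli is available. It assumes only what the output above shows, namely that a ready cluster prints a line containing "Search Engine: READY". The function name wait_for_cluster and its default retry values are hypothetical.

```shell
# Sketch: poll "tscli cluster status" until the READY line appears.
# Arguments: number of attempts (default 30), seconds between attempts (default 60).
wait_for_cluster() {
  attempts="${1:-30}"
  delay="${2:-60}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if tscli cluster status | grep -q "Search Engine: READY"; then
      echo "cluster ready after $i check(s)"
      return 0
    fi
    # Sleep only when another attempt remains.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "cluster not ready after $attempts check(s)" >&2
  return 1
}
```

For example, wait_for_cluster 10 30 checks up to ten times, thirty seconds apart, and returns a non-zero status if the READY line never appears.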
You should now be able to log in to the ThoughtSpot home page using one of the cluster IP addresses. And with that, you have ThoughtSpot up and running in the cloud!