Cloudy with a Chance of Low Cost

This is the first post in Cloud Thoughts, a new blog series that will delve into the details of deploying ThoughtSpot in the cloud.

There’s no denying we live in the age of the cloud. But as recent research shows, enterprises aren’t just adopting one cloud as part of this paradigm shift. They’re betting big on multiple clouds, with 98% of enterprises planning to run multi-cloud environments as soon as 2021.

To help organizations embrace this new IT reality, we've begun supporting multiple VM instance types to host ThoughtSpot across major cloud providers. With this flexibility, IT administrators can make intelligent decisions about the best deployment option for hosting ThoughtSpot, based on their data profile and the need to keep cloud infrastructure costs as low as possible. In this article, we dive into the details of mapping use cases to the right VM types in AWS, GCP, and Azure.

We also highlight the right way to shut down a ThoughtSpot cluster for cloud administrators who want to reduce cloud spend during periods when the cluster is not in use.

ThoughtSpot in the cloud

The ThoughtSpot cloud deployment consists of a combination of cloud compute (VM) instances as well as an underlying persistent storage layer (currently, this uses Block storage volumes). The number of instances required for a cloud deployment is based on the size of the data in ThoughtSpot. The instances act as a distributed cluster of nodes to serve query responses.

Note: each VM instance also requires a boot volume, which is not shown in the diagram above.

Using the best VM instance configuration for your data profile

The size of the VM instances is a key determinant of the infrastructure cost in cloud. The guidelines below will help ensure that you pick the right type of VM in the cloud provider of your choice for your use case.

Multiple terabytes of data. If you have more than 2-3 terabytes of data, you should use VMs that have 96 CPUs and at least 600GB of RAM from our recommended VM list. These VMs will usually be the most cost effective if your data is about 1000 bytes/row. Note that not all of the cloud providers we support offer such instance types. In that case, you should use the VM size recommended for lower volumes of data.

The 96CPU/600GB+ RAM VMs are usually a good fit (lowest cost) for deploying ThoughtSpot if you have greater than 3 terabytes of data. The discounts that you get with your cloud provider will impact the costs, so make sure you factor those into your calculations.

Low terabytes of data. If you have data in the low terabytes (up to 2-3 terabytes), you should use VMs that have 64 CPUs and at least 400GB of RAM from our recommended VM list. These VMs will usually be the most cost effective if your data is about 1000 bytes/row.

Note: In order to get the average size per row, you can dump about 1000 rows of data from your biggest fact table to a temporary file and divide the size of that file by 1000.

(echo "select * from <table_name>;" | tql --query_results_apply_top_row_count 1000 > size.txt)
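The division described in the note can be scripted. The helper below is a minimal sketch: `avg_row_bytes` is a hypothetical name (it is not a ThoughtSpot tool), and it assumes the sampled rows have already been written to a file such as size.txt. The result is approximate, since the file size includes whatever delimiters and formatting the dump contains.

```shell
# Hypothetical helper: average bytes per row in a sampled dump file.
# Usage: avg_row_bytes <file> [rows_sampled]   (rows_sampled defaults to 1000)
avg_row_bytes() {
  sample_file="$1"
  sample_rows="${2:-1000}"
  # wc -c reports the file size in bytes.
  sample_bytes=$(wc -c < "$sample_file")
  # Integer division is precise enough for picking a sizing tier.
  echo $(( sample_bytes / sample_rows ))
}
```

For example, `avg_row_bytes size.txt 1000` prints the approximate row width in bytes, which you can compare against the 1000 bytes/row and 300 bytes/row guidelines in this article.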

The 64CPU/400GB+ RAM VMs are our standard recommendation and are usually a good fit (lowest cost) for deploying ThoughtSpot if you have data in the low terabytes. The compute price and discounts that you get with your cloud provider will impact the costs, so make sure you factor those into your calculations.

Small amounts of data. Some of you might want to deploy ThoughtSpot on smaller instances because the data that you need to analyze is smaller in size (100GB or less). A common use case for smaller deployments is testing or hosting ThoughtSpot in a staging or other non-production environment. In these cases, you can use VMs with 16CPU/100+GB RAM (for up to 20GB of data) or 32CPU/200+GB RAM (for up to 100GB of data), since these will keep your cloud infrastructure costs low.

Note: the ThoughtSpot “lean” configuration is required before you can use these instance types. ThoughtSpot support can help you with this configuration.

Narrow (Thin) row data. If you have very narrow data sets (with “narrow” being defined as data that is sized 300 bytes per row or less) you can benefit from running instances that are more cost effective for these use cases.

For very narrow row data, CPU becomes more of a bottleneck than RAM. Hence you need instances that are more CPU-weighted (a higher CPU to RAM ratio). These instances work out cheaper than the other supported instance types. Depending on the cloud provider, these can be 64CPU/256GB RAM or 96CPU/360+GB RAM. You can check the cloud section of our documentation for the cost-optimal VM types for each provider for this use case.
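To pull the guidelines above together, here is a rough sketch of the decision as a shell function. `recommend_vm` is a hypothetical helper, not something ThoughtSpot ships. The order in which the rules are checked and the 3000GB cutoff between "low" and "multiple" terabytes are assumptions made for illustration, since the article gives that boundary only as 2-3 terabytes. Always confirm the result against the recommended VM list for your cloud provider.

```shell
# Hypothetical helper: map a data profile to the VM shapes named in this article.
# Usage: recommend_vm <data_size_gb> <avg_row_bytes> [multi_tb_cutoff_gb]
# Inputs are whole numbers. The cutoff defaults to 3000 (an assumption).
recommend_vm() {
  data_gb="$1"
  row_bytes="$2"
  multi_tb_gb="${3:-3000}"
  if [ "$data_gb" -le 20 ]; then
    echo "16CPU/100+GB RAM (lean configuration required)"
  elif [ "$data_gb" -le 100 ]; then
    echo "32CPU/200+GB RAM (lean configuration required)"
  elif [ "$row_bytes" -le 300 ]; then
    # Narrow rows: CPU is the bottleneck, so prefer CPU-weighted shapes.
    echo "CPU-weighted: 64CPU/256GB RAM or 96CPU/360+GB RAM"
  elif [ "$data_gb" -gt "$multi_tb_gb" ]; then
    echo "96CPU/600+GB RAM"
  else
    echo "64CPU/400+GB RAM"
  fi
}
```

For instance, `recommend_vm 1500 1000` falls into the low-terabyte tier, while `recommend_vm 1500 200` is treated as narrow row data.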

Cluster Shutdown

If you do not need the ThoughtSpot cluster to be up and running 24x7, you can shut down the cluster and restart it during expected usage hours to save on the infrastructure cost of running ThoughtSpot VM instances in cloud provider environments. This is especially useful if you have not purchased reserved instances from the cloud provider.
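To get a feel for the potential saving, the arithmetic is simple: a week has 168 hours, so the share of VM hours you avoid is (168 - hours running) / 168. The function below is a small sketch of that calculation. `vm_hours_saved_pct` is a hypothetical name, and the figure covers VM running time only. Storage volumes are typically still billed while the VMs are stopped, and your actual pricing and discounts will change the final number.

```shell
# Hypothetical helper: percentage of weekly VM hours avoided by stopping the cluster.
# Usage: vm_hours_saved_pct <hours_running_per_week>
vm_hours_saved_pct() {
  hours_running="$1"
  hours_in_week=168
  # Integer percentage, rounded down.
  echo $(( (hours_in_week - hours_running) * 100 / hours_in_week ))
}
```

A cluster that runs 12 hours a day on 5 days a week is up for 60 hours, so `vm_hours_saved_pct 60` reports roughly 64 percent of VM hours avoided.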

The right procedure to stop and restart a ThoughtSpot cluster is the following:

1. Ensure there are no issues with the cluster

$ tscli cluster check

(Make sure the cluster looks healthy and there are no failures)

2. Stop the Cluster

$ tscli cluster stop

(Wait until you see the message: “Done stopping cluster”)  

3. Go to the cloud provider console and shut down all the ThoughtSpot VMs in the cluster.

4. When you are ready to use ThoughtSpot again, restart the VMs from the cloud provider console and then restart the cluster.

$ tscli cluster start

(You should see the message: “Started pre-existing cluster”)

5. Ensure that the cluster is ready for use by checking the status

$ tscli cluster status

This should show:

Cluster: RUNNING

Database: READY

Search Engine: READY

Depending on the size of your cluster, you might have to wait several minutes after the cluster has been restarted before ThoughtSpot services and data are fully ready. Hence, you will need to account for this startup time to ensure that the system is fully operational in advance of expected usage.
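If you script the restart, the wait in step 5 can be automated by polling the status until all three lines from the expected output appear. The functions below are a sketch only: `cluster_ready` and `wait_until_ready` are hypothetical helpers, the polling interval and attempt count are arbitrary defaults, and the matching assumes the status output contains the three lines exactly as shown above. If your version of `tscli cluster status` prints something different, adjust the patterns.

```shell
# Hypothetical helper: succeed only if the status text on stdin shows all three
# components in the state listed in this article.
cluster_ready() {
  status_text=$(cat)
  printf '%s\n' "$status_text" | grep -q 'Cluster: *RUNNING' &&
    printf '%s\n' "$status_text" | grep -q 'Database: *READY' &&
    printf '%s\n' "$status_text" | grep -q 'Search Engine: *READY'
}

# Hypothetical helper: poll `tscli cluster status` until the cluster is ready.
# Usage: wait_until_ready [seconds_between_checks] [max_checks]
wait_until_ready() {
  poll_interval="${1:-30}"
  max_checks="${2:-40}"
  checks_done=0
  while [ "$checks_done" -lt "$max_checks" ]; do
    if tscli cluster status 2>/dev/null | cluster_ready; then
      echo "cluster ready"
      return 0
    fi
    checks_done=$(( checks_done + 1 ))
    sleep "$poll_interval"
  done
  echo "cluster not ready after $max_checks checks" >&2
  return 1
}
```

A scheduled job could run `tscli cluster start` followed by `wait_until_ready 30 40` comfortably ahead of expected usage hours, and raise an alert if the second command fails.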

We recognize the importance of keeping cloud infrastructure costs low for our customers, so we are committed to making more changes to help you lower your cloud spend. Watch this space for more exciting announcements on this topic.