How We Engineered ThoughtSpot to Deliver Search at Scale

Here at ThoughtSpot, we don’t believe business leaders should have to wait days or weeks for IT or data analysts to build a report. The beauty of search-based analytics done at scale is that it can dramatically reduce the time to get insights from data, from days or weeks down to seconds, for anyone.

Search done well always looks so simple. However, building the technology to scale search-based analytics across billions of rows of data is anything but.

There’s a lot of magic happening under the hood of ThoughtSpot. Our analytical search engine architecture is composed of multiple services: search backends, a relational data cache, a BI engine, and more. Each of these services (or "microservices") is an independent distributed system, and they communicate with each other over the network. From the moment a user enters a keystroke in the search bar until they see a result, a flurry of network calls is made to deliver the perfect answer to the user.

We had to make a number of complex engineering decisions to build our system from scratch. In this post, we cover the rationale that led us to use Remote Procedure Calls, and we’ll give you a peek into the network communication infrastructure in ThoughtSpot.

**Deep dive technology alert: If you become alarmed when reading technical material, you should close this browser tab now.**

The simplest software abstraction for network communication is the “socket”. It's a no-frills wrapper over an underlying network connection (e.g. a TCP connection) which can be used for sending raw bytes. But real systems need more than just the ability to send and receive data: they need well-defined APIs, fault tolerance, ease of use, cross-platform support, and more. These needs are fulfilled by higher-level abstractions, two common choices being Remote Procedure Calls (RPC) and HTTP-based RESTful APIs.
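
To make the contrast concrete, here is a minimal sketch of socket-level communication in Go (one of the languages we use). The address and payload are placeholders; notice how framing, encoding, retries, and API semantics are all left to the application.

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Open a TCP connection; the address is a placeholder.
	conn, err := net.Dial("tcp", "localhost:9090")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// A socket only moves raw bytes; the meaning of those bytes is
	// entirely up to the application on each end.
	if _, err := conn.Write([]byte("raw request bytes")); err != nil {
		panic(err)
	}

	buf := make([]byte, 1024)
	n, err := conn.Read(buf)
	if err != nil {
		panic(err)
	}
	fmt.Printf("received %d bytes: %q\n", n, buf[:n])
}
```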

A Remote Procedure Call (RPC) looks like a function call but is fulfilled by making a network call to a server, which is usually on a different host. Remote Procedure Calls were popularized by the Sun RPC implementation in the 1980s, and are used by a wide range of web services and distributed systems today.

REST (REpresentational State Transfer) is an architectural style for defining network APIs. Proposed in 2000 by Roy Fielding, HTTP-based RESTful APIs have become the most popular way of creating scalable web services.

In ThoughtSpot, we use both RPCs and JSON over REST heavily. RPCs are primarily used for communication between backend services. JSON over REST is exposed by the backend web service for consumption by the web frontend.

Using REST for web services offers several benefits:

  • Self-documenting: RESTful APIs combine HTTP verbs (GET, PUT, POST, etc.) with hierarchical “resources” which represent entities or concepts within the system. The combination facilitates exploration and tends to be self-documenting (see the sketch after this list).

  • Separation of concerns: ThoughtSpot exposes a rich set of features through the UI. The resource hierarchy in REST allows us to easily separate different feature sets by exposing them as different resources.
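
To illustrate the verb-plus-resource style, here is a small Go sketch of a hypothetical /answers resource. The endpoint and the Answer type are illustrative only, not ThoughtSpot's actual API.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// Answer is a hypothetical resource exposed by the web service.
type Answer struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

func main() {
	// GET  /answers -> list saved answers
	// POST /answers -> create a new saved answer
	// The verb plus the resource path documents the intent.
	http.HandleFunc("/answers", func(w http.ResponseWriter, r *http.Request) {
		switch r.Method {
		case http.MethodGet:
			json.NewEncoder(w).Encode([]Answer{{ID: "1", Name: "Sales by region"}})
		case http.MethodPost:
			var a Answer
			if err := json.NewDecoder(r.Body).Decode(&a); err != nil {
				http.Error(w, err.Error(), http.StatusBadRequest)
				return
			}
			w.WriteHeader(http.StatusCreated)
			json.NewEncoder(w).Encode(a)
		default:
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		}
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```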

For communication between backend services, we have adopted RPCs. Compared to JSON over REST, RPC frameworks can be far more efficient in their use of CPU, memory, and network bandwidth. They can also provide stronger type safety than JSON. RPCs also offer a more intuitive programming model, since they appear as plain function calls in the code.
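
To show what we mean by the function-call model, here is a sketch of a typed client interface in Go. The QueryService interface and its messages are hypothetical stand-ins for code that an IDL compiler (Thrift or protocol buffers) would generate.

```go
package rpc

import "context"

// QueryRequest and QueryResult are hypothetical, strongly typed messages;
// in practice they would be generated from an IDL definition.
type QueryRequest struct {
	TableID string
	Filter  string
}

type QueryResult struct {
	Rows [][]string
}

// QueryService is what the caller programs against. Invoking ExecuteQuery
// looks like a local function call, but the implementation serializes the
// request and sends it to a remote server over the network.
type QueryService interface {
	ExecuteQuery(ctx context.Context, req *QueryRequest) (*QueryResult, error)
}

// RunReport shows the call site: mistakes in field names or types are
// caught at compile time, unlike hand-built JSON payloads.
func RunReport(ctx context.Context, svc QueryService) ([][]string, error) {
	res, err := svc.ExecuteQuery(ctx, &QueryRequest{TableID: "sales", Filter: "region = 'EMEA'"})
	if err != nil {
		return nil, err
	}
	return res.Rows, nil
}
```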

Our internal RPC library uses Apache Thrift as the transport and Google protocol buffers as the message format. It provides additional features on top, such as name resolution, connection pooling, fault tolerance, distributed tracing, and logging. We have been using this framework in production for over a year now across 4 programming languages: C++, Java, Python, and Go.
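
The internals of that library aren't something we can paste here, but the sketch below illustrates the flavor of one such feature: a simple connection pool that retries a failed call on another replica. All names are hypothetical, and a real implementation would layer in backoff, deadlines, health checks, and tracing.

```go
package rpc

import (
	"context"
	"errors"
	"sync"
)

// Conn stands in for a pooled client connection to one backend replica.
type Conn interface {
	Invoke(ctx context.Context, method string, req, resp interface{}) error
}

// Pool keeps one connection per resolved backend address and rotates
// through them in round-robin order.
type Pool struct {
	mu    sync.Mutex
	conns []Conn
	next  int
}

func (p *Pool) pick() Conn {
	p.mu.Lock()
	defer p.mu.Unlock()
	c := p.conns[p.next%len(p.conns)]
	p.next++
	return c
}

// Invoke tries each replica once before giving up, which masks the
// failure of a single backend instance from the caller.
func (p *Pool) Invoke(ctx context.Context, method string, req, resp interface{}) error {
	var lastErr error = errors.New("no connections in pool")
	for i := 0; i < len(p.conns); i++ {
		if err := p.pick().Invoke(ctx, method, req, resp); err != nil {
			lastErr = err
			continue
		}
		return nil
	}
	return lastErr
}
```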

Developers have many RPC frameworks to choose from today. Based on our experience, we think the following features are desirable in an RPC framework:

  • Fast, event-driven communication

  • Efficient wire encoding and decoding

  • Space-efficient wire format

  • Language and platform independent, declarative interfaces

  • Integration with common programming languages

  • Support for synchronous and asynchronous communication models (illustrated in the sketch after this list)

  • Connection pooling

  • Health checks and load balancing

  • Authentication support or hooks

  • Monitoring hooks for tracking response times, queuing latency, etc.

  • Testing hooks
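
As a small illustration of the synchronous and asynchronous models called out above, the sketch below shows one blocking call and one call whose result arrives on a channel. The SearchClient interface is hypothetical.

```go
package rpc

import "context"

// SearchClient is a hypothetical generated client stub.
type SearchClient interface {
	Search(ctx context.Context, query string) (string, error)
}

// SyncSearch blocks the caller until the RPC completes.
func SyncSearch(ctx context.Context, c SearchClient, q string) (string, error) {
	return c.Search(ctx, q)
}

// searchResult pairs an answer with the error, if any, from the RPC.
type searchResult struct {
	answer string
	err    error
}

// AsyncSearch returns immediately; the result is delivered on a channel,
// letting the caller issue other RPCs in the meantime.
func AsyncSearch(ctx context.Context, c SearchClient, q string) <-chan searchResult {
	out := make(chan searchResult, 1)
	go func() {
		answer, err := c.Search(ctx, q)
		out <- searchResult{answer: answer, err: err}
	}()
	return out
}
```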

Recently, we started evaluating Google’s newly open-sourced, BSD-licensed RPC framework, gRPC. We met with the gRPC team to learn about the project and were impressed by their commitment to building a vibrant open source community around gRPC. The project is being run in the open with upstream master repositories on GitHub and a Google Group for discussions.

We are excited about gRPC for multiple reasons. gRPC uses HTTP/2 as the transport, which brings goodness like connection multiplexing, bidirectional streaming, flow control, and header compression. The seamless integration with protocol buffers is a great plus for us, given that we are already heavy users of them. Additionally, gRPC puts a strong focus on building low-latency, highly reliable production services through framework support for health checking, monitoring, and more.
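
To give a taste of the programming model, here is a minimal gRPC client sketch in Go. The SearchService stubs (the pb package) are hypothetical, standing in for code that protoc would generate from a .proto definition.

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"

	// pb stands in for stubs that protoc generates from a hypothetical
	// search.proto defining SearchService, SearchRequest, and SearchReply.
	pb "example.com/search/pb"
)

func main() {
	// A single HTTP/2 connection is multiplexed across all RPCs.
	conn, err := grpc.Dial("localhost:50051", grpc.WithInsecure())
	if err != nil {
		log.Fatalf("dial failed: %v", err)
	}
	defer conn.Close()

	client := pb.NewSearchServiceClient(conn)

	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	// The RPC reads like a typed function call.
	reply, err := client.Search(ctx, &pb.SearchRequest{Query: "revenue by region"})
	if err != nil {
		log.Fatalf("Search failed: %v", err)
	}
	log.Printf("answer: %s", reply.GetAnswer())
}
```

Because many concurrent RPCs share one multiplexed HTTP/2 connection, this style of transport is attractive for chatty backend services like ours.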

At ThoughtSpot, we’re all about delivering breakthrough performance for the end user. Designing our network communication infrastructure like this is just one way in which we’re trying to disrupt the BI industry and help our customers make smarter business decisions.

Want to see this technology in action? Join ThoughtSpot CEO, Ajeet Singh, and Principal Engineer Vijay Ganesan for a deep-dive webinar that will go “Under the hood of ThoughtSpot” on Tuesday, April 28 at 10:30am PT.