In 2012, as ThoughtSpot was born, we started developing the idea of search-driven business intelligence (BI). Our vision was to raise the bar on ease of use for BI tools, making them accessible to anyone. We were convinced that nothing could be easier to use than search. We also wanted our search to work very well at enterprise scale.
Best of both worlds?
Recently in the BI industry, departmental desktop BI vendors have driven enterprise purchases. These products have resulted in fragmentation of truth and compromises on enterprise quality. We asked the question: “Does providing ease of use and direct access to business users have to mean compromising on being enterprise-grade?” We believed that the answer was NO. We could provide the best of both worlds to our customers: Google-search-like ease of use with absolutely no compromise on enterprise-grade data and user scale, advanced analytical capabilities, security, governance, integration, and manageability.
Now, that was quite ambitious (some people would have called it foolish!). This had never been attempted before. The magnitude of what we were imagining was daunting. In fleeting moments of self-doubt, we asked ourselves if we should just build a search-driven BI application that sits on top of other platforms.
Search and BI
One of the greatest challenges in BI has always been getting analytic queries to run fast. Unlike Google’s keyword-based searches of documents, ThoughtSpot’s search is a “relational” search. The end result is a visualization representing an analysis that involves groupings, aggregations, formulas, filtering, and so on, over large data sets joined across many tables.
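To make “relational” concrete, here is a minimal sketch of the kind of work a search such as “total amount by region since 2016” implies: a filter, a join, a grouping, and an aggregation. The tables, column names, and the function `total_amount_by_region` are hypothetical and purely illustrative; this is not how ThoughtSpot’s engine is implemented.

```python
from collections import defaultdict

# Hypothetical toy tables: a dimension table and a fact table.
regions = {1: "West", 2: "East"}
sales = [
    {"region_id": 1, "year": 2016, "amount": 100.0},
    {"region_id": 2, "year": 2016, "amount": 50.0},
    {"region_id": 1, "year": 2017, "amount": 70.0},
    {"region_id": 2, "year": 2015, "amount": 30.0},
]


def total_amount_by_region(min_year):
    """Filter sales by year, join to regions, then group and sum."""
    totals = defaultdict(float)
    for row in sales:
        if row["year"] >= min_year:           # filtering
            name = regions[row["region_id"]]  # join across tables
            totals[name] += row["amount"]     # grouping + aggregation
    return dict(totals)


result = total_amount_by_region(2016)
```

Even in this toy form, every row of the fact table is touched on every search, which is why response time becomes the central problem once the tables are large.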
If you did a Google search and had to wait 30 seconds for any results to come back, would you use it? Probably not. The same applied to our BI search. Extremely fast performance is crucial for delivering on the promise of ease-of-use through search.
We estimated the UI response times users would expect as they typed search terms and the system instantly rendered visualizations reflecting what they typed. We calculated the back-end performance this required. Security was critical: not just for visualization data, but also row-level security for the indexed data values users were searching. All of this had to work at terabyte data scale.
Could we find a platform to build our application on that would meet our requirements?
The search for a platform
Clearly, our search and query engines had to do in-memory processing. Disk-based systems were simply not feasible for these kinds of performance demands. Our scale goals required a distributed MPP (Massively Parallel Processing) query execution engine. Moreover, our architecture had to deliver robust cluster management that required little administration work by customers. We did not want any of the traditional BI pre-computation technologies such as cubes and aggregate tables. We did not want any proprietary hardware.
We found that there was no platform out there that could meet our requirements. It dawned on us that the only way to get the experience that we wanted to provide our customers was to own the entire stack end-to-end. We could optimize things up and down the stack to get the required performance and experience. It would allow us to leverage our knowledge of specific application patterns, for example:
- In the early steps of an ad-hoc analysis flow, when data is not fully curated yet, result sets can be very large. But users are not going to browse through such a large data set. Here our system can efficiently provide a sampled view of the result set very quickly.
- The query processing engine can optimize the experience based on knowledge of whether the user is likely to paginate over result sets or download the entire result set.
- The cluster manager can efficiently do consistent snapshots of the entire system utilizing the knowledge of the application’s interactions with metadata and data stores.
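As an illustration of the first pattern, a sampled view of a very large result set can be produced in a single pass without holding the full result in memory. The sketch below uses reservoir sampling (Algorithm R), a standard technique chosen here only as an example; the text does not say which sampling method our system uses, and the name `sample_rows` is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def sample_rows(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of up to k rows in one pass."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Keep the new row with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


# Usage: preview 10 rows out of a million-row result stream.
preview = sample_rows(range(1_000_000), 10, seed=42)
```

Because the sample is built as rows stream past, memory use stays proportional to the sample size rather than to the size of the result set.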
While search-driven BI was revolutionary (and has proven itself), we wanted to set ourselves up to do more in the future. What about embedded analytics? What about AI and Machine Learning based system-generated searches? What about predictive analytics? And voice- and gesture-driven analytics?
The ThoughtSpot Data Platform
We arrived at the conclusion that we needed to build our own robust data and analytics platform. Search would be the first application leveraging the platform. If we built a strong platform foundation, we could keep adding more and more applications rapidly over time.
Looking back now in 2017, I’m so glad we took the platform approach. Our (foolish?) ambition has paid off. We have a solid platform like no other. The power of this platform is allowing us to rapidly add new applications such as Bots, Embedded Analytics and Automated Insights. Together, the platform and its applications are positioning us to become the standard for enterprise BI.
The success of Apple’s iPhone was in large part due to this platform approach and owning everything end-to-end. While this is more difficult to do, when you do pull it off, you end up with a far superior product and a huge competitive advantage. Maybe an idea that is not so foolish.