
What Software Products Can Learn from Vertical Integration

Over the past few centuries, the world economy has grown dramatically more complex. The array of products and services available in the market has skyrocketed, and the role of the typical individual organization has become much more specialized, in the name of efficiency.

However, there are notable exceptions to this rule of specialization. Take, for example, Carnegie Steel, one of the behemoth monopolies founded in the late 1800s. Its founder, Andrew Carnegie, realized that by incorporating all of his “upstream” suppliers and partners, he could achieve greater predictability and control, and ultimately increase profits and secure his monopoly in the steel market. This strategy is called vertical integration.

There is a similar trend of specialized componentization in the world of software - and I argue that there is a strategy analogous to vertical integration. Let’s call it vertical awareness.

Vertical Awareness in Software

Just as the production of steel is composed of “vertical” layers of processes such as mining, transportation and smelting, a software application is typically composed of vertical layers of functionality and abstraction. At the very top is the user interface, and at the very bottom is machine code. Each layer in between relies upon the one below it: a browser app depends on a web server, which depends on a database, which depends on an operating system... you get the idea.

In some applications, these layers are highly disparate and modularized. Each layer is unaware of the specific needs of the layer above it, and all interaction between them is through a generic interface. For example, if I were to write a web app to analyze live Twitter data, it would be built on Twitter’s public API, which is generic and identical for all users, rather than specialized for my app.

In other applications, the responsibilities of these layers are more blurred together. One layer is specialized to handle the specific needs of the layer above it. An extreme example might be the control systems of a cutting-edge space capsule. A less critical and less highly engineered system would likely be composed of generally available modules, linked together. However, to achieve the best possible capsule, the entire system needs to be engineered together, with each component specialized with the others in mind.

Vertical Awareness in ThoughtSpot

Generally speaking, business intelligence applications cover a fairly large vertical distance, including a very robust database layer, query generation layer, business logic/web server layer, visualization layer and the rest of the UI. Given this, it is common for BI applications to be relatively vertically unaware, to simplify the codebase. Often, the database layer is actually a completely separate piece of software which the BI app communicates with via a standard SQL interface.

While separating the database layer has many benefits, such as allowing for the use of customers’ existing databases, it quickly becomes a major constraint on the overall user experience. To eliminate this constraint, the ThoughtSpot system does not rely on an external database. Instead, each layer is generally specialized to serve the needs of the layers above it. This does add some complexity, but the benefits are clear and compelling - here are some examples:

Query Sampling

The database is aware of the number of rows that each query actually needs. For example, some customers access ThoughtSpot via API to get raw data. For these queries, the full result set is returned. However, the typical query is issued via ThoughtSpot search, and the resulting data will be visualized in the user’s browser. These visualizations have limits on the number of data points they display - 5000, for example - and the database stops computing once the required number of rows has been accumulated. This results in huge time savings, considering that full result sets often reach billions of rows.
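To make the idea concrete, here is a minimal sketch in Python. It is not ThoughtSpot’s implementation: the names scan_rows and run_query, and the counter used to track how many rows were read, are made up for illustration. It only shows how a row limit passed down from the visualization layer lets a lazy scan stop early, while an API-style request with no limit reads everything.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def scan_rows(source: Iterable[dict], counter: List[int]) -> Iterator[dict]:
    """Lazily yield rows from the source, counting how many were actually read."""
    for row in source:
        counter[0] += 1
        yield row


def run_query(source: Iterable[dict], row_limit: Optional[int],
              counter: List[int]) -> List[dict]:
    """Return query results, stopping the scan once row_limit rows are collected.

    row_limit=None models a raw-data API request that needs the full result set.
    """
    rows = scan_rows(source, counter)
    if row_limit is not None:
        # islice stops pulling from the lazy scan after row_limit rows.
        rows = islice(rows, row_limit)
    return list(rows)
```

With a source of one million rows and a limit of 5000, only 5000 rows are ever read. The important design point is that the limit travels down to the layer doing the scan, rather than being applied to a fully computed result afterwards.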

Nested Query Optimization

ThoughtSpot supports the ability to save a search as a table, and then issue additional searches on top of the original. This kind of composability is highly versatile and allows for the construction of arbitrarily complex queries - such functionality would likely be achieved by database views in other systems. However, these nested queries can be very computationally intensive to execute. With a traditional database view approach, each nested query would be fully executed and materialized, and the final query would run against these materialized results.

However, the results of those intermediate queries are never used directly, except in the final query. The ThoughtSpot database improves upon this by essentially merging these nested queries into one big query against the normal tables in the database, instead of against these view-like saved searches. That big query is then optimized.

To give a simple example, imagine one search “total revenue by customer name by state” and another search on top of that “average total revenue for california”. One optimization our database would perform is to push down the “state = california” filter to the first query. That filter optimization alone would reduce the compute required by perhaps 50x, and that is just one example of many.
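The filter push-down in this example can be sketched in a few lines of Python. This is an illustration under simple assumptions, not the actual optimizer: the sample rows, the column names and the function names are all hypothetical. The first function materializes the inner search for every state and filters afterwards; the second applies the filter before aggregating. Because state is one of the grouping columns of the inner search, both give the same answer, but the second one aggregates far fewer rows.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical sample data, invented for this sketch.
ROWS: List[dict] = [
    {"customer": "Acme", "state": "california", "revenue": 100.0},
    {"customer": "Acme", "state": "california", "revenue": 50.0},
    {"customer": "Bolt", "state": "california", "revenue": 30.0},
    {"customer": "Acme", "state": "texas", "revenue": 70.0},
    {"customer": "Core", "state": "nevada", "revenue": 20.0},
]


def total_revenue_by_customer_and_state(rows: Iterable[dict]) -> Dict[Tuple[str, str], float]:
    """Inner search: total revenue by customer name by state."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for row in rows:
        totals[(row["customer"], row["state"])] += row["revenue"]
    return dict(totals)


def average(values: Iterable[float]) -> float:
    values = list(values)
    return sum(values) / len(values)


def nested_unoptimized(rows: Iterable[dict], state: str) -> float:
    """View-style execution: materialize the inner search for all states, then filter."""
    inner = total_revenue_by_customer_and_state(rows)
    return average(total for (_, s), total in inner.items() if s == state)


def nested_pushed_down(rows: Iterable[dict], state: str) -> float:
    """Merged execution: push the state filter below the aggregation."""
    filtered = [row for row in rows if row["state"] == state]
    inner = total_revenue_by_customer_and_state(filtered)
    return average(inner.values())
```

On the sample data, both functions return the same average for california, but the pushed-down version only aggregates three of the five rows. With real data spread across many states, the gap grows accordingly.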

Predictive Query Processing

This is a forward-looking example, but all the pieces are in place for it. ThoughtSpot already has a smart search engine which predicts search completions for the user based on their history, others’ history, characteristics of the data, and so on. We can leverage these predictions to improve perceived performance.

For example, imagine a user has typed “average revenue by” and the search engine predicts with 83% confidence that the final search will be “average revenue by state”. As soon as that prediction is made, ThoughtSpot would begin computing “average revenue by state”, gaining a precious second or so of perceived performance while the user actually finishes their search.
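Since this feature is only an idea at this point, the following is purely a sketch of how it might look, written in Python. The SpeculativeRunner class, its method names and the 0.8 confidence threshold are all assumptions made for illustration. When a prediction is confident enough, the query starts running in the background; when the user submits a search, the already-running result is reused if it matches.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class SpeculativeRunner:
    """Starts likely queries early and reuses the result if the guess was right."""

    def __init__(self, execute: Callable[[str], object], threshold: float = 0.8):
        self._execute = execute        # function that actually runs a query
        self._threshold = threshold    # hypothetical confidence cutoff
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._pending: Dict[str, Future] = {}

    def on_prediction(self, predicted_query: str, confidence: float) -> None:
        """Called whenever the search engine produces a completion guess."""
        if confidence >= self._threshold and predicted_query not in self._pending:
            self._pending[predicted_query] = self._pool.submit(
                self._execute, predicted_query
            )

    def on_submit(self, query: str) -> object:
        """Called when the user finishes typing and submits the search."""
        future = self._pending.pop(query, None)
        if future is None:
            # The prediction missed (or was never made), so run the query now.
            future = self._pool.submit(self._execute, query)
        return future.result()
```

A real system would also need to cancel or discard speculative work when the guess turns out to be wrong, and to cap how much compute is spent on guesses. Both are left out here to keep the sketch short.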

Smart BI Needs to be Vertically Aware

It is true that this kind of vertical awareness increases the overall complexity of the code. Interfaces between components have more surface area, issues can be tougher to debug, and comprehensive testing requires some creativity and elbow grease. However, we believe that any BI application that hopes to be easy enough and valuable enough for business users to use on a daily basis needs to err on the side of vertical awareness. As shown in the examples above, this has the power to dramatically increase speed and simplify the user experience.

Building economies and software applications from highly specialized building blocks makes a lot of sense, of course. This pattern is responsible in large part for the bounty of options we enjoy online and at the supermarket every day. However, there will always be some applications which just cannot be adequately built with off-the-shelf components: they need to be built from the ground up. It’s not an easy strategy or a cheap one, but it can result in success of historic proportions - just ask Andrew Carnegie.
