Voice CodeX 2015: How We Killed the Keyboard with 3D, Motion-Controlled Analytics | ThoughtSpot

The Team: Jordie Hannel, Shikhar Agarwal, Prateek Gaur, Sunny Singh, Ravi Tandon

The cornerstone of our engineering ethos at ThoughtSpot – the Arc Reactor to our Iron Man – is our ability to perform the spectacular feats of engineering necessary to achieve massive speed and scale while maintaining ruthless simplicity from the end user’s perspective. This is what differentiates our product, and this is why our product will change people’s lives.

In fact, our ambition within the engineering team is so great that we had multiple teams of Google-caliber engineers working hard for over 18 months to build every aspect of our product in-house from the ground up, all before our GA release. This includes:

  • Orion: Linux-based cluster manager to coordinate multi-node distributed systems, provide status/maintenance alerts and smoothly handle system upgrades, all with the speed that only a custom solution can provide.

  • Falcon: Entirely in-memory database operating on top of HDFS to perform joins, sorts, filters, aggregations and more on multi-TB datasets with unprecedented speed.

  • Sage: Real-time service that powers ThoughtSpot Search – responsible for user-personalized auto-completion of queries using machine learning, indexing millions of rows before the user even finishes typing, and applying natural language models to translate English queries into SQL-like database commands.

While building our own cluster manager and database system works absolute wonders for performance, it can’t do much for our users without seamless UX, hence my team’s hackathon project. The Tomorrowlanders assembled under one vision – to create an interactive demo that could showcase ThoughtSpot’s speed and intelligence with a Minority Report-esque interface.

Although everyone on the team was confident in our ability to implement the world’s most futuristic computer interface in three days, we decided to slightly reduce the scope of the project. Our goal was to demonstrate one realistic use case using input from voice commands and a LeapMotion device – no mouse and no keyboard – in just 72 hours.

LeapMotion Gesture Integration

For those of you who are not familiar, LeapMotion is a nifty device that uses two wide-angle infrared cameras and three infrared LEDs to track the position and orientation of hands or objects in real time. In our first integration attempt, we used LeapMotion’s natively recognized gestures, but found them unreliable and not especially user-friendly. Ultimately, we implemented our own gestures with great success.

We recognize finger-tap gestures for each finger (both continuous recognition while the finger is held down – used for zooming – and one-time recognition for UI navigation and selections), omni-directional palm rotation (used for the variable-speed globe rotation shown above), and a grabbing gesture (used for maximizing charts). Many other gestures are possible; the main challenge as the number of supported gestures grows is differentiating between them and deciding precedence when multiple gestures are recognized at once. For example, your palm rotates slightly when you perform a thumb tap, so we set a minimum palm-rotation threshold and give thumb-tap precedence over palm rotation.
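To make the precedence idea concrete, here is a minimal sketch of how such a per-frame decision loop could look on top of the leapjs browser client. The threshold values, the handler names (onThumbTap, onPalmRotate), and the exact way we detect a tap are illustrative assumptions, not the code from our demo.

```typescript
// Minimal sketch assuming the leapjs browser client exposes a global `Leap`.
// Thresholds and the tap heuristic are illustrative, not our production values.
declare const Leap: any;

const PALM_ROLL_THRESHOLD = 0.35; // radians; ignore smaller palm rolls (hypothetical value)
const THUMB_TAP_DROP_MM = 25;     // how far the thumb tip must dip between frames to count as a tap

let prevThumbY: number | null = null;

Leap.loop((frame: any) => {
  const hand = frame.hands && frame.hands[0];
  if (!hand) { prevThumbY = null; return; }

  // The thumb is finger type 0 in the Leap data model; tipPosition is [x, y, z] in mm.
  const thumb = hand.fingers.find((f: any) => f.type === 0);
  const thumbY = thumb ? thumb.tipPosition[1] : null;

  const thumbTapped =
    prevThumbY !== null && thumbY !== null && prevThumbY - thumbY > THUMB_TAP_DROP_MM;
  prevThumbY = thumbY;

  // Precedence: a thumb tap wins over palm rotation, because tapping the thumb
  // inevitably rolls the palm a little.
  if (thumbTapped) {
    onThumbTap();                    // e.g. select the focused chart
  } else if (Math.abs(hand.roll()) > PALM_ROLL_THRESHOLD) {
    onPalmRotate(hand.roll());       // e.g. variable-speed globe rotation
  }
});

// Stub handlers so the sketch is self-contained.
function onThumbTap(): void { console.log('thumb tap'); }
function onPalmRotate(roll: number): void { console.log('rotate', roll); }
```

Keeping all gesture decisions in one place per frame makes it straightforward to add new gestures later and to adjust which one wins when several fire at once.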

Another gesture set we experimented with was point-to-click – i.e., the ability to move and click a cursor on screen using your index finger as a mouse – but this feature didn’t make it into the final demo. Making point-to-click pleasant requires high tracking precision across a large field of view, as well as position-stabilizing algorithms. We will likely experiment further with this in the future, since there is huge value in being able to replicate the entire functionality of a mouse.
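For anyone curious what position stabilization might look like, one common approach is a simple exponential moving average over the raw fingertip position. This is a generic sketch rather than the filter we would actually ship, and the alpha value is an illustrative assumption.

```typescript
// One common way to stabilize a fingertip-driven cursor: an exponential moving
// average over the raw tip position. Alpha trades responsiveness against jitter.
class SmoothedCursor {
  private x = 0;
  private y = 0;
  private initialized = false;

  constructor(private readonly alpha = 0.2) {}

  // rawX/rawY: fingertip position already mapped into screen coordinates.
  update(rawX: number, rawY: number): { x: number; y: number } {
    if (!this.initialized) {
      this.x = rawX;
      this.y = rawY;
      this.initialized = true;
    } else {
      this.x = this.alpha * rawX + (1 - this.alpha) * this.x;
      this.y = this.alpha * rawY + (1 - this.alpha) * this.y;
    }
    return { x: this.x, y: this.y };
  }
}
```

A lower alpha smooths out jitter at the cost of cursor lag – exactly the trade-off that made point-to-click hard to get pleasant in 72 hours.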

Voice Control

At the early discussion stages, many of our team members were hesitant to use voice control – it’s one of those technologies that’s perennially ‘almost there’ but never quite seems to get it right (think virtual assistants and holograms). We feared that when we ultimately demoed our work, there would be background noise or the Voice-Reco Gods would frown upon us, and all our hard work would result in: ‘I’m sorry, I didn’t quite get that. Did you mean “applesauce”?’. But after playing around with Google Chrome’s native Web Speech API, we decided to take a leap of faith and go for it.

Although Google’s accuracy with unrestricted recognition is impressively high, we chose to restrict the set of recognized tokens and apply our own phonetic string matching – just to soothe our nerves. One major issue we did run into is that Google uses multi-token fluency scores as a factor in recognition – that is, the Web Speech API prefers to recognize strings of tokens that compose a (somewhat) grammatically correct English sentence. This was prohibitive for our purposes because our most important use of voice was recognizing BI queries such as ‘sales, customer industry by country’ – which is unfortunately not an English sentence. Our demo solution was simply to recognize tokens one at a time, which requires a pause after each token – for our production version we will research disabling the multi-token fluency scoring or using a different API.
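As a rough illustration of the one-token-at-a-time approach, here is a sketch built on Chrome’s webkitSpeechRecognition. The token list, the Levenshtein matcher standing in for our phonetic string matching, and the appendToSearchQuery hook are all hypothetical stand-ins, not our actual implementation.

```typescript
// Sketch: restricted, one-token-at-a-time recognition on top of Chrome's Web Speech API.
declare const webkitSpeechRecognition: any;

// Restricted vocabulary (illustrative subset of column names and query keywords).
const TOKENS = ['sales', 'revenue', 'customer', 'industry', 'country', 'by'];

// Simple edit distance standing in for the phonetic matcher described above.
function levenshtein(a: string, b: string): number {
  const d: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)));
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      d[i][j] = Math.min(
        d[i - 1][j] + 1,
        d[i][j - 1] + 1,
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1));
    }
  }
  return d[a.length][b.length];
}

// Snap whatever was heard to the closest token in our restricted vocabulary.
function matchToken(heard: string): string | null {
  let best: string | null = null;
  let bestDist = Infinity;
  for (const token of TOKENS) {
    const dist = levenshtein(heard.toLowerCase(), token);
    if (dist < bestDist) { best = token; bestDist = dist; }
  }
  return bestDist <= 2 ? best : null; // reject words that are nowhere near the vocabulary
}

const recognition = new webkitSpeechRecognition();
recognition.continuous = true;       // keep listening between tokens
recognition.interimResults = false;  // only act on final results

recognition.onresult = (event: any) => {
  // Each final result corresponds to one pause-delimited utterance, i.e. one token.
  const heard = event.results[event.results.length - 1][0].transcript.trim();
  const token = matchToken(heard);
  if (token) {
    appendToSearchQuery(token); // hypothetical hook into the search bar
  }
};

recognition.start();

function appendToSearchQuery(token: string): void { console.log('query +=', token); }
```

Because interim results are off and each utterance is matched against a small vocabulary, a noisy demo room only has to get us close to one of the expected tokens rather than produce a perfect transcript.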

Shipping Tomorrowland

Despite a rousing reception to our presentation at the ThoughtSpot hackathon closing ceremony, our work is not quite ready to ship to customers. So the question we’re left with is: how do we leverage our 72 hours of hard work to elevate ThoughtSpot?

Voice control has the potential to be an integral part of our mobile app (alpha coming in 3-6 months!), and several Tomorrowlanders will dedicate time moving forward to productize our voice control system for the small screen. However, as fond as we have become of LeapMotion, we acknowledge that it may not be reasonable to expect our business users to buy and use the device with ThoughtSpot – but we can certainly still use it internally…

In fact, by year’s end we aim to have a dedicated ‘Tomorrowland’ conference room equipped with a fully productized version of our Minority Report demo. We’ll let existing and potential customers, investors, and potential hires run wild in Tomorrowland and see for themselves how ThoughtSpot is pushing business into the future.

If you haven’t already, check out Part 1 and Part 2 of our CodeX blog series.