Analyzing Marketing Data: Getting Data into ThoughtSpot (Part II)

I recently wrote about a custom demo we developed with one of our partners, Celebrus. The business use case was providing ad-hoc analysis for Celebrus’ marketing department. Katharine Hulls, VP of Marketing, and her team have a rich set of data across all their customer interaction channels, but they don’t have the skill set to develop their own custom analytics on that data. So if the pre-built reports don’t answer their questions (for example, about the cross-platform effectiveness of a specific campaign), they have to request a one-off report and join the BI team’s backlog.

The initial business case was running ad-hoc queries to track things like the performance of a new campaign, or to drill down into fluctuations in lead flow. So, for the initial demo we loaded a date-range-limited subset from their product database, built some ‘pinboards’ (our word for dashboards) to give them a starter for ten, and let the team loose.

From the start they loved being able to create their own queries on the fly from the search bar. It was exactly the kind of ad-hoc analytics capability that they needed. But they soon sent us a new challenge: Katharine and her team were having difficulty filtering searches that analyzed data about visits to campaign landing pages. In some cases searches were including URLs that they didn’t expect, and they couldn’t figure out how to remove them. Katharine’s team needed help.

So we went back to the drawing board. The first time around we had exposed the full data set, which included page visit data for intranet and public URLs, summary tables, base data, and transformation data. It became clear that this was causing ambiguity in Katharine’s searches. She didn’t want intranet URLs, but there was no way to categorize them and exclude them. There were also some ambiguous terms, because a data item might appear in a transactional table, in a transformation table, and in an aggregate. It wasn’t obvious to Katharine, as a business user rather than a data analyst, which one she needed.

ThoughtSpot offers two ways to deal with this kind of issue. One is to exclude the unnecessary data from the cache. The other is to partition the cached data into “worksheets.” In one line, ThoughtSpot worksheets are subsets of a schema, possibly overlapping, each focused on the use cases of a specific user or user role. Worksheets are tied to user credentials, so users see only the data that is relevant to them and that they are authorized to see.

In this case we chose to eliminate the unnecessary data, including summaries and aggregates, from the load entirely. These are typically included to speed up traditional analytics, but with ThoughtSpot they aren’t needed, because it is fast enough to query the granular base data directly. That suits the unpredictable searches users make.

When we discussed this decision with Celebrus, we emphasized the best practice of including only what the implemented use cases demand. With ThoughtSpot, it’s easy to add new data sources to the cache and to introduce worksheets as needed, so we recommend that customers do that as more use cases come online. There’s no need for a big-bang schema design up front, whose effort and data costs would only be paid back gradually as users come on board.

But we still had the issue of intranet URL data being included in what were meant to be searches based only on external URLs. This turned out to be a ‘feature’ of the data: something that happens in the real world but should be excluded from the analysis. Celebrus’ own code knows whether a URL is exposed only on the intranet, but without that logic it isn’t obvious to anyone else. We could have replicated this business logic in ThoughtSpot’s metadata, but instead Celebrus re-processed their data with an additional rule that exposes the result of the logic as an indicator in the data. That’s unusual for us, since we don’t normally require changes to the original data sources. But it highlighted an interesting relationship in the data and saved us from duplicating logic in the ThoughtSpot metadata.
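To illustrate the general idea of exposing business logic as an indicator in the data, here is a minimal Python sketch. It assumes page-visit rows arrive as dictionaries with a `url` field, and it uses a hypothetical host-suffix rule (`INTRANET_HOST_SUFFIXES`) as a stand-in for Celebrus’ real logic, which this post doesn’t describe. The functions `is_intranet` and `add_intranet_indicator` are illustrative names only.

```python
from urllib.parse import urlparse

# Hypothetical rule: hosts treated as intranet-only. This is an assumption for
# illustration; Celebrus' actual intranet logic lives in their own code.
INTRANET_HOST_SUFFIXES = (".intranet.example.com", ".corp.example.com")


def is_intranet(url: str) -> bool:
    """Return True if the URL's host is, or sits under, an intranet domain."""
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == suffix.lstrip(".") or host.endswith(suffix)
        for suffix in INTRANET_HOST_SUFFIXES
    )


def add_intranet_indicator(rows):
    """Return copies of page-visit rows with an explicit is_intranet column."""
    return [{**row, "is_intranet": is_intranet(row["url"])} for row in rows]


if __name__ == "__main__":
    visits = [
        {"url": "https://www.example.com/campaign/spring"},
        {"url": "https://wiki.intranet.example.com/launch-plan"},
    ]
    for row in add_intranet_indicator(visits):
        print(row["url"], row["is_intranet"])
```

Because the flag is computed once, upstream, a business user can exclude intranet pages simply by filtering on the indicator, without needing to know the URL rules behind it.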

Now we have a much more approachable setup for Katharine, and she and her team are getting to grips with it. Katharine has written here about her side of the story so far. She and her team are really enjoying the simplicity of search-driven analytics: how easy it is to create ad-hoc reports and drill down into the data to explore further. I’m looking forward to her next post, in which she’ll tell us how well these refinements worked.