
Storytelling through charts - how to lie with data

I spent more time than you would expect thinking of a good tagline for “storytelling through charts”. This post’s initial goal was to discuss how useful charts are for providing an easy, high-level understanding of data, and how you can use them to tell a story. However, as with any good story, the power that charts and data visualizations provide can easily be abused. Especially today, when there is both a massive amount of data and a rampant spread of misinformation, it is important to recognize this manipulation both when creating and when consuming visualizations.

A brief history

We can all thank Mr. William Playfair, a Scottish man who - according to Princeton University - was a scoundrel who “tried and failed at many things in his life but left a permanent mark on statistical graphics, pioneering the way that we have visualized data ever since.” Playfair is known for creating the pie chart, bar graph, and statistical line graph.

“A scoundrel?” you ask. Well, Playfair had a predisposition for using said charts to swindle people. As in, he was literally convicted in 1805 of swindling. And he still didn’t stop - in 1816 he attempted to blackmail a Lord by claiming he had papers which “disputed inheritance of the [Lord’s] estates”.

In the words of Howard Wainer, a Wharton Business School statistics professor, “there was a money-grubbing, opportunistic, and reckless aspect to Playfair’s character...these disagreeable attributes, ironically, played midwife to the invention of statistical charts”. 


Playfair’s liberal use of scale

What to do

Given that the creation of these familiar charts is directly tied to providing consumers with misinformation, you’ll want to remember these key points about how they may be manipulated. 

Consistent scale

Let’s start off with a chart comparing the budget and box office profits of the original Star Wars trilogy. 

Now, since this point is titled “consistent scale”, you hopefully caught the issue with this chart immediately: at a glance, it gives the impression that The Empire Strikes Back and Return of the Jedi both cost more to make than they earned at the box office. Sure, you might be tricked into believing this about Attack of the Clones, but in this case something looks off. On a closer look, you realize that the budget and box office axes use completely different scales. A better chart would put both the box office and budget numbers on the same scale - which would look something like this:


Of course, there can be a time and place for separate scales, but it’s good to know when it may cause confusion. 
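
To make the scale problem concrete, here is a minimal Python sketch that draws the same numbers as text bars, first with a separate axis per series and then with one shared axis. The figures are made-up placeholders in millions of dollars, not the films' real budgets or grosses, and `bar_lengths` and `render` are hypothetical helper names chosen for this example.

```python
# Placeholder figures in millions of dollars (illustrative only, NOT real film data).
films = ["Film A", "Film B", "Film C"]
budget = [10, 20, 30]
box_office = [800, 500, 450]


def bar_lengths(values, axis_max, width=40):
    """Convert values to bar lengths (in characters) on an axis that ends at axis_max."""
    return [int(v / axis_max * width) for v in values]


def render(title, budget_bars, office_bars):
    """Print one pair of text bars per film."""
    print(title)
    for name, b, o in zip(films, budget_bars, office_bars):
        print(f"  {name} budget     |{'#' * b}")
        print(f"  {name} box office |{'#' * o}")


# Misleading: each series is scaled to its own maximum, so bar lengths
# cannot be compared across series.
budget_own = bar_lengths(budget, max(budget))
office_own = bar_lengths(box_office, max(box_office))
render("Separate scales", budget_own, office_own)

# Clearer: both series are scaled to the same maximum.
shared_max = max(budget + box_office)
budget_shared = bar_lengths(budget, shared_max)
office_shared = bar_lengths(box_office, shared_max)
render("Shared scale", budget_shared, office_shared)
```

With separate scales, the budget bars for Film B and Film C come out longer than their box office bars, even though each budget is a small fraction of the gross. On the shared scale the budget bars shrink to almost nothing, which is the honest picture.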

Correlation is not causation

This is a statistical cliché, but it is still important to pay attention to. When in doubt, you can always refer to Tyler Vigen’s fantastic website, Spurious Correlations. It has some great gems, such as… did you know that eating chicken impacts oil prices?!


Of course it doesn’t. They correlate but one does not cause the other.
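
A quick way to see how this happens: any two series that both drift upward over time will correlate, whether or not they have anything to do with each other. The sketch below uses two made-up series (not real chicken or oil data) and a hand-written `pearson` helper. Comparing year-over-year changes instead of raw levels is one rough sanity check, not a proof of anything.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def yearly_changes(values):
    """Year-over-year differences, which remove a steady upward trend."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Two made-up yearly series built from unrelated formulas (NOT real data).
# The only thing they share is that both rise over time.
years = range(10)
series_a = [10 + 2 * t + (-1) ** t for t in years]
series_b = [50 + 5 * t + t % 3 for t in years]

r_levels = pearson(series_a, series_b)
r_changes = pearson(yearly_changes(series_a), yearly_changes(series_b))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

The raw levels correlate very strongly, while the year-over-year changes barely correlate at all. The shared upward trend is doing all the work.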

Context

My last simple point is understanding context. Excluding specific data from a chart, or showing only the charts that fit a certain view, is an easy way to mislead. Saying, for example, “Los Angeles has more murders than Saint Louis” is technically correct, but only if you ignore the most basic context: the difference in population. Add population to the context and the story changes.
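
The arithmetic behind that flip is just a per-capita rate. Here is a small Python sketch with hypothetical numbers; “City A” and “City B” are stand-ins, and the counts and populations are invented for illustration rather than taken from any real crime statistics.

```python
# Hypothetical counts and populations (NOT real figures for any city).
cities = {
    "City A": {"incidents": 300, "population": 4_000_000},
    "City B": {"incidents": 200, "population": 300_000},
}


def rate_per_100k(incidents, population):
    """Incidents per 100,000 residents."""
    return incidents / population * 100_000


rates = {name: rate_per_100k(**data) for name, data in cities.items()}

# Ranking by raw count and ranking by rate can disagree.
most_incidents = max(cities, key=lambda name: cities[name]["incidents"])
highest_rate = max(rates, key=rates.get)

print("most incidents:", most_incidents)
print("highest rate per 100k:", highest_rate)
```

The larger city tops the raw count, but the smaller city has by far the higher rate per resident. Same data, different story.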

Many of us have seen charts that show drastic changes in stock prices or unemployment. If a chart shows only several months of those changes rather than many years, you should ask yourself “is this real or am I being swindled?”


Disney stock continues to plummet after buying Lucasfilm! Is Star Wars doomed??


Disney stock over 2012 - looks like it’s doing OK

One of the most valuable things I find in ThoughtSpot’s charting is the ability to easily drill down and view more data. It can be difficult to provide all the needed context and information within a single chart, and the ability to quickly add new columns or change the view on the fly helps significantly. Search is a natural and easy way to do this. For example, with the Los Angeles / Saint Louis comparison I used earlier, simply adding “by population” in ThoughtSpot quickly lets users see the information in a more accurate way. The search suggestions are relevant to the query, which makes it easier to know which columns one could add to get a clearer answer.
