A Moment of Science: Just Plot It
By Josh Hemann
A professor I once had used to say, "Never show statistics without a picture; never show a picture without statistics." In the world of analytics, data visualizations exist to make data understandable in ways that descriptive statistics alone cannot.
However, I come across more bad examples of visualizations than good ones, despite the energy and money invested across industries in data-driven decision making. There are some popular blogs (e.g. Junk Charts and Visual Business Intelligence) that focus on data visualization, providing fantastic examples of what not to do (along with supporting explanations), and they have a seemingly endless supply of material to discuss.
As I write this I am reminded of a more famous quote by Richard Hamming: "The purpose of computing is insight, not numbers." Getting insight from visualizations can be hard, but following some ground rules goes a long way, and I'll cover two such rules here.
Rule #1 - Don't forget that humans perceive
William S. Cleveland has done a fair bit of research studying how people perceive visual information displayed in charts and diagrams. Most of this came out of his work while at Bell Labs. (I am constantly blown away by how much foundational statistical computing work has come out of Bell Labs. Non-statistical advancements too: recently I have been dealing with some ASCII-to-Unicode text parsing issues and learned that the UTF-8 encoding used in half the world's web pages came out of, you guessed it, Bell Labs.)
Cleveland's contemporary Edward Tufte has published a series of very popular books on visualization that are infused with philosophy and historical narrative; Cleveland, on the other hand, has brought a lot of scientific rigor to the topic and focuses on the context of statistical analysis. His book The Elements of Graphing Data should be on every analytics professional's shelf.
Before that book appeared, Cleveland published a short paper with Robert McGill covering how well people are able to make judgments based on data presented in various ways. For example, they found that information displayed using the area and color of shapes was harder to understand than the same information presented with more basic elements like lines, which have lengths and slopes. So, seemingly benign stylistic choices can adversely affect our ability to perceive patterns in the data. Often, simpler is better.
Rule #2 - Choose graphic layouts to aid perception
Because line lengths and slopes matter, the "window frame" in which these lines are plotted must be taken into consideration. These plotting windows can be characterized by their aspect ratio, which is the ratio of the width of the window to its height. The basic idea is that one cannot simply shove any data set into a fixed plotting window without hurting our ability to perceive relationships in these data. This issue is especially relevant nowadays, when visualizations are increasingly seen on the web or in dashboards. In these contexts, the plotting window size is often chosen solely on the basis of screen real estate and layout constraints, rather than for optimal perception by the intended audience (this is especially problematic when presenting technical information in PowerPoint).
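To make the aspect ratio point concrete, here is a minimal Python sketch. It is an illustration only, not code from Cleveland's work. It assumes a series drawn at evenly spaced x positions in a window that is `aspect` units wide and 1 unit tall, and it computes the on-screen angle of each line segment. The function name `segment_angles` is hypothetical, and the series is made up (it is not the sunspot data): each cycle rises quickly over 2 steps and falls slowly over 8.

```python
import math


def segment_angles(y, aspect):
    """Return the on-screen angle, in degrees, of each line segment.

    Assumes the points are evenly spaced in x and the plot fills a window
    that is `aspect` units wide and 1 unit tall (aspect = width / height).
    """
    y_range = max(y) - min(y)
    dx = aspect / (len(y) - 1)  # screen width covered by one segment
    angles = []
    for a, b in zip(y, y[1:]):
        dy = (b - a) / y_range  # screen height covered by one segment
        angles.append(math.degrees(math.atan2(dy, dx)))
    return angles


# Made-up asymmetric cycle: fast rise (2 steps), slow fall (8 steps).
cycle = [0, 5, 10, 8.75, 7.5, 6.25, 5, 3.75, 2.5, 1.25]
series = cycle * 20 + [0]  # 20 cycles, 200 segments

square = segment_angles(series, 1.0)   # square window
wide = segment_angles(series, 20.0)    # window 20 times wider than tall

for name, angles in [("square", square), ("wide", wide)]:
    print(name, "steepest rise:", round(max(angles), 1),
          "steepest fall:", round(min(angles), 1))
```

In the square window the rising and falling segments both sit within a few degrees of vertical, so the fast-rise, slow-fall asymmetry is nearly invisible. In the wide window the two orientations differ by more than 20 degrees, and the asymmetry is easy to see. The data are identical in both cases; only the window changed.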
In the following section, I am going to explore a classic and fantastic visualization example involving sunspot data and the aspect ratio of plotting windows. Using these data to highlight the aspect ratio issue provides a nice vignette of:
- how the simplest visualization task still requires careful thought
- how to quickly explore data
I'll dive into the sunspot data and plot it with different aspect ratios, and I'll present this vignette using a new tool for interactive, collaborative analytical work that I have been looking for an excuse to play with.
Note: The following content is best viewed in a browser other than Internet Explorer. It relies on HTML5's WebSocket API, which is supported in Firefox, Chrome, Opera and Safari, but not IE9.