You've got data. Great!
But what chart should you use to extract insights?
This 3-step process will give you confidence that you're using the right chart — in the right way — with your unique data and context.
- Identify your question
- Identify and label your column data types
- Match your question and data types to charts
These are the 3 steps I developed while studying data visualization at Stanford and creating charts used across tech, non-profit, and public policy. Great charts have many nuances, but in my experience you can get at least 80% of the value of a chart by getting these 3 steps right.
Identify your question
"The purpose of visualization is insight, not pictures" - Ben Shneiderman
The biggest mistake I see is people starting by saying, "I need an X chart," rather than "I need to understand XYZ about this part of our work."
By clarifying your question first, you'll be able to validate charts effectively, and perhaps find a better chart you'd never considered.
When you've established your question, identify which of 2 buckets it falls into:
- Exploring the data for yourself. For example:
  - "Are there any months where lost customers exceeded new customers? Do other data points help explain this?"
  - "Which locations are performing best across key outcomes? Are there outliers?"
- Explaining what you've found in the data to others, usually around a well-defined question. For example:
  - "What's the monthly ratio of new customers to lost customers?"
  - "How closely does operating cost correlate with location output?"
Identify and label your column data types
Next, identify the columns relevant to your key question.
Label every relevant column as either:
- Numbers, e.g. sales volume, expenses, engagement stats, user age
- Dates or times, e.g. delivery date, date of birth, the month and year that a given row describes
- Categories, e.g. item type, state or province, gender, industry name
(We'll exclude less common data types, such as geospatial or ordinal data.)
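If your data lives in a CSV or spreadsheet export, the labeling step can be roughed out in code. Here is a minimal Python sketch, assuming each row is a dict of raw text values; the function names and the list of date formats are illustrative assumptions, not part of any particular tool.

```python
from datetime import datetime

# Assumed date formats to try; extend this tuple to match your own data.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y-%m")


def _is_number(text):
    """True if the text parses as a number (thousands separators allowed)."""
    try:
        float(text.replace(",", ""))
        return True
    except ValueError:
        return False


def _is_date(text):
    """True if the text matches any of the assumed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False


def label_column(values):
    """Label one column as 'number', 'date', or 'category'."""
    cleaned = [str(v).strip() for v in values if str(v).strip()]
    if cleaned and all(_is_number(v) for v in cleaned):
        return "number"
    if cleaned and all(_is_date(v) for v in cleaned):
        return "date"
    # Anything else (including an empty column) falls back to 'category'.
    return "category"


def label_columns(rows):
    """Label every column in a non-empty list of row dicts."""
    return {
        name: label_column([row[name] for row in rows])
        for name in rows[0].keys()
    }


rows = [
    {"month": "2023-01", "region": "West", "sales": "1,200"},
    {"month": "2023-02", "region": "East", "sales": "950.5"},
]
labels = label_columns(rows)
```

Treat the result as a first guess and review it by hand: a zip code or an ID column will parse as a number even though it behaves like a category.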
Match your question and data types to charts
With a clear question and knowledge of your data types, you're ready to shortlist charts based on:
- Whether they support your selected column types
- How well they answer the sort of question you're asking
- What they're great for, and when they're bad to use
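The first criterion, whether a chart supports your column types, is mechanical enough to sketch in Python. The mapping below is my reading of the chart descriptions in this article, and `suggest_charts` is a hypothetical helper name. It only produces a shortlist; the other two criteria still need your judgment.

```python
from collections import Counter

# Minimum column types each chart needs, as assumed from this article:
# e.g. a scatter plot needs two numeric columns, a line chart a date and a number.
CHART_REQUIREMENTS = {
    "histogram": {"number": 1},
    "bar chart": {"category": 1, "number": 1},
    "pie chart": {"category": 1, "number": 1},
    "scatter plot": {"number": 2},
    "line chart": {"date": 1, "number": 1},
}


def suggest_charts(column_types):
    """Return the charts whose minimum column requirements are met."""
    available = Counter(column_types)
    return [
        chart
        for chart, needs in CHART_REQUIREMENTS.items()
        if all(available[kind] >= count for kind, count in needs.items())
    ]


shortlist = suggest_charts(["date", "number"])
```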
When to use a bar graph?
Bar charts are a great way to compare numbers across categories, not least because they're familiar in nearly every sector.
Great when:
- You've got data aggregated by category
- You have a small to medium number of categories
- Categories can be reasonably compared
Bad when:
- You have more categories than can fit in a chart
- You have outliers that make some bars huge and others minuscule, making comparison difficult
- Categories aren't meaningfully comparable
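Bar charts expect one number per category, so raw rows usually need aggregating first. Here is a minimal Python sketch, assuming rows are dicts with a category column and a numeric column; the function and column names are made up for illustration.

```python
from collections import defaultdict


def totals_by_category(rows, category_key, value_key):
    """Sum a numeric column per category, largest total first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[category_key]] += row[value_key]
    # Sorting by total makes the resulting bars easier to compare.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


rows = [
    {"region": "West", "sales": 100},
    {"region": "East", "sales": 50},
    {"region": "West", "sales": 25},
]
bars = totals_by_category(rows, "region", "sales")
```

If the resulting list has more entries than you can comfortably fit on a chart, that is a sign one of the "bad when" conditions applies.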
When to use a histogram?
With histograms, you can assess at a high level how the values in a numeric column are distributed.
Great when:
- Asking high-level questions of a single column: Are all values similar? Do values bunch up around certain values? Are there outliers?
- Comparing with other histograms (see the Economist's language rate example above). Dataviz gurus call this "small multiples", and it allows for easy comparison
Bad when:
- It's important to be able to identify individual points
- There's a small amount of data
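Under the hood, a histogram just counts how many values fall into each of a set of equal-width ranges. Here is a rough Python sketch of that counting step; the function name and the default of 5 bins are my own choices for illustration, and real charting tools pick bins in more sophisticated ways.

```python
def histogram_counts(values, bin_count=5):
    """Split values into equal-width bins; return (low, high, count) per bin."""
    low, high = min(values), max(values)
    if low == high:
        # Every value is identical, so a single bin holds everything.
        return [(low, high, len(values))]
    width = (high - low) / bin_count
    counts = [0] * bin_count
    for value in values:
        # The maximum value would land one past the end, so clamp it
        # into the last bin.
        index = min(int((value - low) / width), bin_count - 1)
        counts[index] += 1
    return [
        (low + i * width, low + (i + 1) * width, counts[i])
        for i in range(bin_count)
    ]


bins = histogram_counts([1, 2, 2, 3, 3, 3, 4, 4, 10], bin_count=3)
for bin_low, bin_high, count in bins:
    print(f"{bin_low:5.1f} to {bin_high:5.1f}: {'#' * count}")
```

In this made-up data, most values bunch up in the first bin while a single value sits alone in the last one, which is exactly the kind of pattern a histogram is meant to surface.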
When to use a scatter plot?
Scatter plots help answer one of the most important questions: How do two numeric columns relate? Are they correlated? Which data points fall within (or outside) the general trends?
Unlike bar or pie charts, scatter plots work well with hundreds of data points. Even so, they often still make it easy to spot outliers. For example, the NYTimes example above clearly shows a trend (richer schools have better test scores), as well as outliers (Liberty Elementary is richer but has low test scores).
As a bonus, you can also show categories by coloring the points, or display a third numeric value using the size of the points.
Great when:
- You want to find or show a clear relationship between two numeric columns
- You have a small to large number of data points
- You want to spot outliers
Bad when:
- Your data has extreme outliers, which make the general trends difficult to see in detail
- It's clear there's little to no relationship between the two columns
- You need to see every single data point clearly (in the example above, many points overlap)
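If you want a quick number to go alongside the picture, the Pearson correlation coefficient is one common way to summarize how closely two numeric columns move together. It is my choice of measure here, not something the chart itself requires. A minimal Python sketch, with illustrative names:

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson correlation between two equal-length numeric columns.

    Returns a value from -1 (perfectly opposed) to 1 (perfectly aligned).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length, at least 2 rows")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a column is constant")
    return covariance / (spread_x * spread_y)


# Made-up example values: operating cost and output for four locations.
operating_cost = [10, 20, 30, 40]
location_output = [15, 32, 41, 60]
r = pearson_correlation(operating_cost, location_output)
```

A single coefficient can hide outliers and curved relationships, so use it alongside the scatter plot rather than instead of it.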
When to use a line graph?
Often called "time-series", line charts are almost always used to show a numeric value changing over time. They often feature one line for each of several categories.
Great when:
- You have the same data points recorded many times across many time points
- You have 3-8 categories over time you want to compare
Bad when:
- You have too many categories, making it difficult to see the trend of any one line
- Lines are highly irregular, making it hard to clearly identify individual lines
- You have a huge number of categories to show (9+)
When to use a pie chart?
Also known as "donut charts" when the center is removed, pie charts reveal how a small number of categories add up to make a true whole. For example, the Economist's chart above shows how ad spending in each form of media compares to the total spent on advertising.
Pie charts are infamous for being used in clumsy ways, so proceed thoughtfully!
Great when:
- You have a small number of categories (2-8)
- Categories are all parts of a larger whole, and no category is missing
- You need only to make rough comparisons
Bad when:
- All categories are a similar size, meaning it's difficult to make meaningful comparisons
- You have more than 8 categories, crowding the chart and making it difficult to read any one category
- You have many categories that are very tiny compared to others: these are often hard to read or see
- Categories do not add up to a true whole; for example, each category is the revenue of one location, but several locations are missing
- Shown in 3D: this distorts the data, makes values harder to read, and adds no further information
- You need to make precise comparisons between categories: use a bar chart instead!
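A couple of these checks can be automated before you draw anything. The sketch below is a hypothetical Python helper: it assumes you know what the whole should add up to, and the 1% tolerance is an arbitrary choice for illustration.

```python
def pie_chart_problems(parts, expected_total=None, max_categories=8,
                       tolerance=0.01):
    """Return a list of reasons a pie chart may be a poor fit.

    parts maps each category name to its value. An empty list means
    none of the checks below found a problem.
    """
    problems = []
    if not 2 <= len(parts) <= max_categories:
        problems.append(f"expected 2-{max_categories} categories, "
                        f"got {len(parts)}")
    if any(value < 0 for value in parts.values()):
        problems.append("negative values cannot be drawn as slices")
    total = sum(parts.values())
    if (expected_total is not None
            and abs(total - expected_total) > tolerance * expected_total):
        # Some categories are probably missing from the data.
        problems.append("categories do not add up to the whole")
    return problems


# Made-up ad-spend shares, in percent of total spending.
complete = pie_chart_problems(
    {"TV": 40, "Online": 35, "Print": 25}, expected_total=100)
incomplete = pie_chart_problems(
    {"TV": 40, "Online": 35}, expected_total=100)
```

These checks only cover the countable rules. Whether slices are too similar or too tiny to read is still a judgment call.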
The elephant in the charting room...
Ready to create awesome charts? Almost...
Unfortunately, there are still two practical barriers to effectively and efficiently creating great charts:
- Loading and formatting your data for visualization
- Keeping charts up to date and shared with your team
Without a strategy to tackle both, charts will take forever to make, and have little impact once created.
We know: we’ve seen dozens of teams stuck there. We've been there ourselves.
But there's good news: many data-driven teams around the world are using a data workflow and analysis platform, Intersect, to:
- Easily automate the sourcing and formatting of data
- Automatically update charts and share them with their team
Now that you've got the 3 steps to creating great charts, test-drive them on a platform that helps make the process scalable! Book a call today!