On a page of dull statistics, a dynamic data visualization immediately catches the eye. You may be familiar with the increasingly popular “bar chart race” visualization. In addition to their visual appeal, bar chart races are great for revealing how values and rankings change over time.
Since their surge in popularity in 2019, people and businesses have been using these dynamic visualizations to present the historical development of various types of information.
These motion charts have been used to visualize rank and revenue trends of the world’s largest companies, the most popular websites over time, the world population through the years, and the spike in Covid-19 cases.
To create a motion chart like this one, a data scientist needs to:
- Acquire data from sources like Kaggle or Our World in Data
- Clean and format the dataset so that it is easier to visualize
- Use Python packages to apply the appropriate visualization
This short but involved checklist may seem overwhelming. However, there is an easier way! By using two no-code tools for data processing and visualization, you can create insightful bar chart races, regardless of your programming background.
Use this step-by-step tutorial to find out how.
The data we’re going to use
For this motion chart, I plan to visualize the total number of vaccinations administered per 100 people between December 28th, 2020, and July 28th, 2021. This chart will be limited to the top 15 countries with the highest number of administered vaccines per 100 people.
With this goal in mind, I need to extract three values of interest from the dataset:
- The name of the country
- The total number of administered vaccines per 100 people
- The month when the vaccines were administered
The raw dataset requires some cleaning and re-organizing before I can create the visualization. The next section contains a breakdown of this cleaning process.
First, let’s check out the technology stack we will be using.
This article covers how to effortlessly clean, organize and visualize your data without writing a single line of code.
To do this, you’ll need two no-code tools:
- a data cleaning tool
- a bar-chart-race visualization tool
To fulfill these two needs, we’ll be using:
- Intersect — This is a collaborative data workspace that lets you clean, organize and analyze your data in data notebooks, without coding.
- Flourish — This tool has a pre-built template that we can use to create the actual bar-chart-race visualization.
Re-organizing the data using Intersect
Intersect is a no-code version of a Jupyter notebook, where a user can assemble data workflows (or “apps”) using building blocks. Using these notebooks, we can create a data workflow that is going to clean and re-organize the original Covid-19 data, so that the final format is ready for visualization.
After generating a new notebook in Intersect, we start by importing our data as a CSV file.
Within seconds, the following dataset is imported:
In order to clean and organize the dataset, Intersect provides the following building blocks:
- Import data (to import from Excel, Google Sheets, databases, other apps)
- Work with data (to analyze and transform your data)
- Data Visualizations (to create charts and dashboards)
- Notes (to add notes and comments)
- Machine Learning (for predictive analytics)
- Export data (to write back to Google Sheets, databases, other apps)
The following are some of the blocks I used to clean and re-organize the Covid-19 dataset.
Select subset of columns
Since the goal of this data exploration is to visualize the number of vaccinations administered per 100 people, we only need to work with three columns. The "Select subset of columns" block allows us to choose just the columns we need: in this case, the location, the number of vaccines per 100 people, and the date.
Now, the resulting table has only three columns.
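If you are curious what this step looks like in code, here is a minimal Python sketch using only the standard library. It is a rough equivalent, not a description of how Intersect works internally. The column names (`location`, `date`, `total_vaccinations_per_hundred`, plus the extra `iso_code` and `total_vaccinations`) are assumptions about the CSV header, and the sample rows are made-up values for illustration only.

```python
import csv
import io

# Tiny made-up sample standing in for the raw CSV; values are illustrative only.
RAW_CSV = """location,iso_code,date,total_vaccinations,total_vaccinations_per_hundred
Israel,ISR,2021-01-28,4000000,46.2
Israel,ISR,2021-02-28,8000000,92.4
Chile,CHL,2021-02-28,3000000,15.7
"""

# Assumed names of the three columns we want to keep.
KEEP = ["location", "date", "total_vaccinations_per_hundred"]


def select_columns(rows, keep):
    """Return new rows containing only the columns named in `keep`."""
    return [{name: row[name] for name in keep} for row in rows]


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
subset = select_columns(rows, KEEP)
print(subset[0])
```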
Extract the month into a separate column
Because the vaccines were released fairly recently, I chose to show the total vaccinations per 100 people month by month. To achieve this month-by-month breakdown, we must extract the month from our "date" column. Additionally, we extract the day of the month, since we'll be filtering the table based on it in a later step.
To do this, we select the “Extract from date/time column” block and specify the following parameters (see image below).
Now we have a column that represents the month and day in which the vaccines were administered.
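For comparison, the same idea can be sketched in a few lines of Python. This assumes the dates are stored as ISO strings (`YYYY-MM-DD`) in a column called `date`. The function name `add_month_and_day` and the sample rows are made up for illustration.

```python
from datetime import date

# Made-up sample rows for illustration; column names are assumed.
rows = [
    {"location": "Israel", "date": "2021-01-27", "total_vaccinations_per_hundred": "45.0"},
    {"location": "Israel", "date": "2021-01-28", "total_vaccinations_per_hundred": "46.2"},
    {"location": "Chile", "date": "2021-02-28", "total_vaccinations_per_hundred": "15.7"},
]


def add_month_and_day(rows, date_column="date"):
    """Copy each row and add integer `month` and `day` fields parsed from an ISO date."""
    enriched = []
    for row in rows:
        parsed = date.fromisoformat(row[date_column])  # expects YYYY-MM-DD
        enriched.append({**row, "month": parsed.month, "day": parsed.day})
    return enriched


with_parts = add_month_and_day(rows)
for row in with_parts:
    print(row["location"], row["month"], row["day"])
```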
I then filter the table to keep only the rows recorded on the 28th day of each month. This allows me to compare fixed dates for each of the months. To do that, I’ve used the “Filter data” block, as shown below.
The final table now only shows the total number of administered vaccines per 100 people on the 28th of each month.
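In code, this kind of filter is a one-liner. The sketch below assumes each row already carries an integer `day` field. The rows are made-up illustration values.

```python
# Made-up sample rows for illustration; field names are assumed.
rows = [
    {"location": "Israel", "month": 1, "day": 27, "total_vaccinations_per_hundred": 45.0},
    {"location": "Israel", "month": 1, "day": 28, "total_vaccinations_per_hundred": 46.2},
    {"location": "Chile", "month": 2, "day": 28, "total_vaccinations_per_hundred": 15.7},
    {"location": "Chile", "month": 3, "day": 1, "total_vaccinations_per_hundred": 16.9},
]


def filter_by_day(rows, day=28):
    """Keep only the rows recorded on the given day of the month."""
    return [row for row in rows if row["day"] == day]


snapshots = filter_by_day(rows)
for row in snapshots:
    print(row["location"], row["month"], row["total_vaccinations_per_hundred"])
```

One thing to keep in mind with a fixed-day filter like this: a country with no row for the 28th of a given month simply has no value for that month.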
Finally, we need to rearrange our table to meet Flourish’s template requirements. We use the “Pivot table” block and summarize our data by filling in the input fields in the block as shown in the image below.
Notice how this block summarizes the same data in a different layout. Each column represents a month, each row represents a unique location, and each cell holds the sum of total vaccinations per hundred people for that location and month.
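To make the reshaping concrete, here is a small Python sketch of a pivot that sums values per location and month. The helper `pivot_sum` is hypothetical, the field names are assumed, and the numbers are made up. Because the filtered table has one row per location and month, the sum is just that single value.

```python
from collections import defaultdict

# Made-up long-format rows: one row per location and month.
rows = [
    {"location": "Israel", "month": 1, "total_vaccinations_per_hundred": 46.2},
    {"location": "Israel", "month": 2, "total_vaccinations_per_hundred": 92.4},
    {"location": "Chile", "month": 2, "total_vaccinations_per_hundred": 15.7},
]


def pivot_sum(rows, index, columns, values):
    """Build {index value: {column value: summed value}} from long-format rows."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[index]][row[columns]] += float(row[values])
    # Convert the nested defaultdicts to plain dicts for predictable lookups.
    return {key: dict(cells) for key, cells in table.items()}


wide = pivot_sum(rows, "location", "month", "total_vaccinations_per_hundred")
for location, cells in wide.items():
    print(location, cells)
```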
Rename the columns
Lastly, we make the cleaned table easier to read by giving the columns descriptive names using the “Rename columns” block.
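A rename is just a mapping from old names to new ones. In the Python sketch below, the labels in `NEW_NAMES` are hypothetical (pick whatever reads well in your chart), and the rows are made-up illustration values.

```python
# One wide row per location; keys other than "location" are month numbers.
wide_rows = [
    {"location": "Israel", 1: 46.2, 2: 92.4},
    {"location": "Chile", 1: 0.3, 2: 15.7},
]

# Hypothetical labels for illustration.
NEW_NAMES = {"location": "Country", 1: "Jan 2021", 2: "Feb 2021"}


def rename_columns(rows, mapping):
    """Return rows with keys replaced per `mapping`; unmapped keys are kept as-is."""
    return [
        {mapping.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


renamed = rename_columns(wide_rows, NEW_NAMES)
print(list(renamed[0].keys()))
```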
At this point, we have our final table, and we are ready to download the dataset to use in Flourish.
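If you were producing this file by hand rather than downloading it, the export could look roughly like the Python sketch below: a label column first, followed by one value column per date. The column labels and numbers are made up for illustration.

```python
import csv
import io

# Made-up wide table: label column first, then one value column per month.
renamed = [
    {"Country": "Israel", "Jan 2021": 46.2, "Feb 2021": 92.4},
    {"Country": "Chile", "Jan 2021": 0.3, "Feb 2021": 15.7},
]


def to_csv_text(rows):
    """Serialize wide rows to CSV text, using the first row's keys as the header."""
    fieldnames = list(rows[0].keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv_text(renamed)
print(csv_text)
```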
Reusing the data app
Before moving on to the final bar chart race steps, I want to note that the workflow we just set up can serve as a backbone for future cleaning of the same Covid-19 vaccination dataset.
You can upload the original Covid-19 vaccinations data directly to this data app, and it will clean the dataset automatically by repeating the steps outlined in the previous sections.
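The code analogue of a reusable data app is a single function that runs every cleaning step in order. The Python sketch below is one way to express that, under several assumptions: the column names (`location`, `date`, `total_vaccinations_per_hundred`) are assumed, the function name `clean_vaccination_csv` is hypothetical, rows with an empty value are skipped (my choice, not something the workflow above specifies), and the sample data is made up. It keys each snapshot by year and month so that December 2020 cannot be confused with a later December.

```python
import csv
import io
from collections import defaultdict
from datetime import date


def clean_vaccination_csv(csv_text, day=28,
                          value_column="total_vaccinations_per_hundred"):
    """Turn raw long-format CSV text into {location: {"YYYY-MM": value}}."""
    wide = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row[value_column]:
            continue  # assumption: skip days with no reported value
        parsed = date.fromisoformat(row["date"])  # expects YYYY-MM-DD
        if parsed.day != day:
            continue  # keep only the chosen day of each month
        wide[row["location"]][parsed.strftime("%Y-%m")] = float(row[value_column])
    return dict(wide)


# Made-up sample standing in for the raw file; values are illustrative only.
SAMPLE = """location,date,total_vaccinations_per_hundred
Israel,2021-01-27,45.0
Israel,2021-01-28,46.2
Israel,2021-02-28,92.4
Chile,2021-02-28,15.7
Chile,2021-03-01,
"""

result = clean_vaccination_csv(SAMPLE)
print(result)
```

Running the same function on a fresh download of the data repeats every step without any manual work, which is the same benefit the data app gives you without code.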
Designing the Bar-chart-race with Flourish
Now, our data is ready to be visualized in Flourish.
Flourish gives you the power to turn your spreadsheets into various interactive visualizations.
For this part, we will be using a pre-built bar-chart-race template, which is located in Flourish’s template gallery. From the template page, “Create visualization” takes you to Flourish’s editor. At this point, you can create a public template and use the editor for free.
Next, navigate to the “Data” toggle to access the screen where you can import your data.
On this page, click on the blue “Upload data” button. This will prompt you to select a CSV file from your computer.
After you upload your data, Flourish is going to ask if you’d like to publish your data privately or publicly. As long as you agree that your data can be publicly visible, Flourish will allow you to use the platform for free.
Once your data is uploaded, you may edit your labels, values, and categories. In our case, our Label corresponds to Column A, and our Values are stored in the columns between Column B and Column ZZ.
Finally, we are ready to style the bar chart.
Styling the bar chart
In “Preview” mode, you’ll be given a variety of options to customize and style your chart. Among many other customizations, Flourish allows you to add titles, adjust the bar colors, change the overall layout, and add sources.
With a couple of tweaks to our bar colors, titles, and footer sources, we end up with this final bar-chart-race:
We have published this bar chart race using Flourish’s publishing options, and you can find it at this link.
As a data-driven marketer, I always find myself exploring various data tools that help me use data to drive important decisions, or just to explore for fun!
This tutorial was designed to educate non-technical users about their options when it comes to data munging and creating visualizations.
I hope that you found this tutorial useful and that you’ll apply some of what you learned in your day-to-day work.
If you have any questions about the process, feel free to ping me on Twitter and I will be happy to help you.