How to draw a racing bar plot?

1. Introduction

Nowadays, more and more data analysis reports begin to include a dynamic racing bar plot in order to show how the values for a type of categorical variable change over time. In this tutorial, we will explore how to construct this popular visualization technique.

2. Basic strucutres

library(tidyverse)
library(extrafont)
library(gganimate)
library(scales)
theme_set(theme_light())
kids <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-15/kids.csv')

First, make sure you have the ranking for each country in every year since we will begin by plotting a stacked bar plot with ranking on x-axis and the total spending on y-axis. (More generally, to make the animated bar plot, make sure the variable on y axis is changing overtime for each category such as country in our case.) Also, to simplify the GIF, we only focus on the top 15 countries with the most spending for every year.

What the following code does is to first filter out other variables since we only focus on the K12 education spending only. Then we need to multiply the inf_adj_perchild by 1000 because the numbers are calculated in $1000. Then the ranking column is created.

kids_clean<-kids%>%filter(variable == "PK12ed")%>%
  transmute(state, year, inf_adj_perchild=round(inf_adj_perchild,3)*1000)%>%
  group_by(year)%>%
  arrange(desc(inf_adj_perchild))%>%
  mutate(ranking = 1:n())%>%filter(ranking <=15)

Next, create a stacked bar plot as you would usually do. Normally we would facet_wrap(~year) so that we can get a graph for every year, but to plot the race bar plot, we will stack the bars for every year on a single graph without the faceting.

p<-kids_clean%>%ggplot(aes(x = ranking, y = inf_adj_perchild, fill = state))+
  geom_col(show.legend = FALSE, alpha = 0.6)

p

p1<-p+geom_text(aes(y = 0, label = state), family = "Times")+
  geom_text(aes(y = inf_adj_perchild, label = as.character(dollar(inf_adj_perchild))))

p1

After running the code above, you will notice that the texts for the value of inf_adj_perchild don’t appear in the position we want them to be. However, if we facet_wrap() by year, these texts do appear in the right place

p1+facet_wrap(~year)

We then add the Year onto our graph so that the reader could know which year the graph is talking about.

p2<-p1+geom_text(aes(label = as.character(year)),x = -15, y = 18000, size = 16, family = "Times")

p2

Notice the position we assigned to the years: we put them in the second quandrant with a high value of y coordinate which is slightly smaller than the maximal value for inf_adj_perchild of all time. Such position would be at the right-bottom corner of the graph after flipping the coordinate and reversing the scales of x-axis. If we run the code above, one might wonder where are the years, I can’t see it. The reason is that since the default range for x axis is positive, we need to expand_limits() to x =-30. After running the code below, you will be able to see the years.

p2+facet_wrap(~year)+expand_limits(x = -30)

To make the plot more appealing, we need to flip the coordinate. The clip parameter controls if the text would be clipped by the axises. Just take a look at the difference between the following two graphs.

p2+coord_flip(clip = "on")+expand_limits(x = -30)

p2+coord_flip(clip = "off")+expand_limits(x = -30)

Notice when clip is set to be “on”, the country names actually get cropped, so we need to set clip to be off. In addition, we need to reverse the scale of x since the graph right now needs to be turned up side down.

p3<-p2+coord_flip(clip = "off")+scale_x_reverse()+ylim(0,20000) 
p3
## Warning: Removed 274 rows containing missing values (geom_col).

To be honest, if you don’t really care about the aesthetics of the visualization and just want to get a race bar plot, we are actually almost done. All you need to do is to apply the transition_states() function on p3. The function comes from the gganimate package, and the result looks like this.

p3+labs(y = "Inflation-adjusted spending on K12 education",
        x = "",
        title = "The top 15 states with highest inflation-adjusted spending on K12 education in..")+
  transition_states(year)

If you have read till this part, then congratualations! You already know the basic steps to draw a racing bar plot. Let me review the key steps for you:

  1. Create a ranking column of the categorical variable that you focus on for each year (usually the time unit is a year)
  2. Draw a stacked bar plot with ranking on x-axis and the numerical variable you are interested in on the y-axix
  3. Annotate the name for the categorical variables and the values of the numerical variables.
  4. Annotate the years in the second quadrant (a slightly negative x coordinate and a positive y coordinate with value close to the maximum of the numerical variable you have)
  5. Flip the coordinate (with on = off) and scale_x_reverse()
  6. Adjust the range for the y axis using ylim().
  7. Apply function transition_states(year).

3. Asethetic adjustment

In the rest of this tutorial, we will work on the asethetics of our visualization by setting up the theme. For example, we could make the labels of x-axis and y-axis disappear by setting axis.text.x = element_blank() and axis.text.y = element_blank(). Since we have already annoate the countries, we no longer need the title for that axis. However, even though we know the categorical variables lie on the x axis, in order to adjust it in the theme, we actually use axis.title.y = element_blank() since the flipped x-axis is recognized as y-axis by the computer. Finally, just remember to edit the title and subtitle for the graph, and boom! You will see a nice racing bar plot on your screen. To change the size of the graph and the frames (which controls how fast the animation plays), you can use the animate() function.

p4<-p3+
  coord_flip(clip = "off", expand = TRUE) +
  scale_x_reverse() +
  ylim(-10,19000) + 
  theme_minimal() +
  theme(legend.position = "none",
        plot.margin = margin(30, 30, 30, 30),
        plot.title = element_text(size = 20, family = "Times", face = "bold", margin = margin(20,0,5,0)),
        plot.subtitle = element_text(size = 14, family = "Times", margin = margin(0,0,15,0)),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        axis.title.y = element_blank(),
        axis.title.x = element_text(size = 12, family = "Times"))  +
  labs(y = "Spending per individual",
       title = "US Spending on K12 Education",
       subtitle = "Top 15 States in...") +
  transition_states(year)
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
animate(p4, nframes = 130, width = 600, height = 900)

Xuxin Zhang
Xuxin Zhang

Just a wondering village boy.

Related