How to draw a racing bar plot


Date
Sep 20, 2020 12:00 AM

Talke about how asfdosdfoijsdf s dfsdfasdf sdfsf sdfasdfd

library(tidyverse)
## Warning: package 'ggplot2' was built under R version 4.0.2
## Warning: package 'tibble' was built under R version 4.0.2
library(extrafont)
## Warning: package 'extrafont' was built under R version 4.0.2
library(gganimate)
## Warning: package 'gganimate' was built under R version 4.0.2
library(scales)
theme_set(theme_light())
kids <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-15/kids.csv')

First, make sure you have the ranking for each country in every year since we will begin by plotting a stacked bar plot with ranking on x-axis and the total spending on y-axis. (More generally, to make the animated bar plot, make sure the variable on y axis is changing overtime for each category such as country in our case.) Also, to simplify the GIF, we only focus on the top 15 countries with the most spending for every year.

What the following code does is to first filter out other variables since we only focus on the K12 education spending only. Then we need to multiply the inf_adj_perchild by 1000 because the numbers are calculated in $1000. Then the ranking column is created.

kids_clean<-kids%>%filter(variable == "PK12ed")%>%
  transmute(state, year, inf_adj_perchild=round(inf_adj_perchild,3)*1000)%>%
  group_by(year)%>%
  arrange(desc(inf_adj_perchild))%>%
  mutate(ranking = 1:n())%>%filter(ranking <=15)

Next, create a stacked bar plot as you would usually do. Normally we would facet_wrap(~year) so that we can get a graph for every year, but to plot the race bar plot, we will stack the bars for every year on a single graph without the faceting.

p<-kids_clean%>%ggplot(aes(x = ranking, y = inf_adj_perchild, fill = state))+
  geom_col(show.legend = FALSE, alpha = 0.6)

p

p1<-p+geom_text(aes(y = 0, label = state), family = "Times")+
  geom_text(aes(y = inf_adj_perchild, label = as.character(dollar(inf_adj_perchild))))

p1

After running the code above, you will notice that the texts for the value of inf_adj_perchild don’t appear in the position we want them to be. However, if we facet_wrap() by year, these texts do appear in the right place

p1+facet_wrap(~year)

We then add the Year onto our graph so that the reader could know which year the graph is talking about.

p2<-p1+geom_text(aes(label = as.character(year)),x = -15, y = 18000, size = 16, family = "Times")

p2

Notice the position we assigned to the years: we put them in the second quandrant with a high value of y coordinate which is slightly smaller than the maximal value for inf_adj_perchild of all time. Such position would be at the right-bottom corner of the graph after flipping the coordinate and reversing the scales of x-axis. If we run the code above, one might wonder where are the years, I can’t see it. The reason is that since the default range for x axis is positive, we need to expand_limits() to x =-30. After running the code below, you will be able to see the years.

p2+facet_wrap(~year)+expand_limits(x = -30)

To make the plot more appealing, we need to flip the coordinate. The clip parameter controls if the text would be clipped by the axises. Just take a look at the difference between the following two graphs.

p2+coord_flip(clip = "on")+expand_limits(x = -30)

p2+coord_flip(clip = "off")+expand_limits(x = -30)

Notice when clip is set to be “on”, the country names actually get cropped, so we need to set clip to be off. In addition, we need to reverse the scale of x since the graph right now needs to be turned up side down.

p3<-p2+coord_flip(clip = "off")+scale_x_reverse()+ylim(0,20000) 
p3
## Warning: Removed 274 rows containing missing values (geom_col).

To be honest, if you don’t really care about the aesthetics of the visualization and just want to get a race bar plot, we are actually almost done. All you need to do is to apply the transition_states() function on p3. The function comes from the gganimate package, and the result looks like this.

p3+labs(y = "Inflation-adjusted spending on K12 education",
        x = "",
        title = "The top 15 states with highest inflation-adjusted spending on K12 education in..")+
  transition_states(year)

If you have read till this part, then congratualations! You already know the basic steps to draw a racing bar plot. Let me review the key steps for you:

  1. Create a ranking column of the categorical variable that you focus on for each year (usually the time unit is a year)
  2. Draw a stacked bar plot with ranking on x-axis and the numerical variable you are interested in on the y-axix
  3. Annotate the name for the categorical variables and the values of the numerical variables.
  4. Annotate the years in the second quadrant (a slightly negative x coordinate and a positive y coordinate with value close to the maximum of the numerical variable you have)
  5. Flip the coordinate (with on = off) and scale_x_reverse()
  6. Adjust the range for the y axis using ylim().
  7. Apply function transition_states(year).

In the rest of this tutorial, we will work on the asethetics of our visualization by setting up the

Xuxin Zhang
Xuxin Zhang

Just a wondering village boy.

Related