When is the golden age of rap music?

1. Introduction

I have a rap-enthusiast friend whom I have known for a long time; we became friends back in kindergarten. He now runs his own vintage clothing store. Every time I visit him, he plays old school rap in the store. Under his influence, I have become familiar with the names of OG rappers such as Biggie, Tupac, and NWA.

Gradually, I noticed that he seldom plays “new style” rap such as mumble rap and trap. One time I asked him why, and he said, “I just love the old school rap. They are much realer if you know what I mean.” Well, to be honest, I don’t, and that’s exactly why I’m writing this post: to analyze rap music from different eras. Hopefully, I can find out when the golden age of rap music was.

2. Data analysis and visualization

2.1 Analysis of the polls dataset

library(tidyverse)
library(hrbrthemes)
library(gganimate)
library(plotly)
polls <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-04-14/polls.csv')
rankings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-04-14/rankings.csv')

After importing the data, we first focus on the poll data since its sample size is larger. The only variable in the poll data that reflects the quality of a song is rank (a lower number means a better rank), so we calculate the average rank of the nominated songs for each year to show how quality varies over time.

Based on the graph below and its smoothed trend line, we see that the average numerical rank is actually increasing over time. Since a lower number means a better rank, this suggests that quality is decreasing. In particular, three years in the 80s have the best average rankings of all time. From this, we can infer that the golden age of rap music could be the 80s.

polls%>%
  group_by(year)%>%
  summarise(av_rank = mean(rank))%>%
  ggplot(aes(x = year, y = av_rank),group = 1)+
  geom_point(color = ft_cols$yellow)+
  geom_line(color = ft_cols$white)+
  geom_smooth(color = ft_cols$white)+
  labs(x = "Year",
       y = "Average ranking",
       title = "How does the average ranking change over time?",
       subtitle = "The smoothed trend shows that the average ranking is increasing,
which means the quality of rap music decreases over time."
       )+ 
  theme_ft_rc()
## `summarise()` ungrouping output (override with `.groups` argument)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
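
As the message above shows, the line drawn by `geom_smooth()` is a loess smoother, not a linear regression. Here is a minimal sketch of how the upward trend could be checked with an ordinary linear fit. The data frame `toy_polls` and its values are made up purely for illustration; with the real data, `polls` would take its place.

```r
library(dplyr)

# Toy stand-in for `polls`: made-up values, for illustration only.
toy_polls <- tibble(
  year = c(1988, 1988, 1994, 1994, 2005, 2005, 2015, 2015),
  rank = c(1, 2, 1, 3, 3, 4, 4, 5)
)

# Average rank per year, mirroring the summary used for the plot.
yearly <- toy_polls %>%
  group_by(year) %>%
  summarise(av_rank = mean(rank), .groups = "drop")

# Ordinary least squares fit of the average rank on the year.
fit <- lm(av_rank ~ year, data = yearly)

# A positive slope means the numerical rank grows (gets worse) over time.
slope <- unname(coef(fit)["year"])
slope > 0
```

Passing `method = "lm"` to `geom_smooth()` would draw the same kind of straight-line fit directly on the plot.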

Then we take a look at how the ranking changes for each gender. The first thing we notice is that there are more points for songs by male rappers than for songs by female rappers, which means songs by male rappers are more likely to get nominated. In addition, the numerical values of the average ranking for male rappers are mostly lower (that is, better) than those for female rappers. Together, these two observations suggest that the rap music industry is dominated by male rappers and that a systematic gender gap does exist.

polls%>%
  filter(gender!="mixed")%>%
  group_by(year,gender)%>%
  summarise(av_rank = mean(rank))%>%
  ggplot(aes(x = year, y = av_rank,color = gender))+
  geom_point()+
  geom_line()+
  labs(x = "Year",
       y = "Average ranking",
       title = "How does the average ranking change for each gender?",
       color = "",
       subtitle = "From this graph, we see that the rap industry is predominantly occupied by male rappers."
       )+
  geom_smooth()+
  facet_wrap(~gender,scales = "free_x")+
  scale_color_manual(values =  c("yellow","white"))+ 
  theme_ft_rc()+
  theme(legend.position = "none")
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
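
The two observations above (more nominations and lower average rank for male rappers) can also be put into numbers rather than read off the plot. The sketch below assumes a made-up `toy_polls` data frame with the same `gender` and `rank` columns as `polls`; the values are illustrative only.

```r
library(dplyr)

# Toy stand-in for `polls`: made-up values, for illustration only.
toy_polls <- tibble(
  gender = c("male", "male", "male", "female", "mixed", "male"),
  rank   = c(1, 2, 3, 4, 5, 1)
)

# Number of nominations, average rank, and share of nominations per gender.
gender_summary <- toy_polls %>%
  filter(gender != "mixed") %>%
  group_by(gender) %>%
  summarise(nominations = n(), av_rank = mean(rank), .groups = "drop") %>%
  mutate(share = nominations / sum(nominations))

gender_summary
```

Running the same pipeline on `polls` would give the actual nomination counts behind the two panels.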

The gender gap is further illustrated by the following graph. Here, we have picked out the top five artists of each gender by number of nominations. Even though Missy Elliott takes first place among female rappers with 9 nominations, she is still far outnumbered by the fifth-place male act, OutKast, with 17 nominations. Not to mention the fact that Salt-N-Pepa and Cardi B are both nominated only twice.

polls%>%
  filter(gender!="mixed")%>%
  group_by(gender)%>%
  count(artist,sort=TRUE)%>%
  top_n(5)%>%
  ggplot(aes(x = n, y = reorder(artist,n), color = gender))+
  geom_point(size = 4)+
  geom_segment(aes(x = 0,xend = n, y = artist, yend = artist),size = 3, alpha = 0.5)+
  geom_text(aes(label = n),hjust = 2)+
  scale_color_manual(values =  c("yellow","white"))+
  labs(x = "Number of nominations",
       y = "Artists",
       title = "The top 5 artists with\n the most nominations for each gender")+
  theme_ft_rc()
## Selecting by n

Next, by taking the average ranking of the nominated songs for each artist (restricted to artists with more than seven nominations), we get the following rough list of the top 15 rap artists by quality of work.

library(ggrepel)
top_artists_ranking<-polls%>%
  add_count(artist)%>%filter(n>7)%>%
  group_by(artist)%>%
  summarise(av_ranking = mean(rank))%>%
  ungroup()%>%
  top_n(15,-av_ranking)
## `summarise()` ungrouping output (override with `.groups` argument)
polls%>%
  inner_join(top_artists_ranking)%>%
  add_count(title)%>%
  mutate(title_for_graph = ifelse(n>=10,title,""))%>%
  ggplot(aes(x = year, y = reorder(artist,-av_ranking)))+
  geom_point(aes(size = n),color = ft_cols$yellow,alpha = 0.15)+
  geom_text(aes(label = title_for_graph),
            color = "white",
            hjust = -0.1)+ 
  labs(x = "Year", y = "",
       title = "The 15 greatest rappers of all time")+ 
  theme_ft_rc()+
  theme(legend.position = "none")
## Joining, by = "artist"
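
The `filter(n>7)` step in the code above matters because an artist with a single well-ranked song would otherwise top the list. Below is a minimal sketch of that effect. The helper `best_artists()` and the data in `toy_polls` are hypothetical, introduced only for illustration, and `slice_min()` is used here in place of `top_n(15,-av_ranking)`.

```r
library(dplyr)

# Hypothetical helper: average rank per artist, keeping only artists
# with more than `min_nominations` nominations, then the `n_top` best.
best_artists <- function(df, min_nominations, n_top) {
  df %>%
    group_by(artist) %>%
    summarise(nominations = n(), av_ranking = mean(rank), .groups = "drop") %>%
    filter(nominations > min_nominations) %>%
    slice_min(av_ranking, n = n_top)
}

# Toy stand-in for `polls`: made-up values, for illustration only.
toy_polls <- tibble(
  artist = c("A", "A", "A", "B", "C", "C", "C"),
  rank   = c(2, 3, 1, 1, 4, 5, 3)
)

no_threshold   <- best_artists(toy_polls, min_nominations = 0, n_top = 1)
with_threshold <- best_artists(toy_polls, min_nominations = 2, n_top = 1)
```

In this toy data, artist B has a single nomination at rank 1 and wins when there is no threshold, while artist A, with three nominations, comes out on top once the threshold is applied.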

The top three rap artists are Public Enemy, The Notorious B.I.G (without a doubt), and Eric B & Rakim. Since the other rappers on the list are also quite famous, I would assume this ranking is valid. In the graph, I have also annotated the titles of songs that were nominated at least 10 times. Thus, based on the poll dataset, the greatest songs are “Fight The Power”, “Juicy”, “Shook Ones (Part II)”, “The Message”, “C.R.E.A.M.”, and “Nuthin’ But A ‘G’ Thang”.

There is one more thing in the graph worth pointing out: the large points cluster in the 1980-1990 period. This can be seen as further evidence for when the golden age of rap music was.

2.2 Analysis of the rankings dataset

rankings%>%
  filter(ID!=70)%>%
  ggplot(aes(x = year, y = points, color = gender))+
  geom_point(alpha = 0.6)+
  scale_y_log10()+geom_smooth(color = "white")+
  labs(title = "How does quality of songs change through time?")+
  scale_color_manual(values = c("indianred1","yellow","white"))+ 
  theme_ft_rc()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

rankings%>%
  group_by(year)%>%
  summarise(av_points = mean(points))%>%
  ggplot(aes(x = year, y = av_points),group = 1)+
  geom_point(color = ft_cols$yellow)+
  geom_line(color = ft_cols$white)+
  labs(x = "Year",
       y = "Average points",
       title = "How does the average points change?")+ 
  theme_ft_rc()
## `summarise()` ungrouping output (override with `.groups` argument)

top_ranking_gender<-rankings%>%
  distinct(ID,year,title, artist, gender, points)%>%
  group_by(gender)%>%
  top_n(10,points)%>%
  ungroup()
top_ranking_gender%>%
  ggplot(aes(x = year, y = points, color = gender))+
  geom_point(aes(size = points*500))+
  scale_color_manual(values = c("indianred1","yellow","white"))+
  labs(title="Points distribution of top 10 songs \nperformed by artists of each gender ")+
  theme_ft_rc()+
  scale_y_log10()+
  scale_size_continuous(guide = "none")

top_ranking_gender%>%
  ggplot(aes(x = year, y = points, color = gender))+
  scale_color_manual(values = c("indianred1","yellow","white"))+
  geom_text_repel(aes(label = title), force = 20)+
  labs(title="The top 10 songs performed by artists of each gender ")+
  theme_ft_rc()+
  scale_y_log10()

library(tidytext)

rankings%>%
  distinct(title, year)%>%
  unnest_tokens(word, title)%>%
  anti_join(stop_words)%>%
  inner_join(get_sentiments("afinn"))%>%
  group_by(year)%>%
  summarise(av_sent = mean(value, na.rm = TRUE))%>%
  ggplot(aes(x = year, y = av_sent), group=1)+
  geom_point(color = "yellow")+geom_line(color = "white")+
  geom_smooth(se=FALSE, color = "white", alpha = 0.6)+
  labs(y = "Average sentiment",
       title = "How does the average sentiment for the titles change?")+
  theme_ft_rc()
## Joining, by = "word"
## Joining, by = "word"
## `summarise()` ungrouping output (override with `.groups` argument)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Xuxin Zhang

Just a wondering village boy.