What's in your cocktails?

1. Introduction

A few days ago, I went to a bar to celebrate a friend’s birthday, and we ordered some cocktails. Since I rarely drink, I couldn’t tell what was in my margarita even after a few sips. That made me wonder whether I could build an interactive web application that shows people which ingredients go into their drinks.

This project explores cocktail recipes, focusing on the ingredients used in different types of cocktails. At the end of the project, we will create an interactive bar plot that shows the composition of selected cocktails.

2. Data analysis

First, we need to import the two datasets we are going to use.

library(tidyverse)
library(DT)
theme_set(theme_light())
cocktails <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-26/cocktails.csv')
boston_cocktails <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-26/boston_cocktails.csv')

After importing the data, I wanted to know how the two datasets differ, so I counted the distinct drinks in each and found that boston_cocktails covers more drinks (989 versus 546). In addition, the measure column in boston_cocktails uses more uniform units, which makes it easier to build the interactive bar plot. We will therefore stick with the Boston dataset for the rest of the project.

library(stringr)
library(knitr)

cocktails%>%distinct(drink)
## # A tibble: 546 x 1
##    drink                               
##    <chr>                               
##  1 '57 Chevy with a White License Plate
##  2 1-900-FUK-MEUP                      
##  3 110 in the shade                    
##  4 151 Florida Bushwacker              
##  5 155 Belmont                         
##  6 24k nightmare                       
##  7 252                                 
##  8 3 Wise Men                          
##  9 3-Mile Long Island Iced Tea         
## 10 410 Gone                            
## # … with 536 more rows
kable(head(cocktails,20))
| row_id | drink | date_modified | id_drink | alcoholic | category | drink_thumb | glass | iba | video | ingredient_number | ingredient | measure |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ’57 Chevy with a White License Plate | 2016-07-18 22:49:04 | 14029 | Alcoholic | Cocktail | http://www.thecocktaildb.com/images/media/drink/qyyvtu1468878544.jpg | Highball glass | NA | NA | 1 | Creme de Cacao | 1 oz white |
| 0 | ’57 Chevy with a White License Plate | 2016-07-18 22:49:04 | 14029 | Alcoholic | Cocktail | http://www.thecocktaildb.com/images/media/drink/qyyvtu1468878544.jpg | Highball glass | NA | NA | 2 | Vodka | 1 oz |
| 1 | 1-900-FUK-MEUP | 2016-07-18 22:27:04 | 15395 | Alcoholic | Shot | http://www.thecocktaildb.com/images/media/drink/uxywyw1468877224.jpg | Old-fashioned glass | NA | NA | 1 | Absolut Kurant | 1/2 oz |
| 1 | 1-900-FUK-MEUP | 2016-07-18 22:27:04 | 15395 | Alcoholic | Shot | http://www.thecocktaildb.com/images/media/drink/uxywyw1468877224.jpg | Old-fashioned glass | NA | NA | 2 | Grand Marnier | 1/4 oz |
| 1 | 1-900-FUK-MEUP | 2016-07-18 22:27:04 | 15395 | Alcoholic | Shot | http://www.thecocktaildb.com/images/media/drink/uxywyw1468877224.jpg | Old-fashioned glass | NA | NA | 3 | Chambord raspberry liqueur | 1/4 oz |
| 1 | 1-900-FUK-MEUP | 2016-07-18 22:27:04 | 15395 | Alcoholic | Shot | http://www.thecocktaildb.com/images/media/drink/uxywyw1468877224.jpg | Old-fashioned glass | NA | NA | 4 | Midori melon liqueur | 1/4 oz |
| 1 | 1-900-FUK-MEUP | 2016-07-18 22:27:04 | 15395 | Alcoholic | Shot | http://www.thecocktaildb.com/images/media/drink/uxywyw1468877224.jpg | Old-fashioned glass | NA | NA | 5 | Malibu rum | 1/4 oz |
| 1 | 1-900-FUK-MEUP | 2016-07-18 22:27:04 | 15395 | Alcoholic | Shot | http://www.thecocktaildb.com/images/media/drink/uxywyw1468877224.jpg | Old-fashioned glass | NA | NA | 6 | Amaretto | 1/4 oz |
| 1 | 1-900-FUK-MEUP | 2016-07-18 22:27:04 | 15395 | Alcoholic | Shot | http://www.thecocktaildb.com/images/media/drink/uxywyw1468877224.jpg | Old-fashioned glass | NA | NA | 7 | Cranberry juice | 1/2 oz |
| 1 | 1-900-FUK-MEUP | 2016-07-18 22:27:04 | 15395 | Alcoholic | Shot | http://www.thecocktaildb.com/images/media/drink/uxywyw1468877224.jpg | Old-fashioned glass | NA | NA | 8 | Pineapple juice | 1/4 oz |
| 2 | 110 in the shade | 2016-02-03 14:51:57 | 15423 | Alcoholic | Beer | http://www.thecocktaildb.com/images/media/drink/xxyywq1454511117.jpg | Beer Glass | NA | NA | 1 | Lager | 16 oz |
| 2 | 110 in the shade | 2016-02-03 14:51:57 | 15423 | Alcoholic | Beer | http://www.thecocktaildb.com/images/media/drink/xxyywq1454511117.jpg | Beer Glass | NA | NA | 2 | Tequila | 1.5 oz |
| 3 | 151 Florida Bushwacker | 2016-07-18 22:28:43 | 14588 | Alcoholic | Milk / Float / Shake | http://www.thecocktaildb.com/images/media/drink/rvwrvv1468877323.jpg | Beer mug | NA | NA | 1 | Malibu rum | 1/2 oz |
| 3 | 151 Florida Bushwacker | 2016-07-18 22:28:43 | 14588 | Alcoholic | Milk / Float / Shake | http://www.thecocktaildb.com/images/media/drink/rvwrvv1468877323.jpg | Beer mug | NA | NA | 2 | Light rum | 1/2 oz |
| 3 | 151 Florida Bushwacker | 2016-07-18 22:28:43 | 14588 | Alcoholic | Milk / Float / Shake | http://www.thecocktaildb.com/images/media/drink/rvwrvv1468877323.jpg | Beer mug | NA | NA | 3 | 151 proof rum | 1/2 oz Bacardi |
| 3 | 151 Florida Bushwacker | 2016-07-18 22:28:43 | 14588 | Alcoholic | Milk / Float / Shake | http://www.thecocktaildb.com/images/media/drink/rvwrvv1468877323.jpg | Beer mug | NA | NA | 4 | Dark Creme de Cacao | 1 oz |
| 3 | 151 Florida Bushwacker | 2016-07-18 22:28:43 | 14588 | Alcoholic | Milk / Float / Shake | http://www.thecocktaildb.com/images/media/drink/rvwrvv1468877323.jpg | Beer mug | NA | NA | 5 | Cointreau | 1 oz |
| 3 | 151 Florida Bushwacker | 2016-07-18 22:28:43 | 14588 | Alcoholic | Milk / Float / Shake | http://www.thecocktaildb.com/images/media/drink/rvwrvv1468877323.jpg | Beer mug | NA | NA | 6 | Milk | 3 oz |
| 3 | 151 Florida Bushwacker | 2016-07-18 22:28:43 | 14588 | Alcoholic | Milk / Float / Shake | http://www.thecocktaildb.com/images/media/drink/rvwrvv1468877323.jpg | Beer mug | NA | NA | 7 | Coconut liqueur | 1 oz |
| 3 | 151 Florida Bushwacker | 2016-07-18 22:28:43 | 14588 | Alcoholic | Milk / Float / Shake | http://www.thecocktaildb.com/images/media/drink/rvwrvv1468877323.jpg | Beer mug | NA | NA | 8 | Vanilla ice-cream | 1 cup |
boston_cocktails%>%distinct(name)
## # A tibble: 989 x 1
##    name                
##    <chr>               
##  1 Gauguin             
##  2 Fort Lauderdale     
##  3 Apple Pie           
##  4 Cuban Cocktail No. 1
##  5 Cool Carlos         
##  6 John Collins        
##  7 Cherry Rum          
##  8 Casa Blanca         
##  9 Caribbean Champagne 
## 10 Amber Amour         
## # … with 979 more rows
kable(head(boston_cocktails,20))
| name | category | row_id | ingredient_number | ingredient | measure |
|---|---|---|---|---|---|
| Gauguin | Cocktail Classics | 1 | 1 | Light Rum | 2 oz |
| Gauguin | Cocktail Classics | 1 | 2 | Passion Fruit Syrup | 1 oz |
| Gauguin | Cocktail Classics | 1 | 3 | Lemon Juice | 1 oz |
| Gauguin | Cocktail Classics | 1 | 4 | Lime Juice | 1 oz |
| Fort Lauderdale | Cocktail Classics | 2 | 1 | Light Rum | 1 1/2 oz |
| Fort Lauderdale | Cocktail Classics | 2 | 2 | Sweet Vermouth | 1/2 oz |
| Fort Lauderdale | Cocktail Classics | 2 | 3 | Juice of Orange | 1/4 oz |
| Fort Lauderdale | Cocktail Classics | 2 | 4 | Juice of a Lime | 1/4 oz |
| Apple Pie | Cordials and Liqueurs | 3 | 1 | Apple schnapps | 3 oz |
| Apple Pie | Cordials and Liqueurs | 3 | 2 | Cinnamon schnapps | 1 oz |
| Cuban Cocktail No. 1 | Cocktail Classics | 4 | 1 | Juice of a Lime | 1/2 oz |
| Cuban Cocktail No. 1 | Cocktail Classics | 4 | 2 | Powdered Sugar | 1/2 oz |
| Cuban Cocktail No. 1 | Cocktail Classics | 4 | 3 | Light Rum | 2 oz |
| Cool Carlos | Cocktail Classics | 5 | 1 | Dark rum | 1 1/2 oz |
| Cool Carlos | Cocktail Classics | 5 | 2 | Cranberry Juice | 2 oz |
| Cool Carlos | Cocktail Classics | 5 | 3 | Pineapple Juice | 2 oz |
| Cool Carlos | Cocktail Classics | 5 | 4 | Orange curacao | 1 oz |
| Cool Carlos | Cocktail Classics | 5 | 5 | Sour Mix | 1 oz |
| John Collins | Whiskies | 6 | 1 | Bourbon whiskey | 2 oz |
| John Collins | Whiskies | 6 | 2 | Fresh lemon juice | 1 oz |
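
One way to check the claim about units is to measure what share of the measure entries end in "oz". The sketch below does this on a handful of values copied from the two tables above, not on the full datasets; the helper name share_oz() is hypothetical and only serves this illustration.

```r
library(stringr)

# Toy measure values copied from the two tables shown above (not the full data).
cocktails_measure <- c("1 oz white", "1 oz", "1/2 oz", "16 oz", "1/2 oz Bacardi", "1 cup")
boston_measure <- c("2 oz", "1 oz", "1 1/2 oz", "1/2 oz", "1/4 oz", "3 oz")

# Hypothetical helper: share of entries that end in the unit "oz".
share_oz <- function(measure) {
  mean(str_detect(measure, "oz\\s*$"))
}

share_oz(cocktails_measure)  # share for the sample from cocktails
share_oz(boston_measure)     # share for the sample from boston_cocktails
```

On the real data, the same helper could be applied to cocktails$measure and boston_cocktails$measure.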

Next, we will find out which ingredients are used most often in the recipes. Even if you have no idea what is in your cocktail, guessing one of the most frequently used ingredients gives you a decent chance of being right (and lets you show off a little in front of your friends). The following plot shows the top 10 ingredients used to create cocktails.

# Count how often each ingredient appears and keep the 10 most common ones
boston_cocktails%>%count(ingredient, sort = TRUE)%>%head(10)%>%
  ggplot(aes(x = reorder(ingredient,n), y = n))+
  geom_point(aes(size = n), alpha = 0.2)+
  geom_segment(aes(x=reorder(ingredient,n),
                   xend = reorder(ingredient,n), 
                   y = 0, yend = n,size = n), alpha = 0.2)+
  ylim(0,180)+theme_minimal()+
  theme(
        axis.text.x = element_text(vjust = 0.8),
        axis.text.y = element_blank(),
        axis.title.y = element_blank(),
        # "none" is the documented value for hiding the legend
        legend.position = "none")+
  labs(x = "Ingredient", y = "Count",
       title = "Top 10 ingredients used in cocktail recipes")+
  # draw the lollipop chart on a circular layout
  coord_polar(clip = "off")

We can also visualize the same counts as a word cloud.

From these two graphs, we can see that the top ingredients are gin, fresh lemon juice, simple syrup, and vodka.
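
The post does not show the code behind the word cloud, so the following is only a sketch of how one might be drawn. It assumes the wordcloud package, which may not be what was actually used, and it runs on a few ingredient names copied from the first rows of the boston_cocktails table rather than on the full dataset.

```r
library(dplyr)

# Toy ingredient column copied from the first rows of boston_cocktails shown earlier.
toy <- tibble(ingredient = c(
  "Light Rum", "Passion Fruit Syrup", "Lemon Juice", "Lime Juice",
  "Light Rum", "Sweet Vermouth", "Juice of Orange", "Juice of a Lime",
  "Apple schnapps", "Cinnamon schnapps",
  "Juice of a Lime", "Powdered Sugar", "Light Rum"
))

# Same counting step as in the plot above.
ingredient_counts <- toy %>% count(ingredient, sort = TRUE)

# Assumption: the wordcloud package is installed; skip the plot otherwise.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  set.seed(1)  # word placement is random, so fix the seed for reproducibility
  wordcloud::wordcloud(words = ingredient_counts$ingredient,
                       freq = ingredient_counts$n,
                       min.freq = 1,
                       random.order = FALSE)
}
```

With the real data, toy would simply be replaced by boston_cocktails.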

I also wonder which category requires the most ingredients on average. We start by finding out how many categories of cocktails there are. Based on the list below, there are 11 categories in total.

boston_cocktails%>%count(category,sort = TRUE)%>%kable()
| category | n |
|---|---|
| Cocktail Classics | 1560 |
| Vodka | 545 |
| Whiskies | 457 |
| Rum - Daiquiris | 437 |
| Tequila | 371 |
| Brandy | 169 |
| Gin | 67 |
| Cordials and Liqueurs | 24 |
| Shooters | 6 |
| Rum | 4 |
| Non-alcoholic Drinks | 3 |

Then we compute the number of ingredients in each drink, taking the largest ingredient_number within a drink as its ingredient count. This lets us look at which cocktails have the most ingredients. As the table shows, the greatest number of ingredients is 6, and quite a lot of drinks are made from 6 ingredients.

number_ingredient<-boston_cocktails%>%group_by(category,name)%>%summarise(num_ing = max(ingredient_number))
## `summarise()` regrouping output by 'category' (override with `.groups` argument)
kable(head(number_ingredient%>%arrange(desc(num_ing)),20))
| category | name | num_ing |
|---|---|---|
| Brandy | Cherry Blossom | 6 |
| Brandy | Deauville Cocktail | 6 |
| Cocktail Classics | Applejack Punch | 6 |
| Cocktail Classics | Betsy Ross Cocktail | 6 |
| Cocktail Classics | Eye-Opener | 6 |
| Cocktail Classics | Frankenjack Cocktail | 6 |
| Cocktail Classics | Gloom Lifter | 6 |
| Cocktail Classics | Green Hornet (Dry) | 6 |
| Cocktail Classics | Hyatt’s Jamaican Banana | 6 |
| Cocktail Classics | New Orleans Gin Fizz | 6 |
| Cocktail Classics | Prairie Oyster Cocktail | 6 |
| Cocktail Classics | Ramos Fizz | 6 |
| Cocktail Classics | Red Swizzle | 6 |
| Cocktail Classics | Sand-Martini Cocktail | 6 |
| Cocktail Classics | Sidecar | 6 |
| Cocktail Classics | Tahitian Tea | 6 |
| Gin | The Winkle | 6 |
| Gin | Vow Of Silence | 6 |
| Rum - Daiquiris | Fog Cutter | 6 |
| Rum - Daiquiris | Hai Karate | 6 |

With this in hand, we can draw a boxplot to show the distribution of the number of ingredients for each category.

number_ingredient%>%
  ggplot(aes(x = num_ing, y = reorder(category, num_ing)))+
  geom_boxplot()+
  labs(x = "Number of ingredients", 
       y ="Category of cocktails", 
       title = "The distribution of number of ingredients for different cocktail categories ")+
  xlim(0,7)

Based on the graph above, we know that tequila cocktails and whiskey cocktails have the highest average number of ingredients.
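
A boxplot shows medians rather than means, so it can help to put the averages in a table as well. The sketch below shows one way to do that; it uses a toy version of number_ingredient built from the five complete drinks in the boston_cocktails preview above, and the column names mean_ing, median_ing and n_drinks are made up for this example.

```r
library(dplyr)

# Toy stand-in for number_ingredient, built from the preview rows shown earlier.
number_ingredient_toy <- tibble(
  category = c(rep("Cocktail Classics", 4), "Cordials and Liqueurs"),
  name = c("Gauguin", "Fort Lauderdale", "Cuban Cocktail No. 1",
           "Cool Carlos", "Apple Pie"),
  num_ing = c(4, 4, 3, 5, 2)
)

# Mean and median ingredient count per category, largest mean first.
avg_ingredient <- number_ingredient_toy %>%
  group_by(category) %>%
  summarise(mean_ing = mean(num_ing),
            median_ing = median(num_ing),
            n_drinks = n(),
            .groups = "drop") %>%
  arrange(desc(mean_ing))

avg_ingredient
```

Running the same pipeline on the real number_ingredient table would give the per-category averages behind the statement above.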

After knowing these rudimentary facts about cocktails, we are ready to build the interactive web application.

3. Constructing Shiny web application

Our goal is to create an interactive application that shows which ingredients are added to your cocktail and in what amounts. To do so, we first need to make the measure column uniform, since right now it contains both whole numbers and fractions. What we need to do is convert the fractions into decimal numbers.

My method is to first use separate_rows() to split mixed-number entries such as “1 1/2” into a whole-number part (1) and a fraction part (1/2). Then, I use str_detect() to distinguish between whole numbers and fractions. Next, I count how many distinct fractions there are. Luckily, there are only 5, so we can use case_when() to convert each of them to its decimal value. Finally, summing the parts for each ingredient gives a single numeric measure, and the following is our final result.

# find out how many types of fractions are there
boston_cocktails%>%
  mutate(measure = str_remove(measure, "oz"))%>%
  separate_rows(measure, sep = " ")%>%
  mutate(frac = str_detect(measure, "/"))%>%
  filter(frac==TRUE)%>%count(measure)
## # A tibble: 5 x 2
##   measure     n
##   <chr>   <int>
## 1 1/2      1283
## 2 1/3         3
## 3 1/4       236
## 4 2/3         3
## 5 3/4       425
# convert the limited types of fraction into digits.
boston_cocktails_clean<-boston_cocktails%>%
  mutate(measure = str_remove(measure, "oz"))%>%
  separate_rows(measure, sep = " ")%>%
  mutate(measure= case_when(measure == "1/2"~0.5,
                            measure == "1/3"~round(1/3,3),
                            measure == "1/4"~0.25,
                            measure == "2/3"~round(2/3,3),
                            measure == "3/4"~0.75,
                            TRUE~as.numeric(measure)))%>%
  filter(!is.na(measure))%>%
  group_by(name,category,ingredient_number,ingredient)%>%
  summarise(measure = sum(measure))
## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion
## `summarise()` regrouping output by 'name', 'category', 'ingredient_number' (override with `.groups` argument)
boston_cocktails_clean
## # A tibble: 3,622 x 5
## # Groups:   name, category, ingredient_number [3,622]
##    name         category         ingredient_numb… ingredient             measure
##    <chr>        <chr>                       <dbl> <chr>                    <dbl>
##  1 1626         Whiskies                        1 Bourbon whiskey           2.5 
##  2 1626         Whiskies                        2 Gingerbread liqueur       0.75
##  3 1626         Whiskies                        3 cherry-flavored brandy    0.5 
##  4 1626         Whiskies                        4 Angostura Bitters         2   
##  5 1626         Whiskies                        5 Italian preserved che…    1   
##  6 19th Century Whiskies                        1 Bourbon whiskey           1.5 
##  7 19th Century Whiskies                        2 Fresh lemon juice         0.75
##  8 19th Century Whiskies                        3 White creme de cacao      0.75
##  9 19th Century Whiskies                        4 Lillet Rouge              0.75
## 10 A. J.        Cocktail Classi…                1 Applejack                 1.5 
## # … with 3,612 more rows
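
The case_when() approach above works because only five fractions occur in the data. As an alternative, a small parser could handle any "a/b" fraction. The following is a sketch in base R, not the method used in this post; measure_to_oz() is a hypothetical helper name. Like the pipeline above, it ignores tokens that are not numbers.

```r
# Hypothetical helper: convert one measure string such as "1 1/2 oz" to a number.
measure_to_oz <- function(x) {
  # drop the unit and surrounding whitespace
  x <- trimws(sub("oz", "", x, fixed = TRUE))
  tokens <- strsplit(x, "\\s+")[[1]]
  values <- vapply(tokens, function(tok) {
    if (grepl("^[0-9]+/[0-9]+$", tok)) {
      # fraction: numerator divided by denominator
      parts <- as.numeric(strsplit(tok, "/", fixed = TRUE)[[1]])
      parts[1] / parts[2]
    } else if (grepl("^[0-9]*\\.?[0-9]+$", tok)) {
      # whole number or decimal
      as.numeric(tok)
    } else {
      # anything else (for example leftover text) is ignored
      NA_real_
    }
  }, numeric(1))
  sum(values, na.rm = TRUE)
}

measure_to_oz("1 1/2 oz")
measure_to_oz("3/4 oz")
```

To apply it to a whole column, one could call vapply(boston_cocktails$measure, measure_to_oz, numeric(1)).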

Our next step is to create a Shiny app. The following is the code I wrote to build the interactive bar plot, and the final result is shown under this chunk of code. With the help of this app, you can finally tell your friends with confidence what is in your cocktails.

library(shiny)
library(tidyverse)
library(plotly)
theme_set(theme_light())
cocktails <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-26/cocktails.csv')
boston_cocktails <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-26/boston_cocktails.csv')

boston_cocktails_clean<-boston_cocktails%>%
    mutate(measure = str_remove(measure, "oz"))%>%
    separate_rows(measure, sep = " ")%>%
    mutate(measure= case_when(measure == "1/2"~0.5,
                              measure == "1/3"~round(1/3,3),
                              measure == "1/4"~0.25,
                              measure == "2/3"~round(2/3,3),
                              measure == "3/4"~0.75,
                              TRUE~as.numeric(measure)))%>%
    filter(!is.na(measure))%>%
    group_by(name,category,ingredient_number,ingredient)%>%
    summarise(measure = sum(measure))

# Define the UI for the application that draws the bar plot
ui <- fluidPage(

    # Application title
    titlePanel("Interactive bar plot:"),

    # Sidebar with a drop-down menu for choosing one or more cocktails
    sidebarLayout(
        sidebarPanel(
            selectInput("name",label = "Select the cocktail name",
                        choices = unique(boston_cocktails_clean$name),
                        selected  = c("Mojito","Martini"),
                        selectize = TRUE,
                        multiple = TRUE)
        ),

        # Show the bar plot for the selected cocktails
        mainPanel(
           # height and width belong to plotOutput(), not mainPanel()
           plotOutput("distPlot", height = 250, width = 400)
        )
    )
)

# Define the server logic required to draw the bar plot
server <- function(input, output) {

    output$distPlot <- renderPlot({
        boston_cocktails_clean%>%
            filter(name%in%input$name)%>%
            ggplot(aes(x = ingredient, y = measure, fill = name))+
            geom_col(position = "dodge", alpha = 0.7)+
            labs(x = "Ingredient", y ="Oz",
                 title = "The ingredient composition for...",
                 fill = "Name of the cocktails")+
            theme(plot.margin = margin(30, 30, 30, 30),
                  plot.title = element_text(size = 20, family = "Times", 
                                            face = "bold", margin = margin(20,0,5,0)),
                  axis.title.x = element_text(size = 12, family = "Times"),
                  axis.title.y = element_text(size = 12, family = "Times"),
                  axis.text.x = element_text(size = 12,family = "Times", angle = 45))
    })
}

# Run the application 
shinyApp(ui = ui, server = server)
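
The app above plots amounts in ounces. If you would rather see each ingredient as a share of the whole drink, one possible extra step is sketched below. It runs on the four "19th Century" rows from the cleaned output shown earlier, and the column name share is an assumption made for this example.

```r
library(dplyr)

# Rows for one drink, copied from the boston_cocktails_clean output above.
clean_toy <- tibble(
  name = "19th Century",
  ingredient = c("Bourbon whiskey", "Fresh lemon juice",
                 "White creme de cacao", "Lillet Rouge"),
  measure = c(1.5, 0.75, 0.75, 0.75)
)

# Divide each measure by the drink's total so the shares sum to 1 per drink.
with_share <- clean_toy %>%
  group_by(name) %>%
  mutate(share = measure / sum(measure)) %>%
  ungroup()

with_share
```

In the app, mapping y = share instead of y = measure in aes() would then plot proportions rather than ounces.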
Xuxin Zhang