runconf17, an analysis of emoji use

rOpenSci

rstats

conferences

emojis

I had such a delightful time at rOpenSci’s unconference. Not only was it extremely productive (21 packages were produced!), but in between the crazy productivity was some epic community building.

Author

Lucy D’Agostino McGowan

Published

June 4, 2017

I had such a delightful time at rOpenSci’s unconference last week.

21 📦 were produced!

Not only was it extremely productive, but in between the crazy productivity was some epic community building.

for the record, I’m an #rchickenlady, IT’S HAPPENING

Stefanie kicked the conference off with ice breakers, where we explored topics ranging from #rcatladies & #rdogfellas to impostor syndrome. It was an excellent way to get conversations started!

work

Karthik and I worked on two packages:

arresteddev: a package for when your development is going awry! Includes functions such as lmgtfy(), which will seamlessly google your last error message; David Robinson’s tracestack(), which will query your last error message on Stack Overflow; and squirrel(), a function that will randomly send you to a distracting website - for when things are really going poorly 💁.

Mostly, this was a good excuse to look up Arrested Development gifs, which, we established, is pronounced with a g like giraffe.

ponyexpress: a package for automating speedy emails from R - copy and paste neigh more 🐴. This package allows you to send templated emails to a list of contacts. Great for conferences, birthday parties, or karaoke invitations.

play

Between our package building, there were SO many opportunities to get to know some of the most talented people.

<img src="https://github.com/LFOD/real-blog/raw/master/static/images/jenny_lucy.jpg">
Jenny & I enthusiastically working on googledrive.

More than anything, this was an excellent opportunity to feel like a part of a community – and a community that certainly extends beyond the people that attended the unconference! There were so many people following along, tweeting along, and assisting along the way.

a few highlights:

🍨 ice cream outings
🎤 karaoke adventures
🍸 happy hours (complete with R-themed drinks)
💪 Karthik attempting to lick his elbow

analysis

Note: this is not particularly statistically rigorous, but it is VERY fun.

In an effort to stay on brand, I decided to do a small analysis of the tweets that came out of #runconf17. I designed a small study:

pulled all tweets (excluding retweets) using the hashtag #runconf17 between May 24th and May 30th
also pulled all tweets (excluding retweets) using the hashtag #rstats during the same time period

Question: Are Twitter users who used the #runconf17 hashtag more likely to use emojis than those who only tweeted with the #rstats hashtag during the same time period?

I used the rtweet package to pull the tweets, dplyr and fuzzyjoin to wrangle the data a bit, and rms to analyze it.

library("rtweet")
library("dplyr")

library("fuzzyjoin")
library("rms")

runconf <- search_tweets(q = "#runconf17 AND since:2017-05-23 AND until:2017-05-31",
                         n = 1e4, 
                         include_rts = FALSE)

rstats <- search_tweets(q = "#rstats AND since:2017-05-23 AND until:2017-05-31",
                        n = 1e4,
                        include_rts = FALSE)

The emoji dictionary was discovered by the lovely Maëlle!

After pulling in the tweets, I categorized tweeters as either using the #runconf17 hashtag during the week or not. I then merged the tweets with an emoji dictionary, and grouped by tweeter. If the tweeter used an emoji at any point during the week, they were categorized as an emoji-user; if not, they were sad (jk, there is room for all here!).

## create variable for whether tweeted about runconf
runconf$runconf <- "yes"

rstats <- rstats %>%
  mutate(runconf = ifelse(screen_name %in% runconf$screen_name, "yes", "no"))

## load in the emoji dictionary
dico <- readr::read_csv2("https://raw.githubusercontent.com/today-is-a-good-day/emojis/master/emDict.csv")

ℹ Using "','" as decimal and "'.'" as grouping mark. Use `read_delim()` for more control.

Rows: 842 Columns: 4
── Column specification ────────────────────────────────────
Delimiter: ";"
chr (4): Description, Native, Bytes, R-encoding

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## combine datasets, keep only unique tweets
data <- bind_rows(runconf, rstats) %>%
  distinct(text, .keep_all = TRUE)

## summarize by user, did they tweet about runconf in the past week 
## & did they use an emoji in the past week?
used_emoji <- regex_left_join(data, dico, by = c(text = "Native")) %>%
  select(screen_name, 
         text,
         runconf,
         emoji = Native) %>%
  group_by(screen_name) %>%
  mutate(tot_emoji = sum(!is.na(emoji)),
         used_emoji = ifelse(tot_emoji > 0, "yes", "no"),
         tot_tweets = n_distinct(text)) %>%
  distinct(screen_name, .keep_all = TRUE)
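
To make the per-tweeter logic concrete, here is a minimal base R sketch on made-up data. The screen names, tweet texts, and the two-emoji dictionary are all hypothetical stand-ins, and it uses plain `grepl()` rather than fuzzyjoin, so treat it as an illustration of the idea rather than the pipeline above. Like the join, it counts each dictionary emoji at most once per tweet.

```r
# Toy data: three hypothetical tweeters (not real #runconf17 tweets)
tweets <- data.frame(
  screen_name = c("ana", "ana", "ben", "cy"),
  text = c("new package \U0001F4E6", "hello #rstats",
           "model fit done", "karaoke \U0001F3A4 and \U0001F4E6"),
  stringsAsFactors = FALSE
)

# Tiny stand-in for the emoji dictionary's Native column (assumption)
emoji_dict <- c("\U0001F4E6", "\U0001F3A4")

# Number of distinct dictionary emojis found in a single tweet
count_emoji <- function(txt) {
  sum(vapply(emoji_dict, function(e) grepl(e, txt, fixed = TRUE), logical(1)))
}
tweets$n_emoji <- vapply(tweets$text, count_emoji, integer(1), USE.NAMES = FALSE)

# Total emoji matches per tweeter, then the yes/no flag
tot_emoji <- vapply(split(tweets$n_emoji, tweets$screen_name), sum, integer(1))
used_emoji <- ifelse(tot_emoji > 0, "yes", "no")

print(tot_emoji)
print(used_emoji)
```

A tweeter needs only one emoji in one tweet to count as an emoji-user, which is why the flag is computed from the per-user total rather than tweet by tweet.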

results

We had 526 tweeters who used only the #rstats hashtag and 107 who tweeted with the #runconf17 hashtag. Among the #rstats tweeters, 5.9% used at least one emoji in their tweets, whereas among #runconf17 tweeters, 25.2% used emojis!

THESE ARE MY PEOPLE 🙌

used_emoji %>%
  group_by(`tweeted #runconf` = runconf, `used emoji` = used_emoji) %>%
  tally() %>%
  mutate(`%` = 100*prop.table(n)) %>%
  knitr::kable(digits = 1)

| tweeted #runconf | used emoji |   n |    % |
|:-----------------|:-----------|----:|-----:|
| no               | no         | 495 | 94.1 |
| no               | yes        |  31 |  5.9 |
| yes              | no         |  80 | 74.8 |
| yes              | yes        |  27 | 25.2 |
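
The percentages are within-group: `tally()` drops the last grouping variable, so `prop.table(n)` is computed separately for the "no" and "yes" #runconf groups. As a quick sanity check, here is a base R sketch that rebuilds the row percentages and group sizes from the four counts in the table. The object names are mine, not part of the original analysis.

```r
# Counts copied from the results table (rows: tweeted #runconf, cols: used emoji)
counts <- matrix(
  c(495, 31,
     80, 27),
  nrow = 2, byrow = TRUE,
  dimnames = list(runconf = c("no", "yes"), used_emoji = c("no", "yes"))
)

# Number of tweeters in each group
n_tweeters <- rowSums(counts)

# Row percentages: share of emoji users within each group
pct <- round(100 * prop.table(counts, margin = 1), 1)

print(n_tweeters)
print(pct)
```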

Alright, that looks pretty promising, but let’s get some confidence intervals. It’s time to model it! 💃

## modeling time!
dd <- datadist(used_emoji)
options(datadist = "dd")

lrm(used_emoji~runconf, data = used_emoji) %>%
  summary() %>%
  html()

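Effects (response: `used_emoji`)

|                    | Low | High | Δ | Effect | S.E.   | Lower 0.95 | Upper 0.95 |
|:-------------------|----:|-----:|:--|-------:|-------:|-----------:|-----------:|
| runconf --- yes:no |   1 |    2 |   |  1.684 | 0.2895 |      1.117 |      2.252 |
| Odds Ratio         |   1 |    2 |   |  5.389 |        |      3.056 |      9.505 |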

Tweeting the #runconf17 hashtag seems to be strongly associated with higher odds of emoji use (OR: 5.4, 95% CI: 3.1, 9.5).
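
With a single binary predictor, the logistic regression odds ratio is just the cross-product ratio of the 2x2 table, and the Wald interval can be worked out by hand. Here is a small base R sketch that does this from the counts reported above. The variable names are my own, and the interval assumes the usual normal approximation on the log odds scale.

```r
# Counts from the results table (rows: tweeted #runconf, cols: used emoji)
counts <- matrix(
  c(495, 31,
     80, 27),
  nrow = 2, byrow = TRUE,
  dimnames = list(runconf = c("no", "yes"), used_emoji = c("no", "yes"))
)

# Odds of emoji use within each group, then their ratio
odds <- counts[, "yes"] / counts[, "no"]
odds_ratio <- odds[["yes"]] / odds[["no"]]

# Wald standard error of the log odds ratio: sqrt of summed reciprocal counts
log_or <- log(odds_ratio)
se_log_or <- sqrt(sum(1 / counts))

# 95% confidence interval, back-transformed to the odds ratio scale
ci <- exp(log_or + c(-1, 1) * qnorm(0.975) * se_log_or)

print(round(c(log_or = log_or, se = se_log_or), 4))
print(round(c(OR = odds_ratio, lower = ci[1], upper = ci[2]), 3))
```

This should land on roughly the same effect, standard error, and interval as the model summary, which is a nice check that the model is doing what we think it is.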

most popular emojis

Now let’s check out which emojis were most popular among #runconf17 tweeters. This time I’ll allow for retweets 👯

For this I used ggplot2, magick, and webshot.

library("ggplot2")
library("webshot")
library("magick")

runconf_emojis <- search_tweets(q = "#runconf17 AND since:2017-05-23 AND until:2017-05-31",
                                n = 1e4)

emojis <- regex_left_join(runconf_emojis, dico, by = c(text = "Native")) %>%
  group_by(Native) %>%
  filter(!is.na(Native)) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  head(15) %>%
  mutate(num = row_number())  # rank 1..k; also works if fewer than 15 emojis are found

This (like many things I do) was very much inspired by Maëlle’s post.

plot_emojis <- function(limit) {
  emojis_filter <- emojis %>%
    filter(n <= limit)
  out_svg <- paste0("file://emojis_", limit,".svg")
  out_png <- paste0("emojis_", limit, ".png")
  p <- ggplot(emojis_filter, aes(num, n)) + 
    geom_col() + 
    xlim(c(0,16)) +
    geom_text(aes(x = num, 
                  y = n + 1,
                  label = Native), size = 5)  +
    theme(axis.text.y=element_blank(),
          axis.ticks=element_blank(),
          legend.position="none") + 
    ylim(c(0, max(emojis$n) + 10)) +
    xlab("emoji") + 
    ggtitle("#runconf17 emojis") +
    coord_flip() 
  print(p)
  gridSVG::grid.export(out_svg)
  webshot(out_svg,
          out_png,
          vwidth = 100,
          vheight = 100,
          zoom = 3)
  out_png
}

Now let’s make them into a gif!

out_png <- purrr::map_chr(emojis$n, plot_emojis)

purrr::map(unique(rev(out_png)), image_read) %>%
  image_join() %>%
  image_animate(fps=1) %>%
  image_write("runconf_emojis.gif")

Phew, the 🐔

The purple heart seems to be the most popular emoji, which makes sense given 25% of us were #RLadies! I think it’s a credit to the awesome geographic diversity that we have two different globe emojis in our top 15!

All in all, it was an epic experience. Thank you so much to the conference organizers, attendees, and #runconf17 tweeters for such a delightful week!