runconf17, an analysis of emoji use

rOpenSci
rstats
conferences
emojis
I had such a delightful time at rOpenSci’s unconference. Not only was it extremely productive (21 packages were produced!), but in between the crazy productivity was some epic community building.
Author

Lucy D’Agostino McGowan

Published

June 4, 2017

I had such a delightful time at rOpenSci’s unconference last week.

21 📦 were produced!

Not only was it extremely productive, but in between the crazy productivity was some epic community building.

for the record, I’m an #rchickenlady, IT’S HAPPENING

Stefanie kicked the conference off with ice breakers, where we explored topics ranging from #rcatladies & #rdogfellas to impostor syndrome. It was an excellent way to get conversations started!

work

Karthik and I worked on two packages:

arresteddev: a package for when your development is going awry! It includes functions such as lmgtfy(), which will seamlessly google your last error message; David Robinson’s tracestack(), which will query your last error message on Stack Overflow; and squirrel(), a function that will randomly send you to a distracting website - for when things are really going poorly 💁.

::: column-margin
Mostly, this was a good excuse to look up Arrested Development gifs, which, we established, is pronounced with a g like giraffe.
:::

ponyexpress: a package for automating speedy emails from R - copy and paste neigh more 🐴. This package allows you to send templated emails to a list of contacts. Great for conferences, birthday parties, or karaoke invitations.

play

Between our package building, there were SO many opportunities to get to know some of the most talented people.

<img src="https://github.com/LFOD/real-blog/raw/master/static/images/jenny_lucy.jpg">
Jenny & I enthusiastically working on googledrive.

More than anything, this was an excellent opportunity to feel like a part of a community – and a community that certainly extends beyond the people that attended the unconference! There were so many people following along, tweeting along, and assisting along the way.

a few highlights:

analysis

Note: this is not particularly statistically rigorous, but it is VERY fun.

In an effort to stay on brand, I decided to do a small analysis of the tweets that came out of #runconf17. I designed a small study:

  • pulled all tweets (excluding retweets) using the hashtag #runconf17 between May 24th and May 30th
  • also pulled all tweets (excluding retweets) using the hashtag #rstats during the same time period

Question: Are Twitter users who used the #runconf17 hashtag more likely to use emojis than those who only tweeted with the #rstats hashtag during the same time period?

I used the rtweet package to pull the tweets, dplyr and fuzzyjoin to wrangle the data a bit, and rms to analyze it.

library("rtweet")
library("dplyr")
library("fuzzyjoin")
library("rms")
runconf <- search_tweets(q = "#runconf17 AND since:2017-05-23 AND until:2017-05-31",
                         n = 1e4, 
                         include_rts = FALSE)

rstats <- search_tweets(q = "#rstats AND since:2017-05-23 AND until:2017-05-31",
                        n = 1e4,
                        include_rts = FALSE)

The emoji dictionary was discovered by the lovely Maëlle!

After pulling in the tweets, I categorized tweeters as either using the #runconf17 hashtag during the week or not. I then merged the tweets with an emoji dictionary and grouped by tweeter. If the tweeter used an emoji at any point during the week, they were categorized as an emoji-user; if not, they were sad (jk, there is room for all here!).

## create variable for whether tweeted about runconf
runconf$runconf <- "yes"

rstats <- rstats %>%
  mutate(runconf = ifelse(screen_name %in% runconf$screen_name, "yes", "no"))

## load in the emoji dictionary
dico <- readr::read_csv2("https://raw.githubusercontent.com/today-is-a-good-day/emojis/master/emDict.csv")
ℹ Using "','" as decimal and "'.'" as grouping mark. Use `read_delim()` for more control.
Rows: 842 Columns: 4
── Column specification ────────────────────────────────────
Delimiter: ";"
chr (4): Description, Native, Bytes, R-encoding

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## combine datasets, keep only unique tweets
data <- bind_rows(runconf, rstats) %>%
  distinct(text, .keep_all = TRUE)

## summarize by user, did they tweet about runconf in the past week 
## & did they use an emoji in the past week?
used_emoji <- regex_left_join(data, dico, by = c(text = "Native")) %>%
  select(screen_name, 
         text,
         runconf,
         emoji = Native) %>%
  group_by(screen_name) %>%
  mutate(tot_emoji = sum(!is.na(emoji)),
         used_emoji = ifelse(tot_emoji > 0, "yes", "no"),
         tot_tweets = n_distinct(text)) %>%
  distinct(screen_name, .keep_all = TRUE)
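
To make the per-tweeter classification concrete, here is a minimal base R sketch of the same idea on toy data. The `tweets` data frame and the two-entry `emoji` vector are hypothetical stand-ins (not the real pull or the real dictionary), and it uses `grepl()` rather than `regex_left_join()`, so treat it as an illustration of the logic rather than the exact pipeline above.

```r
# Hypothetical toy data: three tweeters, four tweets (not from the real pull)
tweets <- data.frame(
  screen_name = c("a", "a", "b", "c"),
  text = c("21 \U0001F4E6 were produced!",
           "so productive",
           "new package on CRAN",
           "copy and paste neigh more \U0001F434"),
  stringsAsFactors = FALSE
)

# Hypothetical two-entry stand-in for the dictionary's Native column
emoji <- c("\U0001F4E6", "\U0001F434")

# Number of dictionary entries found in one tweet
# (fixed = TRUE matches each emoji literally, not as a regex)
count_emoji <- function(txt) {
  sum(vapply(emoji, function(e) grepl(e, txt, fixed = TRUE), logical(1)))
}
tweets$n_emoji <- vapply(tweets$text, count_emoji, integer(1), USE.NAMES = FALSE)

# Roll up to one row per tweeter: any emoji during the week means "yes"
by_user <- aggregate(n_emoji ~ screen_name, data = tweets, FUN = sum)
by_user$used_emoji <- ifelse(by_user$n_emoji > 0, "yes", "no")
by_user
```

In this toy example, tweeters `a` and `c` end up as emoji-users and `b` does not, even though `a` only used an emoji in one of two tweets.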

results

We had 526 tweeters who used only the #rstats hashtag and 107 who tweeted with the #runconf17 hashtag. Among the #rstats tweeters, 5.9% used at least one emoji in their tweets, whereas among #runconf17 tweeters, 25.2% used emojis!

::: column-margin
THESE ARE MY PEOPLE 🙌
:::

used_emoji %>%
  group_by(`tweeted #runconf` = runconf, `used emoji` = used_emoji) %>%
  tally() %>%
  mutate(`%` = 100*prop.table(n)) %>%
  knitr::kable(digits = 1)
| tweeted #runconf | used emoji |   n |    % |
|:-----------------|:-----------|----:|-----:|
| no               | no         | 495 | 94.1 |
| no               | yes        |  31 |  5.9 |
| yes              | no         |  80 | 74.8 |
| yes              | yes        |  27 | 25.2 |

Alright, that looks pretty promising, but let’s get some confidence intervals. It’s time to model it! 💃

## modeling time!
dd <- datadist(used_emoji)
options(datadist = "dd")

lrm(used_emoji~runconf, data = used_emoji) %>%
  summary() %>%
  html()
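Effects   Response: used_emoji

|                    | Low | High | Δ | Effect | S.E.   | Lower 0.95 | Upper 0.95 |
|:-------------------|----:|-----:|--:|-------:|-------:|-----------:|-----------:|
| runconf --- yes:no |   1 |    2 |   |  1.684 | 0.2895 |      1.117 |      2.252 |
| Odds Ratio         |   1 |    2 |   |  5.389 |        |      3.056 |      9.505 |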

Tweeting the #runconf17 hashtag seems undeniably associated with higher odds of emoji use (OR: 5.4, 95% CI: 3.1, 9.5).
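
As a rough sanity check, the same numbers can be recovered by hand from the four counts in the results table. With a single binary predictor, the odds ratio is just the cross-product ratio, and a Wald interval on the log-odds scale should line up with the model summary. This is a sketch in base R, not the `rms` internals; the variable names are mine.

```r
# Counts copied from the results table
emoji_runconf    <- 27   # tweeted #runconf17, used an emoji
no_emoji_runconf <- 80   # tweeted #runconf17, no emoji
emoji_rstats     <- 31   # #rstats only, used an emoji
no_emoji_rstats  <- 495  # #rstats only, no emoji

# Odds of emoji use in each group, then their ratio
odds_ratio <- (emoji_runconf / no_emoji_runconf) /
  (emoji_rstats / no_emoji_rstats)

# Standard error of the log odds ratio for a 2x2 table
log_or    <- log(odds_ratio)
se_log_or <- sqrt(1 / emoji_runconf + 1 / no_emoji_runconf +
                  1 / emoji_rstats + 1 / no_emoji_rstats)

# 95% Wald interval, computed on the log scale and exponentiated back
ci <- exp(log_or + c(-1, 1) * qnorm(0.975) * se_log_or)

round(c(OR = odds_ratio, lower = ci[1], upper = ci[2]), 1)
```

The rounded result should agree with the reported OR of 5.4 and interval of 3.1 to 9.5, and `se_log_or` with the S.E. column of the summary.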