More Debate Watch (… or debate read?)

In the prior post, I introduced some analysis of the 2016 presidential debates I was playing with last fall. Here’s more!1

Let’s switch to the tone of the debate. I started by applying a common dictonary of positive and negative words (the bing dictionary available in tidytext) to capture the tone or sentiment of Trump’s and Clinton’s speech using quanteda’s dictionary function.

library(tidyverse)
library(quanteda)
library(scales)
library(tidytext) 

setwd("../../../dataForDemocracy/debate2016analysis/")
load("debateSpeech2016.RData") # the corpus as created in acquire_debates.R

# Pull dictionary from tidytext, set dictionary in quanteda, and apply it to corpus
bing <- sentiments %>% filter(lexicon=="bing") 
sentDict <- dictionary(list(positive = bing$word[bing$sentiment=="positive"], negative = bing$word[bing$sentiment=="negative"])) 
debateToneDFM <- dfm(debate16corpus, dictionary = sentDict) 

# Turn the values into a data frame, create overal tone variable, and add to existing data frame
debateTone <- as.data.frame(debateToneDFM, row.names = debateToneDFM@Dimnames$docs) 
debateTone$id <- row.names(debateTone) 
debateTone <- debateTone %>% 
  mutate(tone = positive - negative)
debates16$tone <- debateTone$tone 

# Plot!
p <- ggplot(debates16, aes(x=date, y=tone))
p + geom_point(aes(color=party, shape=speaker), size=3) + 
  geom_hline(yintercept=median(debates16$tone), color="gray50") +
  scale_x_date(labels = date_format("%m/%d"), breaks = date_breaks("5 weeks")) + 
  ggtitle("'Tone' of Candidate Debate Speech") + 
  labs(y = "Overall Tone (Negative to Positive)", x = "Date of Debate") +
  scale_color_manual(values=c("blue3", "orange3"), name="Speaker", guide=guide_legend(reverse = TRUE)) +
  scale_shape_manual(values=c(19,19,1,1)) + guides(shape=FALSE) +
  theme(plot.title = element_text(face="bold", size=20, hjust=0)) + 
  theme(axis.title = element_text(face="bold", size=16)) + 
  theme(panel.grid.minor = element_blank(), legend.position = c(0.77,0.9), legend.text=element_text(face="bold", size=12))

Higher values represent a more positive tone (positive words - negative words).

I had some priors about this… and mostly they were confirmed. There’s a plausible trajectory where Clinton’s debate speech became more positive over the course of the primary campaign, as she becomes ever more certain of winning the nomination, though this changes notably once she’s engaging Trump directly. Trump, for his part, is nearly always below average, hitting a low in the October town hall debate.

We can also look at the dynamics of tone within a debate. Here I change the document unit to a sentence.

library(stringr)

# Change the documentary units into sentences
debate16corpusSent <- corpus_reshape(debate16corpus, to="sentences")
# Apply dictionary
debateToneSentDFM <- dfm(debate16corpusSent, dictionary = sentDict)

# Turn this into a dataframe, create tone=positive-negative
debateToneSent <- as.data.frame(debateToneSentDFM)
debateToneSent$id <- row.names(debateToneSent) # keep row names as id
debateToneSent$seq <- as.integer(str_sub(debateToneSent$id, 16)) # extract sentence order
debateToneSent$event <- str_sub(debateToneSent$id, 1,14) # (stringr)
debateToneSent$speaker <- str_sub(debateToneSent$id, 1,3)
debateToneSent$date <- str_sub(debateToneSent$id, 5,14)
debateToneSent$party <- ifelse(debateToneSent$speaker %in% c("HRC", "TMK"), "Dem", "Rep")
debateToneSent$date <- as.Date(debateToneSent$date, format="%Y-%m-%d")

debateToneSent <- debateToneSent %>% 
  mutate(tone = positive - negative)

# Graph 10/09/2016 town hall debate tone by sentence
debateToneSent2 <- debateToneSent %>% # get desired debate
  filter(event=="HRC 2016-10-09" | event=="DJT 2016-10-09")
p <- ggplot(debateToneSent2, aes(x=seq, y=tone))
p + geom_point(size=0.5) + geom_smooth(method="loess", span=.33) + 
  facet_wrap(~event, scales="free_x")

Kind of cool, if not particularly enlightening. Just for fun (and a little validation), let’s pull out the most positively scored sentence and the most negatively scored sentence is the town hall debate.

# Peak positive, negative
maxPos <- subset(debateToneSent2, tone==max(debateToneSent2$tone), select=id) # most positive sentence
debate16corpusSent$documents$text[docnames(debate16corpusSent)==maxPos$id]
## [1] "That's why-to go back to your question-I want to send a message-we all should-to every boy and girl and, indeed, to the entire world that America already is great, but we are great because we are good, and we will respect one another, and we will work with one another, and we will celebrate our diversity."
maxNeg <- subset(debateToneSent2, tone==min(debateToneSent2$tone), select=id) # most negative sentence
debate16corpusSent$documents$text[docnames(debate16corpusSent)==maxNeg$id]
## [1] "And when she said irredeemable, they're irredeemable, you didn't mention that, but when she said they're irredeemable, to me that might have been even worse."

Respect, work, diversity on one hand and but “she said irredeeable” on the other. Sounds about right.

Still tone, or overall polarity, obscures alot of what political scientists who study communication and persuasion care about. Next I use the NRC sentiment lexicon, also available in tidytext, to look at particular negative emotions – fear and anger and particular positive emotions – anticipation and trust (positive). These are some of the most relevant emotional registers for campaigns.

# pull dictionary, set dictionary, and apply
nrc <- sentiments %>% filter(lexicon=="nrc") 
affectDict <- dictionary(list(angerW=nrc$word[nrc$sentiment=="anger"],
                              fearW=nrc$word[nrc$sentiment=="fear"],
                              anticipationW=nrc$word[nrc$sentiment=="anticipation"],
                              trustW=nrc$word[nrc$sentiment=="trust"]))
debateAffectDFM <- dfm(debate16corpus, dictionary = affectDict) 

# Turn into a dataframe, add to existing dataframe
debateAffect <- as.data.frame(debateAffectDFM, row.names = debateAffectDFM@Dimnames$docs)
debateAffect$id <- row.names(debateAffect) 
debateAffect <- debateAffect %>% 
  mutate(tot=angerW+fearW+anticipationW+trustW,
         anger=(angerW/tot)*100,
         fear=(fearW/tot)*100,
         anticipation=(anticipationW/tot)*100,
         trust=(trustW/tot)*100)
debates16[,7:10] <- debateAffect[,7:10]

# Gather the four variables
debates16long <- debates16 %>% 
  select(date, speaker, party, anger, fear, anticipation, trust) %>% 
  gather(sentiment, value, -date, -speaker, -party)

# Plot
p <- ggplot(debates16long, aes(x=date, y=value)) 
p + geom_point(aes(color=party, shape=speaker), size=2) + 
  scale_x_date(labels = date_format("%m/%d"), breaks = date_breaks("5 weeks")) + 
  ggtitle("'Affect' of Candidate Debate Speech") + 
  labs(y = "Level of Affect", x = "Date of Debate") +
  facet_wrap(~ sentiment, ncol=2, scales="free_y") + 
  scale_color_manual(values=c("blue3", "orange3")) +
  scale_shape_manual(values=c(19,19,1,1)) + guides(shape="none", color="none") +
  theme(plot.title = element_text(face="bold", size=20, hjust=0)) + 
  theme(axis.title = element_text(face="bold", size=16)) + 
  theme(strip.text = element_text(size=16)) +
  theme(panel.grid.minor = element_blank(), legend.position = c(0.85,0.9), legend.text=element_text(face="bold", size=12))

Even without the visual of each candidate’s manner and stance (which was where a good deal of the affect resides), we can see the difference in candidate style in Trump’s and Clinton’s speech acts. The biggest difference was in the use of anger.

These approaches aren’t the only way to look at tone or affect, of course. We might choose any number of alternative dictionaries, with any number of alternative features. But even this simple approach seems to be getting at something with some face validity.


  1. Because who wasn’t dying to relive last year’s presidential campaign!