The Content of Debates
In addition to the complexity and the tone of last fall’s presidential election debates, I started exploring the content. What did the candidates talk about?
Content (themes, substance, issues, and so on) is a more challenging feature of this text but, for my purposes, a more important one. I started with some basic word frequencies.
rm(list=ls())
library(tidyverse)
library(quanteda)
library(wordcloud)
setwd("../../../dataForDemocracy/debate2016analysis/")
load("debateSpeech2016.RData") # the corpus as created in acquire_debates.R
# subset for just HRC, DJT
candcorpus <- corpus_subset(debate16corpus, speaker=="HRC" | speaker=="DJT")
# create document feature matrix
otherStop <- c("applause", "laughter", "crosstalk", "laughther")
debatedfm <- dfm(candcorpus,
                 remove = c(stopwords("english"), otherStop),
                 remove_punct = TRUE,
                 remove_numbers = TRUE,
                 verbose = TRUE)
# most frequent terms
topfeatures(debatedfm, n = 25, groups = "speaker")
## $HRC
## think people know well can going get
## 477 435 403 321 319 308 302
## president want said make just lot us
## 292 285 228 218 210 186 180
## need senator country one got go work
## 178 175 174 173 171 168 162
## now say also look
## 151 141 140 132
##
## $DJT
## going people country know just think say get want
## 513 489 301 288 253 249 248 238 225
## said one look many well now like great way
## 221 201 198 191 184 183 176 172 163
## tell right take can go back years
## 158 153 151 139 139 137 122
Honestly, basic word frequencies revealed little of substance, so I’ll skip straight to the obligatory wordcloud (I’m pretty sure there’s a rule). To make the pie chart of text visualization a little more interesting, I’ll use a comparison cloud, which shows the words most distinctive to each speaker.
# set up for figure
debatedf <- as.data.frame(debatedfm) # turn into dataframe
# rows 1-13 are taken to be the HRC documents, rows 14-27 the DJT documents (positions hard-coded)
hrcdf <- rowSums(t(debatedf[1:13,])) # sum word use by candidate
djtdf <- rowSums(t(debatedf[14:27,]))
d <- as.data.frame(cbind(hrcdf, djtdf))
# word comparison cloud (wordcloud)
colnames(d) <- c("HRC", "DJT")
comparison.cloud(d, max.words=200,
                 scale=c(4,.33), random.order=FALSE,
                 colors=c("blue3", "orange3"), title.size=1)
The bigger the word, the more its rate of use by one speaker exceeds its average rate of use across both speakers.
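To make that sizing rule concrete, here is a minimal base R sketch of the arithmetic as I understand it from comparison.cloud: convert counts to each speaker's rate of use, subtract the average rate across speakers, and assign each word to the speaker with the largest deviation. The counts, words, and speaker labels below are invented for illustration, and this is not the wordcloud package's own code. The sketch also sums rows by speaker label with rowsum() rather than by row position.

```r
# Toy document-by-word counts; the numbers are invented for illustration.
counts <- matrix(
  c(4, 0, 1,
    3, 1, 0,
    0, 5, 1,
    1, 4, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3", "doc4"),
                  c("senator", "tremendous", "people"))
)
speaker <- c("HRC", "HRC", "DJT", "DJT")  # hypothetical speaker labels

# Sum counts by speaker label rather than by row position.
by_speaker <- rowsum(counts, group = speaker)

# Convert to each speaker's rate of use (share of that speaker's words).
rates <- by_speaker / rowSums(by_speaker)

# Deviation of each speaker's rate from the average rate across speakers.
deviation <- sweep(rates, 2, colMeans(rates))

# Each word is assigned to the speaker with the largest deviation;
# that deviation is what the word's size would be mapped to.
distinctive <- rownames(deviation)[apply(deviation, 2, which.max)]
names(distinctive) <- colnames(deviation)
distinctive
```

In this toy example, senator lands with HRC, while tremendous and people land with DJT.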
For HRC we have: president, senator, support, think, work; some Sanders, Donald, health, and american; and bits of families, children, affordable, and working.
For DJT we have: going, country, people, say, look, great, and many; with some of tremendous, china, deal, wrong, disaster, bad, nobody; and smaller bits of million, money, mean. I struggle to find issue-related words here, except in some tiny words (russia, mexico, border, isis, obamacare). In part, this underscores how little substance was emphasized in these events.
One last angle (for this post) – another simple way to get a snapshot of the debate substance is to explore key-words-in-context for target words, like so:
### key words in context
kwic(corpus_subset(candcorpus, speaker=="HRC"), "family", 2)
##
## [HRC 2015-10-13, 344] us balance | family | and work
## [HRC 2015-10-13, 369] had paid | family | leave for
## [HRC 2015-12-19, 5812] .. | family | leave for
## [HRC 2015-12-19, 7217] of the | family | of a
## [HRC 2016-01-17, 3176] for paid | family | leave.
## [HRC 2016-02-04, 950] , paid | family | leave and
## [HRC 2016-02-04, 6905] on paid | family | leave,
## [HRC 2016-02-11, 984] average American | family | ? In
## [HRC 2016-02-11, 1007] toward paid | family | leave,
## [HRC 2016-02-11, 1689] , paid | family | leave,
## [HRC 2016-02-11, 2134] . His | family | certainly believes
## [HRC 2016-02-11, 3340] end of | family | detention,
## [HRC 2016-03-06, 3912] any other | family | . And
## [HRC 2016-03-09, 1396] to end | family | detention.And in
## [HRC 2016-03-09, 1917] to deport | family | members either
## [HRC 2016-03-09, 3198] bring your | family | back together
## [HRC 2016-03-09, 6040] . Median | family | income went
## [HRC 2016-09-26, 221] to balance | family | and work
## [HRC 2016-09-26, 249] have paid | family | leave,
## [HRC 2016-09-26, 859] trillion in | family | wealth was
## [HRC 2016-09-26, 2237] for your | family | . And
## [HRC 2016-10-09, 940] Gold Star | family | whose son
## [HRC 2016-10-09, 2656] and your | family | are just
## [HRC 2016-10-09, 5693] trillion in | family | wealth was
## [HRC 2016-10-19, 1205] and her | family | has to
## [HRC 2016-10-19, 4620] Gold Star | family | , because
kwic(corpus_subset(candcorpus, speaker=="DJT"), "family", 2)
##
## [DJT 2015-08-06, 1246] for my | family | , et
## [DJT 2015-09-16, 1217] to my | family | , to
## [DJT 2015-10-28, 799] for my | family | , for
## [DJT 2015-12-15, 766] friends, | family | , girlfriends
## [DJT 2016-03-03, 2010] to my | family | , to
## [DJT 2016-03-03, 3852] when a | family | flies into
## [DJT 2016-03-03, 3871] and his | family | gets sent
## [DJT 2016-09-26, 2358] for my | family | ? Lester
## [DJT 2016-09-26, 3797] , my | family | , my
## [DJT 2016-09-26, 9444] to her | family | , and
## [DJT 2016-10-09, 430] to my | family | . I
## [DJT 2016-10-09, 1056] for my | family | , for
Clinton’s use of the word family centered on paid family leave, work-life balance, military families, and the like. Trump’s use of family, by contrast, almost always referenced his own.
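For anyone without quanteda at hand, the idea behind key-words-in-context is easy to sketch in base R: split the text into tokens, find the keyword, and paste together the few tokens on either side. kwic_sketch() below is a hypothetical helper of my own naming with a deliberately crude tokenizer, and the example sentence is made up. quanteda's kwic() handles tokenization, punctuation, and document names far more carefully.

```r
# Minimal keyword-in-context in base R.
# kwic_sketch is a hypothetical helper, not part of quanteda.
kwic_sketch <- function(text, keyword, window = 2) {
  # Crude tokenizer: lowercase, then split on anything but letters/apostrophes.
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  idx <- seq_along(tokens)
  hits <- which(tokens == keyword)
  vapply(hits, function(i) {
    pre  <- tokens[idx >= i - window & idx < i]   # up to `window` tokens before
    post <- tokens[idx > i & idx <= i + window]   # up to `window` tokens after
    paste(paste(pre, collapse = " "), "|", tokens[i], "|",
          paste(post, collapse = " "))
  }, character(1))
}

# Made-up example sentence.
example_text <- "We need paid family leave so parents can balance family and work."
result <- kwic_sketch(example_text, "family", window = 2)
result
```

The window argument plays the same role as the 2 passed to kwic() above: it sets how many tokens of context to keep on each side.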
How about immigration? Here I’ll use a regular expression to extract multiple forms of the word.
kwic(corpus_subset(candcorpus, speaker=="HRC"), "immig", 2, "regex")
##
## [HRC 2015-10-13, 4363] opportunity for |
## [HRC 2015-10-13, 4431] a comprehensive |
## [HRC 2015-10-13, 4486] Demonize hard-working |
## [HRC 2015-11-14, 2675] need comprehensive |
## [HRC 2015-11-14, 2702] the net |
## [HRC 2015-11-14, 2786] of our |
## [HRC 2016-01-17, 367] do on |
## [HRC 2016-02-04, 880] Ted Kennedy's |
## [HRC 2016-02-04, 6994] in the |
## [HRC 2016-02-04, 7358] think comprehensive |
## [HRC 2016-02-11, 179] . Hardworking |
## [HRC 2016-02-11, 3118] not hardworking |
## [HRC 2016-02-11, 3146] of comprehensive |
## [HRC 2016-02-11, 3175] for comprehensive |
## [HRC 2016-02-11, 3198] to comprehensive |
## [HRC 2016-02-11, 3269] get comprehensive |
## [HRC 2016-03-09, 154] discussing comprehensive |
## [HRC 2016-03-09, 1020] to comprehensive |
## [HRC 2016-03-09, 1043] on comprehensive |
## [HRC 2016-03-09, 1084] achieved comprehensive |
## [HRC 2016-03-09, 1128] of comprehensive |
## [HRC 2016-03-09, 1283] introducing comprehensive |
## [HRC 2016-03-09, 1425] for undocumented |
## [HRC 2016-03-09, 1453] hunt down |
## [HRC 2016-03-09, 1505] Ted Kennedy's |
## [HRC 2016-03-09, 1541] going to |
## [HRC 2016-03-09, 1806] , undocumented |
## [HRC 2016-03-09, 1857] our comprehensive |
## [HRC 2016-03-09, 2130] achieve comprehensive |
## [HRC 2016-03-09, 2257] workers and |
## [HRC 2016-03-09, 2281] pass comprehensive |
## [HRC 2016-03-09, 2331] hunt down |
## [HRC 2016-03-09, 2358] get comprehensive |
## [HRC 2016-03-09, 2699] do comprehensive |
## [HRC 2016-03-09, 2802] to comprehensive |
## [HRC 2016-03-09, 2978] of comprehensive |
## [HRC 2016-03-09, 4860] - comprehensive |
## [HRC 2016-10-09, 605] also targeted |
## [HRC 2016-10-09, 5404] , about |
## [HRC 2016-10-19, 1571] my comprehensive |
## [HRC 2016-10-19, 1669] nation of |
## [HRC 2016-10-19, 1690] introducing comprehensive |
## [HRC 2016-10-19, 1759] campaign bashing |
## [HRC 2016-10-19, 1763] calling Mexican |
## [HRC 2016-10-19, 1786] deal with |
## [HRC 2016-10-19, 1797] bringing undocumented |
## [HRC 2016-10-19, 1959] to sign |
## [HRC 2016-10-19, 5295] talking about |
## [HRC 2016-10-19, 5309] of all |
## [HRC 2016-10-19, 5310] all immigrants-undocumented |
## [HRC 2016-10-19, 5323] have undocumented |
##
## immigrants | to be
## immigration | reform,
## immigrants | who have
## immigration | reform with
## immigration | from Mexico
## immigration | history and
## immigration | reform,
## immigration | reform.
## beginning.Immigration | reform,
## immigration | reform with
## immigrant | families living
## immigrant | families who
## immigration | reform.
## immigration | reform in
## immigration | reform with
## immigration | reform.
## immigration | reform with
## immigration | reform with
## immigration | reform.
## immigration | reform nine
## immigration | reform and
## immigration | reform with
## immigrants | , and
## immigrants.So | I think
## immigration | reform which
## immigrants.SALINAS | [ through
## immigrants | , the
## immigration | reform,
## immigration | reform if
## immigrants.They | have proven
## immigration | reform in
## immigrants.So | look,
## immigration | reform.
## immigration | reform.
## immigration | reform with
## immigration | reform.
## immigration | reform certainly
## immigrants | , African
## immigrants | , about
## immigration | reform plan
## immigrants | and we
## immigration | reform within
## immigrants | , calling
## immigrants | rapists and
## immigrants.Now | , what
## immigrants | out from
## immigration | reform,
## immigrants | a few
## immigrants-undocumented | immigrants in
## immigrants | in our
## immigrants | in America
kwic(corpus_subset(candcorpus, speaker=="DJT"), "immig", 2, "regex")
##
## [DJT 2015-08-06, 375] about illegal | immigration | , Chris
## [DJT 2015-09-16, 2094] are illegal | immigrants | . I
## [DJT 2015-09-16, 2191] about illegal | immigration | if it
## [DJT 2015-09-16, 2321] . Illegal | immigration | is costing
## [DJT 2015-09-16, 2464] weak on | immigration | - by
## [DJT 2015-09-16, 2485] weak on | immigration | . He
## [DJT 2015-11-10, 255] stop illegal | immigration | . It's
## [DJT 2015-11-10, 567] million illegal | immigrants | out of
## [DJT 2016-01-14, 1219] . Illegal | immigration | is beyond
## [DJT 2016-02-06, 65] I hit | immigration | , I
## [DJT 2016-02-06, 88] about illegal | immigration.Now | , everybody's
## [DJT 2016-02-13, 1955] , illegal | immigration | wasn't even
## [DJT 2016-02-13, 2008] on illegal | immigration | is Jeb
## [DJT 2016-02-13, 2035] on illegal | immigration | it's laughable
## [DJT 2016-02-25, 205] about illegals | immigration | . It
## [DJT 2016-03-03, 2390] terms of | immigration | - and
## [DJT 2016-03-03, 2783] on illegal | immigration | . Sheriff
## [DJT 2016-09-26, 4364] , illegal | immigrants | . And
## [DJT 2016-09-26, 6278] before on | immigration | . I
## [DJT 2016-10-19, 7358] of illegal | immigrants | , people
It’s a pain that the words after the keyword wrapped around in the console for Clinton’s corpus, but it’s pretty clear her primary references to immigration or immigrants centered on comprehensive immigration reform. She made some references to undocumented immigrants, but never used the word “illegal” that runs through Trump’s references.
Finally, for something a little more lighthearted:
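A small base R sketch of why one pattern picks up so many forms: a regular expression matches anywhere inside a token, so "immig" catches immigration, immigrants, and even tokens fused by a missing space in the transcript. The word list below is invented for illustration (a few entries borrow tokens from the output above), and I'm assuming case-insensitive matching, which is consistent with the capitalized "Immigration" match shown above.

```r
# Illustrative tokens; "migration" is included as a non-match.
words <- c("immigration", "immigrants", "beginning.Immigration",
           "immigrants.So", "migration")

# TRUE wherever the pattern appears anywhere in the token, ignoring case.
matches <- grepl("immig", words, ignore.case = TRUE)
words[matches]
```

Everything but "migration" is kept, which is why fused tokens such as immigrants.So show up as keywords in the Clinton output.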
kwic(corpus_subset(candcorpus, speaker=="HRC"), "hands", 3)
##
## [HRC 2015-10-13, 2050] blood on his | hands | , as I'm
## [HRC 2015-10-13, 2362] into the wrong | hands | . I know
## [HRC 2015-11-14, 1045] blood on his | hands | of Americans than
## [HRC 2015-11-14, 5273] which he then | hands | over to the
## [HRC 2015-12-19, 3238] blood on his | hands | , is overturned
## [HRC 2016-02-11, 3329] journey in the | hands | of smugglers.
## [HRC 2016-03-06, 2977] we take a | hands | off approach.
## [HRC 2016-04-14, 3136] out of the | hands | of who should
## [HRC 2016-09-26, 3510] out of the | hands | of people who
## [HRC 2016-09-26, 4045] out of the | hands | of those who
## [HRC 2016-09-26, 6407] ever get their | hands | on any nuclear
## [HRC 2016-10-09, 1432] in the wrong | hands | . Look,
## [HRC 2016-10-09, 2634] plays into the | hands | of the terrorists
kwic(corpus_subset(candcorpus, speaker=="DJT"), "hands", 3)
##
## [DJT 2016-01-14, 3866] position, their | hands | up.And Iranian wise
## [DJT 2016-03-03, 511] He hit my | hands | . Nobody has
## [DJT 2016-03-03, 518] ever hit my | hands | . I have
## [DJT 2016-03-03, 530] Look at those | hands | . Are they
## [DJT 2016-03-03, 535] Are they small | hands | ?[ laughter
## [DJT 2016-03-03, 545] referred to my | hands | , if they
That’s right. Trump kept bringing up his wee hands in that March primary debate. Hahaha…ha?