The Content of Debates

In addition to the complexity and the tone of last Fall’s presidential election debates, I started exploring the content. What did the candidates talk about?

Content – themes, substance, issues, etc. – is a more challenging, but for my purposes, a more important, feature of this text. I started with some basic word frequencies.

rm(list=ls())
library(tidyverse)
library(quanteda)
library(wordcloud)

setwd("../../../dataForDemocracy/debate2016analysis/")
load("debateSpeech2016.RData") # the corpus as created in acquire_debates.R

# subset for just HRC, DJT
candcorpus <- corpus_subset(debate16corpus, speaker=="HRC" | speaker=="DJT")

# create document feature matrix
otherStop <- c("applause", "laughter", "crosstalk", "laughther")
debatedfm <- dfm(candcorpus, 
               remove = c(stopwords("english"), otherStop),
               remove_punct = TRUE,
               remove_numbers = TRUE,
               verbose = TRUE)

# most frequent terms
topfeatures(debatedfm, n = 25, groups = "speaker")
## $HRC
##     think    people      know      well       can     going       get 
##       477       435       403       321       319       308       302 
## president      want      said      make      just       lot        us 
##       292       285       228       218       210       186       180 
##      need   senator   country       one       got        go      work 
##       178       175       174       173       171       168       162 
##       now       say      also      look 
##       151       141       140       132 
## 
## $DJT
##   going  people country    know    just   think     say     get    want 
##     513     489     301     288     253     249     248     238     225 
##    said     one    look    many    well     now    like   great     way 
##     221     201     198     191     184     183     176     172     163 
##    tell   right    take     can      go    back   years 
##     158     153     151     139     139     137     122

Honestly, basic word frequencies revealed little of substance, so I’ll skip straight to the obligatory wordcloud (I’m pretty sure there’s a rule). To make the piechart of text visualization a little more interesting, I’ll do a comparison cloud to represent the words that are most distinctive to one of the speakers.

# set up for figure
debatedf <- as.data.frame(debatedfm) # turn into dataframe
hrcdf <- rowSums(t(debatedf[1:13,])) # sum word use by candidate
djtdf <- rowSums(t(debatedf[14:27,]))
d <- as.data.frame(cbind(hrcdf, djtdf))

# word comparison cloud (wordcloud)
colnames(d) <- c("HRC", "DJT")
comparison.cloud(d, max.words=200, 
                 scale=c(4,.33), random.order=FALSE, 
                 colors=c("blue3", "orange3"), title.size=1)

The bigger the word, the more its rate of appearance in one set of texts exceeds its rate of use across the entire collection.

For HRC we have: president, senator, support, think, work; some Sanders, Donald, health, and american; and bits of families, children, affordable, and working.

For DJT we have: going, country, people, say, look, great, and many; with some of tremendous, china, deal, wrong, disaster, bad, nobody; and smaller bits of million, money, mean. I struggle to find issue-related words here, except in some tiny words (russia, mexico, border, isis, obamacare). In part, this underscores how little substance was emphasized in these events.

One last angle (for this post) – another simple way to get a snapshot of the debate substance is to explore key-words-in-context for target words, like so:

### key words in context
kwic(corpus_subset(candcorpus, speaker=="HRC"), "family", 2)
##                                                                       
##   [HRC 2015-10-13, 344]       us balance | family | and work          
##   [HRC 2015-10-13, 369]         had paid | family | leave for         
##  [HRC 2015-12-19, 5812]               .. | family | leave for         
##  [HRC 2015-12-19, 7217]           of the | family | of a              
##  [HRC 2016-01-17, 3176]         for paid | family | leave.            
##   [HRC 2016-02-04, 950]           , paid | family | leave and         
##  [HRC 2016-02-04, 6905]          on paid | family | leave,            
##   [HRC 2016-02-11, 984] average American | family | ? In              
##  [HRC 2016-02-11, 1007]      toward paid | family | leave,            
##  [HRC 2016-02-11, 1689]           , paid | family | leave,            
##  [HRC 2016-02-11, 2134]            . His | family | certainly believes
##  [HRC 2016-02-11, 3340]           end of | family | detention,        
##  [HRC 2016-03-06, 3912]        any other | family | . And             
##  [HRC 2016-03-09, 1396]           to end | family | detention.And in  
##  [HRC 2016-03-09, 1917]        to deport | family | members either    
##  [HRC 2016-03-09, 3198]       bring your | family | back together     
##  [HRC 2016-03-09, 6040]         . Median | family | income went       
##   [HRC 2016-09-26, 221]       to balance | family | and work          
##   [HRC 2016-09-26, 249]        have paid | family | leave,            
##   [HRC 2016-09-26, 859]      trillion in | family | wealth was        
##  [HRC 2016-09-26, 2237]         for your | family | . And             
##   [HRC 2016-10-09, 940]        Gold Star | family | whose son         
##  [HRC 2016-10-09, 2656]         and your | family | are just          
##  [HRC 2016-10-09, 5693]      trillion in | family | wealth was        
##  [HRC 2016-10-19, 1205]          and her | family | has to            
##  [HRC 2016-10-19, 4620]        Gold Star | family | , because
kwic(corpus_subset(candcorpus, speaker=="DJT"), "family", 2)
##                                                          
##  [DJT 2015-08-06, 1246]   for my | family | , et         
##  [DJT 2015-09-16, 1217]    to my | family | , to         
##   [DJT 2015-10-28, 799]   for my | family | , for        
##   [DJT 2015-12-15, 766] friends, | family | , girlfriends
##  [DJT 2016-03-03, 2010]    to my | family | , to         
##  [DJT 2016-03-03, 3852]   when a | family | flies into   
##  [DJT 2016-03-03, 3871]  and his | family | gets sent    
##  [DJT 2016-09-26, 2358]   for my | family | ? Lester     
##  [DJT 2016-09-26, 3797]     , my | family | , my         
##  [DJT 2016-09-26, 9444]   to her | family | , and        
##   [DJT 2016-10-09, 430]    to my | family | . I          
##  [DJT 2016-10-09, 1056]   for my | family | , for

Clinton’s use of the word family centered around paid family leave, worklife balance, military families, and the like. Trump’s use of family, by contrast, almost always referenced his own.

How about immigration? Here I’ll use a regular expression to extract multiple forms of the word.

kwic(corpus_subset(candcorpus, speaker=="HRC"), "immig", 2, "regex")
##                                                      
##  [HRC 2015-10-13, 4363]             opportunity for |
##  [HRC 2015-10-13, 4431]             a comprehensive |
##  [HRC 2015-10-13, 4486]       Demonize hard-working |
##  [HRC 2015-11-14, 2675]          need comprehensive |
##  [HRC 2015-11-14, 2702]                     the net |
##  [HRC 2015-11-14, 2786]                      of our |
##   [HRC 2016-01-17, 367]                       do on |
##   [HRC 2016-02-04, 880]               Ted Kennedy's |
##  [HRC 2016-02-04, 6994]                      in the |
##  [HRC 2016-02-04, 7358]         think comprehensive |
##   [HRC 2016-02-11, 179]               . Hardworking |
##  [HRC 2016-02-11, 3118]             not hardworking |
##  [HRC 2016-02-11, 3146]            of comprehensive |
##  [HRC 2016-02-11, 3175]           for comprehensive |
##  [HRC 2016-02-11, 3198]            to comprehensive |
##  [HRC 2016-02-11, 3269]           get comprehensive |
##   [HRC 2016-03-09, 154]    discussing comprehensive |
##  [HRC 2016-03-09, 1020]            to comprehensive |
##  [HRC 2016-03-09, 1043]            on comprehensive |
##  [HRC 2016-03-09, 1084]      achieved comprehensive |
##  [HRC 2016-03-09, 1128]            of comprehensive |
##  [HRC 2016-03-09, 1283]   introducing comprehensive |
##  [HRC 2016-03-09, 1425]            for undocumented |
##  [HRC 2016-03-09, 1453]                   hunt down |
##  [HRC 2016-03-09, 1505]               Ted Kennedy's |
##  [HRC 2016-03-09, 1541]                    going to |
##  [HRC 2016-03-09, 1806]              , undocumented |
##  [HRC 2016-03-09, 1857]           our comprehensive |
##  [HRC 2016-03-09, 2130]       achieve comprehensive |
##  [HRC 2016-03-09, 2257]                 workers and |
##  [HRC 2016-03-09, 2281]          pass comprehensive |
##  [HRC 2016-03-09, 2331]                   hunt down |
##  [HRC 2016-03-09, 2358]           get comprehensive |
##  [HRC 2016-03-09, 2699]            do comprehensive |
##  [HRC 2016-03-09, 2802]            to comprehensive |
##  [HRC 2016-03-09, 2978]            of comprehensive |
##  [HRC 2016-03-09, 4860]             - comprehensive |
##   [HRC 2016-10-09, 605]               also targeted |
##  [HRC 2016-10-09, 5404]                     , about |
##  [HRC 2016-10-19, 1571]            my comprehensive |
##  [HRC 2016-10-19, 1669]                   nation of |
##  [HRC 2016-10-19, 1690]   introducing comprehensive |
##  [HRC 2016-10-19, 1759]            campaign bashing |
##  [HRC 2016-10-19, 1763]             calling Mexican |
##  [HRC 2016-10-19, 1786]                   deal with |
##  [HRC 2016-10-19, 1797]       bringing undocumented |
##  [HRC 2016-10-19, 1959]                     to sign |
##  [HRC 2016-10-19, 5295]               talking about |
##  [HRC 2016-10-19, 5309]                      of all |
##  [HRC 2016-10-19, 5310] all immigrants-undocumented |
##  [HRC 2016-10-19, 5323]           have undocumented |
##                                            
##        immigrants        | to be           
##        immigration       | reform,         
##        immigrants        | who have        
##        immigration       | reform with     
##        immigration       | from Mexico     
##        immigration       | history and     
##        immigration       | reform,         
##        immigration       | reform.         
##   beginning.Immigration  | reform,         
##        immigration       | reform with     
##         immigrant        | families living 
##         immigrant        | families who    
##        immigration       | reform.         
##        immigration       | reform in       
##        immigration       | reform with     
##        immigration       | reform.         
##        immigration       | reform with     
##        immigration       | reform with     
##        immigration       | reform.         
##        immigration       | reform nine     
##        immigration       | reform and      
##        immigration       | reform with     
##        immigrants        | , and           
##       immigrants.So      | I think         
##        immigration       | reform which    
##    immigrants.SALINAS    | [ through       
##        immigrants        | , the           
##        immigration       | reform,         
##        immigration       | reform if       
##      immigrants.They     | have proven     
##        immigration       | reform in       
##       immigrants.So      | look,           
##        immigration       | reform.         
##        immigration       | reform.         
##        immigration       | reform with     
##        immigration       | reform.         
##        immigration       | reform certainly
##        immigrants        | , African       
##        immigrants        | , about         
##        immigration       | reform plan     
##        immigrants        | and we          
##        immigration       | reform within   
##        immigrants        | , calling       
##        immigrants        | rapists and     
##      immigrants.Now      | , what          
##        immigrants        | out from        
##        immigration       | reform,         
##        immigrants        | a few           
##  immigrants-undocumented | immigrants in   
##        immigrants        | in our          
##        immigrants        | in America
kwic(corpus_subset(candcorpus, speaker=="DJT"), "immig", 2, "regex")
##                                                                           
##   [DJT 2015-08-06, 375]   about illegal |   immigration   | , Chris       
##  [DJT 2015-09-16, 2094]     are illegal |   immigrants    | . I           
##  [DJT 2015-09-16, 2191]   about illegal |   immigration   | if it         
##  [DJT 2015-09-16, 2321]       . Illegal |   immigration   | is costing    
##  [DJT 2015-09-16, 2464]         weak on |   immigration   | - by          
##  [DJT 2015-09-16, 2485]         weak on |   immigration   | . He          
##   [DJT 2015-11-10, 255]    stop illegal |   immigration   | . It's        
##   [DJT 2015-11-10, 567] million illegal |   immigrants    | out of        
##  [DJT 2016-01-14, 1219]       . Illegal |   immigration   | is beyond     
##    [DJT 2016-02-06, 65]           I hit |   immigration   | , I           
##    [DJT 2016-02-06, 88]   about illegal | immigration.Now | , everybody's 
##  [DJT 2016-02-13, 1955]       , illegal |   immigration   | wasn't even   
##  [DJT 2016-02-13, 2008]      on illegal |   immigration   | is Jeb        
##  [DJT 2016-02-13, 2035]      on illegal |   immigration   | it's laughable
##   [DJT 2016-02-25, 205]  about illegals |   immigration   | . It          
##  [DJT 2016-03-03, 2390]        terms of |   immigration   | - and         
##  [DJT 2016-03-03, 2783]      on illegal |   immigration   | . Sheriff     
##  [DJT 2016-09-26, 4364]       , illegal |   immigrants    | . And         
##  [DJT 2016-09-26, 6278]       before on |   immigration   | . I           
##  [DJT 2016-10-19, 7358]      of illegal |   immigrants    | , people

It’s a pain that the words after the keyword wrapped around in the console for Clinton’s corpus, but it’s pretty clear her primary references to immigration or immigrants centered on comprehensive immigration reform. She made some references to undocumented immigrants, but never used the language of “illegal” that characterizes Trump’s language.

Finally, for something a little more light hearted:

kwic(corpus_subset(candcorpus, speaker=="HRC"), "hands", 3)
##                                                                   
##  [HRC 2015-10-13, 2050]   blood on his | hands | , as I'm         
##  [HRC 2015-10-13, 2362] into the wrong | hands | . I know         
##  [HRC 2015-11-14, 1045]   blood on his | hands | of Americans than
##  [HRC 2015-11-14, 5273]  which he then | hands | over to the      
##  [HRC 2015-12-19, 3238]   blood on his | hands | , is overturned  
##  [HRC 2016-02-11, 3329] journey in the | hands | of smugglers.    
##  [HRC 2016-03-06, 2977]      we take a | hands | off approach.    
##  [HRC 2016-04-14, 3136]     out of the | hands | of who should    
##  [HRC 2016-09-26, 3510]     out of the | hands | of people who    
##  [HRC 2016-09-26, 4045]     out of the | hands | of those who     
##  [HRC 2016-09-26, 6407] ever get their | hands | on any nuclear   
##  [HRC 2016-10-09, 1432]   in the wrong | hands | . Look,          
##  [HRC 2016-10-09, 2634] plays into the | hands | of the terrorists
kwic(corpus_subset(candcorpus, speaker=="DJT"), "hands", 3)
##                                                                      
##  [DJT 2016-01-14, 3866] position, their | hands | up.And Iranian wise
##   [DJT 2016-03-03, 511]       He hit my | hands | . Nobody has       
##   [DJT 2016-03-03, 518]     ever hit my | hands | . I have           
##   [DJT 2016-03-03, 530]   Look at those | hands | . Are they         
##   [DJT 2016-03-03, 535]  Are they small | hands | ?[ laughter        
##   [DJT 2016-03-03, 545]  referred to my | hands | , if they

That’s right. Trump kept bringing up his wee hands in that March primary debate. Hahaha…ha?