Saturday, January 25, 2014

Economist Year in Review: Part 3

Finally, I want to show a network graph of countries based on their correlations in the LDA results. I drew it with qgraph() in R. It is interesting that Switzerland and Norway sit outside the main European cluster.
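The post does not say how the country correlations were computed, so the following is only a minimal sketch under one assumption: each country gets a topic profile (the average topic mix of the articles that mention it), and countries are then correlated across topics. The matrices theta.toy and country.toy are hypothetical stand-ins filled with random toy data, so the resulting graph says nothing about real countries. The qgraph() call only runs if the package is installed.

```r
set.seed(1)  # arbitrary seed so the toy data are repeatable
n.articles <- 300
n.topics <- 8

# Toy article-by-topic matrix; each row is rescaled to sum to 1, like LDA proportions.
theta.toy <- matrix(rgamma(n.articles * n.topics, shape = 1), nrow = n.articles)
theta.toy <- theta.toy / rowSums(theta.toy)

# Toy article-by-country indicator matrix (1 = country mentioned in the article).
countries <- c("France", "Germany", "Norway", "Switzerland", "Italy")
country.toy <- matrix(rbinom(n.articles * length(countries), 1, 0.3),
                      nrow = n.articles, dimnames = list(NULL, countries))

# Assumed definition: a country's profile is the mean topic mix of its articles.
# Result is countries x topics; row i is divided by the article count of country i.
profile <- (t(country.toy) %*% theta.toy) / colSums(country.toy)

# Correlate countries across topics (countries x countries).
country.cor <- cor(t(profile))

# Draw the network with a force-directed layout if qgraph is available.
if (requireNamespace("qgraph", quietly = TRUE)) {
  qgraph::qgraph(country.cor, layout = "spring", labels = colnames(country.cor))
}
```

With real data, theta.toy would be replaced by the LDA posterior topic matrix and country.toy by the articles-by-countries matrix.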



Economist Year in Review: Part 2


I like how The Economist focuses on geopolitical issues, and I want to capture that in my analysis. I therefore labelled each article with the countries whose names or demonyms appear in its text. For example, if either “United States” or “American” appears in the text, the article is labelled “United States”. An article can, of course, be labelled with many countries.
I then removed all instances of country names from the text. I did so because I wanted to use LDA again, and having country names in the text would create a dependency I don't want.
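The labelling and stripping steps described above could be sketched as follows. This is not the code used for the post: the lookup list country.terms, the toy articles, and the helper make.pattern are all hypothetical, and the matching is a plain case-sensitive whole-word regular expression.

```r
# Hypothetical lookup: each country maps to the names/demonyms that trigger its label.
country.terms <- list(
  "United States" = c("United States", "American"),
  "China"         = c("China", "Chinese"),
  "Syria"         = c("Syria", "Syrian")
)

# Toy articles standing in for the scraped text.
articles <- c(
  "American officials met Chinese diplomats in Beijing.",
  "The Syrian regime lost ground this week.",
  "Interest rates were left unchanged."
)

# Build one whole-word alternation pattern per country.
make.pattern <- function(terms) {
  paste0("\\b(", paste(terms, collapse = "|"), ")\\b")
}
patterns <- vapply(country.terms, make.pattern, character(1))

# Articles-by-countries indicator matrix (1 = mentioned at least once).
country.mat.toy <- sapply(patterns, function(p) as.integer(grepl(p, articles, perl = TRUE)))
rownames(country.mat.toy) <- paste0("article", seq_along(articles))

# Strip the country names and demonyms, then tidy up the leftover whitespace.
articles.clean <- articles
for (p in patterns) articles.clean <- gsub(p, "", articles.clean, perl = TRUE)
articles.clean <- trimws(gsub("\\s+", " ", articles.clean))

print(country.mat.toy)
print(articles.clean)
```

The cleaned text is what would then go into the document-term matrix for LDA, while the indicator matrix plays the role of the articles-by-countries matrix.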
So, for this part of the analysis I have two matrices. The first is the LDA matrix of 3440 articles by 100 topics, which describes each article as a mixture of 100 topics (fitted with the country names and demonyms removed from the text). The second is a 3440 by 193 matrix that records which countries are mentioned in the text of each article.
The question I want to address is: what are the main groups of international affairs? For example, we expect Syria, Iran, and the US to form one cluster and China, Japan, and the US to form another.
To do this, I used K-means clustering on the articles-by-countries matrix to group articles based on the countries mentioned in the text. Then, to understand what these groups discuss, I averaged the topic proportions of the articles in each cluster.
Let's first see in how many articles each country is mentioned at least once.
load("/Users/sweiss/Google Drive/countrymatrix.rdata")
country.sums = colSums(country.mat)
barplot(sort(country.sums, decreasing = TRUE)[1:50], las = 3, cex.names = 0.7, 
    ylab = "Number of times a country name or demonym occurred at least once in an article", 
    main = "Number of times a Country was Identified in an article (Economist 2013)")
[Figure: bar plot of the 50 most frequently mentioned countries]
The USA is number one, with the UK a distant second. There is obviously some UK bias, because The Economist is a British newspaper.
Below are the clusters of articles based on the countries named, and the topics most associated with those clusters. I chose 10 clusters using the elbow graph method (not shown).
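The elbow graph is not shown in the post. A minimal sketch of how such a graph is usually produced is given here, using a random toy matrix (toy.mat, hypothetical) in place of country.mat; the range of k and the nstart value are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed so the sketch is repeatable

# Toy binary articles-by-countries matrix standing in for country.mat.
toy.mat <- matrix(rbinom(200 * 12, size = 1, prob = 0.2), nrow = 200)

# Total within-cluster sum of squares for each candidate number of clusters.
k.values <- 1:15
wss <- sapply(k.values, function(k) {
  kmeans(toy.mat, centers = k, nstart = 10)$tot.withinss
})

# The "elbow" is the k after which adding clusters stops reducing the error much.
plot(k.values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares",
     main = "Elbow plot (toy data)")
```

On random toy data the curve has no clear elbow; the point is only to show the mechanics.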
library(topicmodels)
load("/users/sweiss/google drive/economistnocountryld100.Rda")
theta = posterior(ld.100)$topics
theta.average = colMeans(theta)
top.5.factors = names(sort(theta.average, decreasing = TRUE))
load("/Users/sweiss/Google Drive/countrymatrix.rdata")

kmeans.10 <- kmeans(x = country.mat, centers = 10)


cluster.10 = kmeans.10$cluster
for (i in 1:10) {
    theta.average.cluster.1 = colMeans(theta[which(cluster.10 == i), ])
    top.5.factors.cluster.1 = names(sort(theta.average.cluster.1, decreasing = TRUE))
    print(paste("Cluster", i))
    print("Top 10 Countries in Cluster")
    print(sort(colSums(country.mat[which(cluster.10 == i), ]), decreasing = TRUE)[1:10])
    print("Average Number of Countries in Article")
    print(mean(rowSums(country.mat[which(cluster.10 == i), ])))
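    # Note: sum(rowSums(...)) is the total number of country mentions in the cluster,
    # not the number of articles; the article count would be sum(cluster.10 == i).
    print("Number of Articles in Cluster")
    print(sum(rowSums(country.mat[which(cluster.10 == i), ])))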
    print(terms(ld.100, 10)[, as.numeric(top.5.factors.cluster.1)[1:10]])
}
## [1] "Cluster 1"
## [1] "Top 10 Countries in Cluster"
## United Kingdom         France          Japan         Russia          Italy 
##            254             73             60             46             44 
##          Spain         Israel         Turkey    Netherlands         Canada 
##             34             33             33             32             31 
## [1] "Average Number of Countries in Article"
## [1] 1.224
## [1] "Number of Articles in Cluster"
## [1] 1567
##       Topic 33    Topic 19   Topic 97   Topic 79   Topic 66    
##  [1,] "rate"      "elect"    "cell"     "labour"   "minist"    
##  [2,] "price"     "parti"    "research" "cameron"  "govern"    
##  [3,] "economist" "vote"     "scienc"   "tori"     "prime"     
##  [4,] "interest"  "voter"    "work"     "parti"    "polit"     
##  [5,] "index"     "poll"     "human"    "britain"  "leader"    
##  [6,] "market"    "polit"    "brain"    "conserv"  "parliament"
##  [7,] "trade"     "campaign" "one"      "polit"    "parti"     
##  [8,] "commod"    "seat"     "cancer"   "miliband" "opposit"   
##  [9,] "job"       "win"      "might"    "mps"      "elect"     
## [10,] "balanc"    "candid"   "use"      "david"    "berlusconi"
##       Topic 40     Topic 16  Topic 26   Topic 58  Topic 53 
##  [1,] "presid"     "polic"   "protest"  "test"    "local"  
##  [2,] "polit"      "crime"   "govern"   "time"    "town"   
##  [3,] "elect"      "prison"  "street"   "ask"     "park"   
##  [4,] "power"      "crimin"  "erdogan"  "peopl"   "build"  
##  [5,] "leader"     "say"     "polic"    "experi"  "council"
##  [6,] "presidenti" "sentenc" "support"  "think"   "centr"  
##  [7,] "constitut"  "drug"    "call"     "word"    "place"  
##  [8,] "countri"    "murder"  "demonstr" "suggest" "peopl"  
##  [9,] "year"       "peopl"   "polit"    "person"  "plan"   
## [10,] "run"        "jail"    "opposit"  "relat"   "new"    
## [1] "Cluster 2"
## [1] "Top 10 Countries in Cluster"
##         France        Germany          Italy          Spain United Kingdom 
##            128            118            116            103             58 
##  United States         Greece    Netherlands        Ireland         Russia 
##             55             53             52             38             32 
## [1] "Average Number of Countries in Article"
## [1] 8.138
## [1] "Number of Articles in Cluster"
## [1] 1237
##       Topic 9    Topic 28   Topic 88  Topic 62   Topic 83  Topic 14 
##  [1,] "euro"     "european" "bank"    "bond"     "price"   "economi"
##  [2,] "zone"     "europ"    "financi" "debt"     "market"  "growth" 
##  [3,] "european" "union"    "loan"    "rate"     "cost"    "econom" 
##  [4,] "countri"  "commiss"  "lend"    "yield"    "rise"    "gdp"    
##  [5,] "bank"     "countri"  "capit"   "investor" "year"    "invest" 
##  [6,] "crisi"    "nation"   "deposit" "govern"   "demand"  "export" 
##  [7,] "bail"     "want"     "credit"  "interest" "low"     "account"
##  [8,] "imf"      "brussel"  "crisi"   "market"   "increas" "product"
##  [9,] "market"   "treati"   "borrow"  "borrow"   "like"    "year"   
## [10,] "economi"  "like"     "asset"   "default"  "high"    "busi"   
##       Topic 29   Topic 17   Topic 91   Topic 52   
##  [1,] "worker"   "firm"     "mrs"      "left"     
##  [2,] "job"      "market"   "merkel"   "holland"  
##  [3,] "work"     "industri" "left"     "right"    
##  [4,] "labour"   "product"  "parti"    "presid"   
##  [5,] "employ"   "big"      "coalit"   "polit"    
##  [6,] "wage"     "new"      "green"    "socialist"
##  [7,] "unemploy" "profit"   "centr"    "now"      
##  [8,] "skill"    "busi"     "govern"   "put"      
##  [9,] "pay"      "sale"     "democrat" "yet"      
## [10,] "factori"  "compani"  "social"   "fran"     
## [1] "Cluster 3"
## [1] "Top 10 Countries in Cluster"
##  United States United Kingdom         Canada         France         Russia 
##            686            115             49             46             37 
##          Japan    Afghanistan      Australia          Spain        Georgia 
##             34             27             24             23             21 
## [1] "Average Number of Countries in Article"
## [1] 2.257
## [1] "Number of Articles in Cluster"
## [1] 1548
##       Topic 87     Topic 72 Topic 19   Topic 90   Topic 63   Topic 47    
##  [1,] "republican" "court"  "elect"    "insur"    "fund"     "state"     
##  [2,] "obama"      "law"    "parti"    "health"   "investor" "feder"     
##  [3,] "democrat"   "case"   "vote"     "will"     "return"   "california"
##  [4,] "senat"      "legal"  "voter"    "plan"     "invest"   "governor"  
##  [5,] "polit"      "judg"   "poll"     "care"     "asset"    "year"      
##  [6,] "congress"   "rule"   "polit"    "feder"    "equiti"   "say"       
##  [7,] "hous"       "right"  "campaign" "obamacar" "share"    "one"       
##  [8,] "bill"       "lawyer" "seat"     "mani"     "manag"    "texa"      
##  [9,] "america"    "suprem" "win"      "exchang"  "profit"   "back"      
## [10,] "barack"     "justic" "candid"   "state"    "money"    "also"      
##       Topic 97   Topic 64    Topic 62   Topic 88 
##  [1,] "cell"     "compani"   "bond"     "bank"   
##  [2,] "research" "firm"      "debt"     "financi"
##  [3,] "scienc"   "busi"      "rate"     "loan"   
##  [4,] "work"     "deal"      "yield"    "lend"   
##  [5,] "human"    "share"     "investor" "capit"  
##  [6,] "brain"    "billion"   "govern"   "deposit"
##  [7,] "one"      "sharehold" "interest" "credit" 
##  [8,] "cancer"   "buy"       "market"   "crisi"  
##  [9,] "might"    "stake"     "borrow"   "borrow" 
## [10,] "use"      "privat"    "default"  "asset"  
## [1] "Cluster 4"
## [1] "Top 10 Countries in Cluster"
##        Germany  United States United Kingdom         France          China 
##            278            112             94             70             44 
##    Netherlands    Switzerland         Russia          Japan         Greece 
##             32             29             27             26             21 
## [1] "Average Number of Countries in Article"
## [1] 4.324
## [1] "Number of Articles in Cluster"
## [1] 1202
##       Topic 91   Topic 28   Topic 9    Topic 88  Topic 17   Topic 64   
##  [1,] "mrs"      "european" "euro"     "bank"    "firm"     "compani"  
##  [2,] "merkel"   "europ"    "zone"     "financi" "market"   "firm"     
##  [3,] "left"     "union"    "european" "loan"    "industri" "busi"     
##  [4,] "parti"    "commiss"  "countri"  "lend"    "product"  "deal"     
##  [5,] "coalit"   "countri"  "bank"     "capit"   "big"      "share"    
##  [6,] "green"    "nation"   "crisi"    "deposit" "new"      "billion"  
##  [7,] "centr"    "want"     "bail"     "credit"  "profit"   "sharehold"
##  [8,] "govern"   "brussel"  "imf"      "crisi"   "busi"     "buy"      
##  [9,] "democrat" "treati"   "market"   "borrow"  "sale"     "stake"    
## [10,] "social"   "like"     "economi"  "asset"   "compani"  "privat"   
##       Topic 76  Topic 66     Topic 29   Topic 15  
##  [1,] "billion" "minist"     "worker"   "investig"
##  [2,] "year"    "govern"     "job"      "claim"   
##  [3,] "will"    "prime"      "work"     "case"    
##  [4,] "cost"    "polit"      "labour"   "alleg"   
##  [5,] "also"    "leader"     "employ"   "charg"   
##  [6,] "worth"   "parliament" "wage"     "report"  
##  [7,] "total"   "parti"      "unemploy" "trial"   
##  [8,] "last"    "opposit"    "skill"    "said"    
##  [9,] "make"    "elect"      "pay"      "former"  
## [10,] "estim"   "berlusconi" "factori"  "accus"   
## [1] "Cluster 5"
## [1] "Top 10 Countries in Cluster"
##         Syria United States          Iraq          Iran        Israel 
##           121            88            80            73            66 
##        Turkey       Lebanon        Russia         Egypt  Saudi Arabia 
##            57            42            40            38            37 
## [1] "Average Number of Countries in Article"
## [1] 7.39
## [1] "Number of Articles in Cluster"
## [1] 1005
##       Topic 46  Topic 48     Topic 81      Topic 92   Topic 21   
##  [1,] "rebel"   "america"    "muslim"      "forc"     "attack"   
##  [2,] "assad"   "obama"      "ian"         "armi"     "kill"     
##  [3,] "regim"   "presid"     "islam"       "defenc"   "war"      
##  [4,] "war"     "nuclear"    "islamist"    "militari" "bomb"     
##  [5,] "govern"  "polici"     "brother"     "arm"      "group"    
##  [6,] "forc"    "relat"      "now"         "secur"    "terrorist"
##  [7,] "north"   "intern"     "morsi"       "general"  "dead"     
##  [8,] "arm"     "weapon"     "brotherhood" "war"      "violenc"  
##  [9,] "group"   "washington" "back"        "soldier"  "terror"   
## [10,] "western" "foreign"    "includ"      "troop"    "drone"    
##       Topic 89      Topic 26   Topic 66     Topic 37    Topic 7  
##  [1,] "white"       "protest"  "minist"     "deal"      "polit"  
##  [2,] "black"       "govern"   "govern"     "trade"     "putin"  
##  [3,] "palestinian" "street"   "prime"      "talk"      "anti"   
##  [4,] "king"        "erdogan"  "polit"      "negoti"    "now"    
##  [5,] "relat"       "polic"    "leader"     "agreement" "also"   
##  [6,] "arab"        "support"  "parliament" "agre"      "may"    
##  [7,] "say"         "call"     "parti"      "side"      "kremlin"
##  [8,] "west"        "demonstr" "opposit"    "free"      "soviet" 
##  [9,] "state"       "polit"    "elect"      "sign"      "western"
## [10,] "two"         "opposit"  "berlusconi" "two"       "power"  
## [1] "Cluster 6"
## [1] "Top 10 Countries in Cluster"
##          Niger        Nigeria  United States         France United Kingdom 
##             55             52             20             14             13 
##           Mali   South Africa          China          Ghana       Cameroon 
##             12             12             10              9              7 
## [1] "Average Number of Countries in Article"
## [1] 7.182
## [1] "Number of Articles in Cluster"
## [1] 395
##       Topic 60  Topic 46  Topic 21    Topic 22  Topic 40     Topic 5     
##  [1,] "africa"  "rebel"   "attack"    "money"   "presid"     "food"      
##  [2,] "ship"    "assad"   "kill"      "pay"     "polit"      "farm"      
##  [3,] "african" "regim"   "war"       "servic"  "elect"      "farmer"    
##  [4,] "port"    "war"     "bomb"      "save"    "power"      "product"   
##  [5,] "region"  "govern"  "group"     "charg"   "leader"     "say"       
##  [6,] "contain" "forc"    "terrorist" "cost"    "presidenti" "produc"    
##  [7,] "intern"  "north"   "dead"      "card"    "constitut"  "agricultur"
##  [8,] "world"   "arm"     "violenc"   "fee"     "countri"    "meat"      
##  [9,] "dubai"   "group"   "terror"    "payment" "year"       "rice"      
## [10,] "countri" "western" "drone"     "account" "run"        "year"      
##       Topic 92   Topic 88  Topic 49  Topic 81     
##  [1,] "forc"     "bank"    "immigr"  "muslim"     
##  [2,] "armi"     "financi" "border"  "ian"        
##  [3,] "defenc"   "loan"    "migrant" "islam"      
##  [4,] "militari" "lend"    "mani"    "islamist"   
##  [5,] "arm"      "capit"   "peopl"   "brother"    
##  [6,] "secur"    "deposit" "say"     "now"        
##  [7,] "general"  "credit"  "illeg"   "morsi"      
##  [8,] "war"      "crisi"   "year"    "brotherhood"
##  [9,] "soldier"  "borrow"  "countri" "back"       
## [10,] "troop"    "asset"   "issu"    "includ"     
## [1] "Cluster 7"
## [1] "Top 10 Countries in Cluster"
##          China  United States          Japan United Kingdom         Russia 
##            409            187             91             60             40 
##         France      Australia         Canada        Vietnam         Taiwan 
##             39             29             24             24             23 
## [1] "Average Number of Countries in Article"
## [1] 3.399
## [1] "Number of Articles in Cluster"
## [1] 1390
##       Topic 85    Topic 24 Topic 17   Topic 95    Topic 4      Topic 14 
##  [1,] "offici"    "south"  "firm"     "project"   "parti"      "economi"
##  [2,] "beij"      "north"  "market"   "mine"      "polit"      "growth" 
##  [3,] "said"      "korea"  "industri" "water"     "power"      "econom" 
##  [4,] "recent"    "asia"   "product"  "govern"    "leader"     "gdp"    
##  [5,] "report"    "island" "big"      "build"     "nation"     "invest" 
##  [6,] "one"       "region" "new"      "river"     "politician" "export" 
##  [7,] "govern"    "relat"  "profit"   "say"       "member"     "account"
##  [8,] "ministri"  "east"   "busi"     "construct" "congress"   "product"
##  [9,] "public"    "sea"    "sale"     "one"       "support"    "year"   
## [10,] "communist" "also"   "compani"  "plan"      "constitut"  "busi"   
##       Topic 86 Topic 88  Topic 48     Topic 74 
##  [1,] "open"   "bank"    "america"    "foreign"
##  [2,] "also"   "financi" "obama"      "govern" 
##  [3,] "will"   "loan"    "presid"     "countri"
##  [4,] "mani"   "lend"    "nuclear"    "local"  
##  [5,] "hong"   "capit"   "polici"     "make"   
##  [6,] "can"    "deposit" "relat"      "control"
##  [7,] "anoth"  "credit"  "intern"     "abroad" 
##  [8,] "kong"   "crisi"   "weapon"     "intern" 
##  [9,] "argu"   "borrow"  "washington" "last"   
## [10,] "one"    "asset"   "foreign"    "may"    
## [1] "Cluster 8"
## [1] "Top 10 Countries in Cluster"
##  United States         Brazil         Mexico          Spain          Chile 
##            124             80             78             23             21 
##          China      Venezuela      Argentina United Kingdom       Colombia 
##             18             16             15             15             14 
## [1] "Average Number of Countries in Article"
## [1] 5.233
## [1] "Number of Articles in Cluster"
## [1] 675
##       Topic 11  Topic 40     Topic 17   Topic 16  Topic 87     Topic 82
##  [1,] "countri" "presid"     "firm"     "polic"   "republican" "reform"
##  [2,] "world"   "polit"      "market"   "crime"   "obama"      "govern"
##  [3,] "global"  "elect"      "industri" "prison"  "democrat"   "will"  
##  [4,] "america" "power"      "product"  "crimin"  "senat"      "polici"
##  [5,] "develop" "leader"     "big"      "say"     "polit"      "chang" 
##  [6,] "rich"    "presidenti" "new"      "sentenc" "congress"   "system"
##  [7,] "emerg"   "constitut"  "profit"   "drug"    "hous"       "plan"  
##  [8,] "intern"  "countri"    "busi"     "murder"  "bill"       "polit" 
##  [9,] "latin"   "year"       "sale"     "peopl"   "america"    "public"
## [10,] "accord"  "run"        "compani"  "jail"    "barack"     "need"  
##       Topic 14  Topic 37    Topic 75  Topic 47    
##  [1,] "economi" "deal"      "number"  "state"     
##  [2,] "growth"  "trade"     "america" "feder"     
##  [3,] "econom"  "talk"      "sinc"    "california"
##  [4,] "gdp"     "negoti"    "time"    "governor"  
##  [5,] "invest"  "agreement" "less"    "year"      
##  [6,] "export"  "agre"      "rate"    "say"       
##  [7,] "account" "side"      "year"    "one"       
##  [8,] "product" "free"      "declin"  "texa"      
##  [9,] "year"    "sign"      "rise"    "back"      
## [10,] "busi"    "two"       "increas" "also"      
## [1] "Cluster 9"
## [1] "Top 10 Countries in Cluster"
##          India          China  United States United Kingdom          Japan 
##            292            123            114             78             51 
##       Pakistan         Russia         Brazil      Australia      Indonesia 
##             43             39             33             32             31 
## [1] "Average Number of Countries in Article"
## [1] 5.014
## [1] "Number of Articles in Cluster"
## [1] 1464
##       Topic 4      Topic 19   Topic 2   Topic 17   Topic 24 Topic 66    
##  [1,] "parti"      "elect"    "govern"  "firm"     "south"  "minist"    
##  [2,] "polit"      "parti"    "nation"  "market"   "north"  "govern"    
##  [3,] "power"      "vote"     "peopl"   "industri" "korea"  "prime"     
##  [4,] "leader"     "voter"    "ethnic"  "product"  "asia"   "polit"     
##  [5,] "nation"     "poll"     "local"   "big"      "island" "leader"    
##  [6,] "politician" "polit"    "villag"  "new"      "region" "parliament"
##  [7,] "member"     "campaign" "one"     "profit"   "relat"  "parti"     
##  [8,] "congress"   "seat"     "just"    "busi"     "east"   "opposit"   
##  [9,] "support"    "win"      "group"   "sale"     "sea"    "elect"     
## [10,] "constitut"  "candid"   "countri" "compani"  "also"   "berlusconi"
##       Topic 21    Topic 95    Topic 11  Topic 39 
##  [1,] "attack"    "project"   "countri" "one"    
##  [2,] "kill"      "mine"      "world"   "world"  
##  [3,] "war"       "water"     "global"  "argu"   
##  [4,] "bomb"      "govern"    "america" "blog"   
##  [5,] "group"     "build"     "develop" "histori"
##  [6,] "terrorist" "river"     "rich"    "long"   
##  [7,] "dead"      "say"       "emerg"   "great"  
##  [8,] "violenc"   "construct" "intern"  "view"   
##  [9,] "terror"    "one"       "latin"   "point"  
## [10,] "drone"     "plan"      "accord"  "much"   
## [1] "Cluster 10"
## [1] "Top 10 Countries in Cluster"
##     Tanzania        Kenya       Rwanda       Uganda South Africa 
##           17           16           15           15           14 
##        China        Niger        India      Nigeria       Angola 
##           13           12           11           11           10 
## [1] "Average Number of Countries in Article"
## [1] 11.87
## [1] "Number of Articles in Cluster"
## [1] 273
##       Topic 60  Topic 49  Topic 46  Topic 35 Topic 13 Topic 2   Topic 92  
##  [1,] "africa"  "immigr"  "rebel"   "peopl"  "store"  "govern"  "forc"    
##  [2,] "ship"    "border"  "assad"   "mani"   "retail" "nation"  "armi"    
##  [3,] "african" "migrant" "regim"   "fire"   "shop"   "peopl"   "defenc"  
##  [4,] "port"    "mani"    "war"     "now"    "sale"   "ethnic"  "militari"
##  [5,] "region"  "peopl"   "govern"  "caus"   "chain"  "local"   "arm"     
##  [6,] "contain" "say"     "forc"    "need"   "sell"   "villag"  "secur"   
##  [7,] "intern"  "illeg"   "north"   "damag"  "buy"    "one"     "general" 
##  [8,] "world"   "year"    "arm"     "also"   "custom" "just"    "war"     
##  [9,] "dubai"   "countri" "group"   "help"   "can"    "group"   "soldier" 
## [10,] "countri" "issu"    "western" "miss"   "good"   "countri" "troop"   
##       Topic 14  Topic 95    Topic 63  
##  [1,] "economi" "project"   "fund"    
##  [2,] "growth"  "mine"      "investor"
##  [3,] "econom"  "water"     "return"  
##  [4,] "gdp"     "govern"    "invest"  
##  [5,] "invest"  "build"     "asset"   
##  [6,] "export"  "river"     "equiti"  
##  [7,] "account" "say"       "share"   
##  [8,] "product" "construct" "manag"   
##  [9,] "year"    "one"       "profit"  
## [10,] "busi"    "plan"      "money"
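In the output above, the easy clusters to understand are 2 and 4 (Euro zone, with 4 dominated by Germany), 5 (Mideast conflict), 7 (Asia, dominated by China), and 8 (Latin America). Cluster 9 seems to be about regional politics around India, and clusters 6 and 10 cover Africa. Clusters 1 and 3 have a low average number of countries per article (about 1.2 and 2.3), so they are mostly intranational articles about the United Kingdom and the United States. Because kmeans() starts from random centers, the cluster numbers change from run to run.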
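Since kmeans() picks its starting centers at random, the cluster numbers can differ each time the code is run. A minimal sketch of making a run repeatable is shown here on a random toy matrix (toy.mat, hypothetical); the seed values, the number of centers, and nstart are arbitrary choices for illustration, not settings from the post.

```r
# Toy binary matrix standing in for country.mat.
set.seed(42)
toy.mat <- matrix(rbinom(300 * 10, 1, 0.25), nrow = 300)

run.kmeans <- function(seed) {
  set.seed(seed)  # fixes the random starting centers
  kmeans(toy.mat, centers = 4, nstart = 25)
}

fit.a <- run.kmeans(2013)
fit.b <- run.kmeans(2013)

# With the same seed, both runs assign every row to the same cluster number.
print(identical(fit.a$cluster, fit.b$cluster))

# Number of rows (articles) assigned to each cluster.
print(table(fit.a$cluster))
```

Using nstart greater than 1 also makes kmeans() keep the best of several random starts, which tends to give more stable clusters.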

Saturday, January 18, 2014

Economist Year in Review: Part 1

Intro: A new year means a new Year in Review to digest and synthesize the events. Here's my attempt at creating a data-oriented Year in Review of The Economist newspaper using all the articles from the past year.

Getting Data: I used Python to scrape data from The Economist website. I've included the code, but not the actual articles. If you have access to The Economist, then you too can use the code to download the articles from the past year.
LDA: First I wanted to know what kinds of topics were written about, so I fitted a topic model with 100 topics. Below are the top ten words of each topic in decreasing order of importance. In addition, I plotted over time some of the topics that I thought were interesting.
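For the topics-over-time view, one simple approach is to average each topic's share over the articles published in each month. The sketch below assumes this approach and uses random toy data; theta.toy and the dates vector are hypothetical stand-ins for the LDA posterior matrix and the article publication dates.

```r
set.seed(1)  # arbitrary seed so the toy data are repeatable
n.articles <- 120
n.topics <- 5

# Toy article-by-topic matrix with rows that sum to 1.
theta.toy <- matrix(rgamma(n.articles * n.topics, shape = 1), nrow = n.articles)
theta.toy <- theta.toy / rowSums(theta.toy)
colnames(theta.toy) <- paste("Topic", 1:n.topics)

# Hypothetical publication dates spread over 2013, reduced to year-month labels.
dates <- as.Date("2013-01-01") + sample(0:364, n.articles, replace = TRUE)
month <- format(dates, "%Y-%m")

# Mean topic share per month: sum the rows by month, then divide by the article count.
monthly <- rowsum(theta.toy, month) / as.vector(table(month))

# Plot one topic's monthly average.
plot(monthly[, "Topic 1"], type = "b", xaxt = "n",
     xlab = "", ylab = "Mean topic share",
     main = "Topic 1 over time (toy data)")
axis(1, at = seq_len(nrow(monthly)), labels = rownames(monthly), las = 3, cex.axis = 0.7)
```

Each row of monthly still sums to 1, so the values can be read as the average share of a month's coverage devoted to each topic.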
library(topicmodels)
load("/users/sweiss/google drive/ld100.Rda")
theta = posterior(ld.100)$topics
theta.average = colMeans(theta)
top.5.factors = names(sort(theta.average, decreasing = TRUE))
terms(ld.100, 10)[, as.numeric(top.5.factors)]
##       Topic 50    Topic 16   Topic 62    Topic 90  Topic 83    
##  [1,] "price"     "elect"    "scienc"    "bank"    "minist"    
##  [2,] "rate"      "parti"    "cell"      "financi" "govern"    
##  [3,] "economist" "vote"     "research"  "loan"    "polit"     
##  [4,] "index"     "polit"    "found"     "capit"   "prime"     
##  [5,] "market"    "voter"    "technolog" "lend"    "parti"     
##  [6,] "econom"    "poll"     "one"       "credit"  "parliament"
##  [7,] "exchang"   "seat"     "human"     "deposit" "opposit"   
##  [8,] "commod"    "candid"   "brain"     "financ"  "leader"    
##  [9,] "interest"  "win"      "look"      "asset"   "elect"     
## [10,] "job"       "campaign" "univers"   "regul"   "coalit"    
##       Topic 46     Topic 64  Topic 52    Topic 66     Topic 30    
##  [1,] "market"     "number"  "billion"   "presid"     "republican"
##  [2,] "firm"       "year"    "compani"   "polit"      "obama"     
##  [3,] "product"    "sinc"    "firm"      "elect"      "senat"     
##  [4,] "industri"   "increas" "share"     "power"      "democrat"  
##  [5,] "sale"       "rise"    "busi"      "presidenti" "congress"  
##  [6,] "manufactur" "rate"    "profit"    "year"       "hous"      
##  [7,] "make"       "fall"    "deal"      "govern"     "state"     
##  [8,] "profit"     "accord"  "buy"       "constitut"  "polit"     
##  [9,] "busi"       "quarter" "sharehold" "last"       "barack"    
## [10,] "cost"       "averag"  "stake"     "countri"    "presid"    
##       Topic 20  Topic 24  Topic 54   Topic 67 Topic 27  Topic 74 
##  [1,] "economi" "rate"    "group"    "syria"  "will"    "use"    
##  [2,] "growth"  "inflat"  "govern"   "rebel"  "may"     "system" 
##  [3,] "econom"  "bank"    "armi"     "war"    "year"    "can"    
##  [4,] "gdp"     "polici"  "peac"     "assad"  "hope"    "work"   
##  [5,] "invest"  "central" "pakistan" "regim"  "alreadi" "one"    
##  [6,] "year"    "currenc" "kill"     "weapon" "get"     "make"   
##  [7,] "account" "market"  "polit"    "iraq"   "next"    "way"    
##  [8,] "quarter" "fed"     "violenc"  "syrian" "expect"  "machin" 
##  [9,] "export"  "economi" "war"      "libya"  "think"   "design" 
## [10,] "spend"   "reserv"  "attack"   "forc"   "far"     "problem"
##       Topic 49  Topic 22     Topic 96   Topic 48   Topic 78   Topic 25 
##  [1,] "citi"    "parti"      "fund"     "firm"     "one"      "polic"  
##  [2,] "local"   "polit"      "investor" "compani"  "may"      "crime"  
##  [3,] "mayor"   "leader"     "return"   "busi"     "might"    "prison" 
##  [4,] "town"    "power"      "invest"   "consult"  "like"     "drug"   
##  [5,] "area"    "member"     "asset"    "industri" "way"      "say"    
##  [6,] "council" "politician" "equiti"   "big"      "can"      "crimin" 
##  [7,] "resid"   "call"       "manag"    "servic"   "whether"  "jail"   
##  [8,] "centr"   "nation"     "share"    "client"   "even"     "sentenc"
##  [9,] "street"  "offici"     "stock"    "work"     "question" "murder" 
## [10,] "build"   "chief"      "money"    "say"      "littl"    "gun"    
##       Topic 31   Topic 21   Topic 26   Topic 77    Topic 61 Topic 40
##  [1,] "euro"     "investig" "labour"   "mobil"     "year"   "year"  
##  [2,] "zone"     "charg"    "britain"  "technolog" "day"    "last"  
##  [3,] "spain"    "alleg"    "cameron"  "phone"     "week"   "two"   
##  [4,] "european" "claim"    "tori"     "appl"      "april"  "ago"   
##  [5,] "bail"     "case"     "parti"    "comput"    "month"  "five"  
##  [6,] "europ"    "trial"    "conserv"  "servic"    "last"   "past"  
##  [7,] "greec"    "report"   "miliband" "oper"      "may"    "month" 
##  [8,] "countri"  "scandal"  "mps"      "googl"     "time"   "four"  
##  [9,] "ireland"  "accus"    "david"    "network"   "said"   "three" 
## [10,] "cyprus"   "former"   "polit"    "softwar"   "june"   "end"   
##       Topic 33 Topic 5   Topic 11 Topic 23  Topic 15   Topic 80   Topic 29
##  [1,] "can"    "cut"     "court"  "money"   "britain"  "china"    "long"  
##  [2,] "make"   "spend"   "law"    "cost"    "british"  "chines"   "time"  
##  [3,] "world"  "budget"  "case"   "pay"     "london"   "offici"   "term"  
##  [4,] "peopl"  "billion" "legal"  "billion" "england"  "beij"     "like"  
##  [5,] "argu"   "year"    "right"  "fee"     "kingdom"  "hong"     "big"   
##  [6,] "think"  "deficit" "rule"   "year"    "unit"     "kong"     "run"   
##  [7,] "one"    "govern"  "judg"   "use"     "briton"   "shanghai" "also"  
##  [8,] "like"   "fiscal"  "lawyer" "paid"    "english"  "even"     "less"  
##  [9,] "good"   "will"    "suprem" "charg"   "scotland" "also"     "even"  
## [10,] "idea"   "tax"     "justic" "payment" "servic"   "yuan"     "well"  
##       Topic 65  Topic 98   Topic 34  Topic 56  Topic 1     Topic 35   
##  [1,] "measur"  "european" "law"     "histori" "space"     "road"     
##  [2,] "data"    "europ"    "rule"    "mani"    "one"       "line"     
##  [3,] "suggest" "union"    "regul"   "yet"     "mar"       "car"      
##  [4,] "also"    "countri"  "bill"    "war"     "scienc"    "train"    
##  [5,] "may"     "commiss"  "pass"    "day"     "technolog" "transport"
##  [6,] "can"     "nation"   "requir"  "never"   "field"     "rail"     
##  [7,] "chang"   "dutch"    "ban"     "father"  "light"     "railway"  
##  [8,] "base"    "brussel"  "new"     "centuri" "though"    "speed"    
##  [9,] "differ"  "germani"  "propos"  "old"     "earth"     "drive"    
## [10,] "point"   "franc"    "control" "much"    "orbit"     "say"      
##       Topic 73  Topic 12    Topic 28   Topic 2    Topic 82   Topic 58  
##  [1,] "will"    "project"   "protest"  "school"   "bond"     "women"   
##  [2,] "octob"   "build"     "street"   "educ"     "debt"     "children"
##  [3,] "month"   "plan"      "govern"   "student"  "rate"     "age"     
##  [4,] "next"    "will"      "call"     "univers"  "yield"    "famili"  
##  [5,] "novemb"  "water"     "polic"    "teacher"  "govern"   "young"   
##  [6,] "one"     "say"       "day"      "year"     "interest" "men"     
##  [7,] "year"    "river"     "demonstr" "colleg"   "market"   "old"     
##  [8,] "first"   "construct" "support"  "teach"    "investor" "parent"  
##  [9,] "said"    "built"     "peopl"    "children" "financ"   "sex"     
## [10,] "septemb" "billion"   "thousand" "pupil"    "borrow"   "child"   
##       Topic 86  Topic 70 Topic 97   Topic 19 Topic 95   Topic 4   
##  [1,] "say"     "reform" "polici"   "first"  "use"      "work"    
##  [2,] "mani"    "will"   "visit"    "second" "onlin"    "worker"  
##  [3,] "one"     "chang"  "also"     "back"   "data"     "job"     
##  [4,] "can"     "polici" "diplomat" "two"    "internet" "labour"  
##  [5,] "peopl"   "system" "leader"   "canada" "can"      "employ"  
##  [6,] "see"     "plan"   "presid"   "time"   "social"   "wage"    
##  [7,] "languag" "new"    "foreign"  "made"   "user"     "unemploy"
##  [8,] "still"   "need"   "two"      "place"  "applic"   "skill"   
##  [9,] "want"    "propos" "want"     "chang"  "search"   "young"   
## [10,] "main"    "polit"  "might"    "now"    "peopl"    "low"     
##       Topic 44  Topic 92  Topic 69     Topic 63 Topic 81   Topic 43
##  [1,] "countri" "time"    "state"      "chief"  "govern"   "peopl" 
##  [2,] "foreign" "test"    "unit"       "manag"  "state"    "kill"  
##  [3,] "world"   "ask"     "feder"      "boss"   "privat"   "fire"  
##  [4,] "global"  "experi"  "governor"   "execut" "public"   "mani"  
##  [5,] "intern"  "suggest" "california" "offic"  "sector"   "miss"  
##  [6,] "develop" "person"  "year"       "also"   "offici"   "now"   
##  [7,] "rich"    "control" "texa"       "job"    "own"      "bad"   
##  [8,] "emerg"   "word"    "san"        "board"  "say"      "least" 
##  [9,] "import"  "show"    "one"        "head"   "privatis" "disast"
## [10,] "abroad"  "think"   "counti"     "need"   "local"    "die"   
##       Topic 99 Topic 84       Topic 13   Topic 94     Topic 47   
##  [1,] "like"   "start"        "media"    "america"    "deal"     
##  [2,] "class"  "busi"         "televis"  "american"   "talk"     
##  [3,] "middl"  "firm"         "news"     "unit"       "negoti"   
##  [4,] "still"  "new"          "advertis" "state"      "agreement"
##  [5,] "well"   "entrepreneur" "newspap"  "washington" "two"      
##  [6,] "just"   "ventur"       "video"    "nation"     "agre"     
##  [7,] "much"   "small"        "show"     "long"       "sign"     
##  [8,] "better" "big"          "year"     "see"        "side"     
##  [9,] "half"   "founder"      "time"     "obama"      "want"     
## [10,] "part"   "capit"        "watch"    "action"     "will"     
##       Topic 88   Topic 89      Topic 75  Topic 18 Topic 59  Topic 45     
##  [1,] "secur"    "forc"        "africa"  "shop"   "poor"    "muslim"     
##  [2,] "agenc"    "militari"    "african" "retail" "peopl"   "egypt"      
##  [3,] "govern"   "defenc"      "south"   "store"  "help"    "islamist"   
##  [4,] "secret"   "armi"        "countri" "busi"   "poverti" "islam"      
##  [5,] "attack"   "arm"         "east"    "custom" "social"  "saudi"      
##  [6,] "say"      "war"         "kenya"   "sell"   "work"    "brother"    
##  [7,] "intellig" "afghanistan" "nigeria" "open"   "give"    "morsi"      
##  [8,] "offici"   "secur"       "middl"   "sale"   "incom"   "arab"       
##  [9,] "inform"   "soldier"     "region"  "onlin"  "benefit" "brotherhood"
## [10,] "spi"      "drone"       "say"     "say"    "money"   "arabia"     
##       Topic 32   Topic 10   Topic 36    Topic 9    Topic 71 Topic 7      
##  [1,] "hous"     "right"    "new"       "univers"  "music"  "climat"     
##  [2,] "price"    "group"    "york"      "paper"    "art"    "chang"      
##  [3,] "properti" "gay"      "one"       "research" "film"   "carbon"     
##  [4,] "home"     "campaign" "will"      "publish"  "one"    "warm"       
##  [5,] "land"     "marriag"  "year"      "public"   "show"   "environment"
##  [6,] "mortgag"  "support"  "old"       "book"     "man"    "tree"       
##  [7,] "rent"     "member"   "two"       "author"   "can"    "emiss"      
##  [8,] "new"      "among"    "bloomberg" "work"     "now"    "model"      
##  [9,] "valu"     "liber"    "boston"    "studi"    "cultur" "water"      
## [10,] "rise"     "debat"    "post"      "journal"  "first"  "global"     
##       Topic 8     Topic 14  Topic 79  Topic 87    Topic 51  Topic 42 
##  [1,] "asia"      "germani" "game"    "health"    "trade"   "immigr" 
##  [2,] "ship"      "german"  "sport"   "drug"      "market"  "mexico" 
##  [3,] "australia" "mrs"     "footbal" "hospit"    "exchang" "say"    
##  [4,] "island"    "merkel"  "club"    "care"      "import"  "migrant"
##  [5,] "south"     "green"   "play"    "treatment" "financi" "border" 
##  [6,] "sea"       "berlin"  "team"    "cancer"    "goldman" "year"   
##  [7,] "region"    "europ"   "world"   "patient"   "deriv"   "illeg"  
##  [8,] "port"      "social"  "player"  "doctor"    "econom"  "countri"
##  [9,] "countri"   "coalit"  "leagu"   "medic"     "world"   "home"   
## [10,] "indonesia" "left"    "can"     "also"      "financ"  "refuge" 
##       Topic 85 Topic 53   Topic 91  Topic 55 Topic 68     Topic 57
##  [1,] "food"   "power"    "oil"     "black"  "itali"      "tax"   
##  [2,] "say"    "energi"   "gas"     "mani"   "church"     "rais"  
##  [3,] "hotel"  "electr"   "billion" "white"  "italian"    "revenu"
##  [4,] "one"    "plant"    "energi"  "may"    "berlusconi" "pay"   
##  [5,] "meat"   "nuclear"  "price"   "like"   "left"       "incom" 
##  [6,] "drink"  "wind"     "reserv"  "race"   "christian"  "financ"
##  [7,] "eat"    "industri" "shale"   "make"   "one"        "rate"  
##  [8,] "good"   "generat"  "year"    "stop"   "cathol"     "govern"
##  [9,] "may"    "batteri"  "invest"  "less"   "movement"   "rich"  
## [10,] "consum" "renew"    "new"     "becom"  "hous"       "money" 
##       Topic 72  Topic 60   Topic 76     Topic 41  Topic 93    Topic 39 
##  [1,] "air"     "insur"    "brazil"     "pension" "french"    "russia" 
##  [2,] "new"     "health"   "year"       "pay"     "franc"     "russian"
##  [3,] "airport" "will"     "farmer"     "scheme"  "holland"   "putin"  
##  [4,] "fli"     "care"     "farm"       "public"  "pari"      "ukrain" 
##  [5,] "airlin"  "plan"     "latin"      "benefit" "fran"      "polit"  
##  [6,] "will"    "obamacar" "say"        "fund"    "mali"      "kremlin"
##  [7,] "flight"  "exchang"  "govern"     "will"    "left"      "moscow" 
##  [8,] "take"    "like"     "agricultur" "retir"   "socialist" "soviet" 
##  [9,] "dubai"   "cost"     "land"       "year"    "now"       "also"   
## [10,] "plane"   "mani"     "brazilian"  "detroit" "presid"    "now"    
##       Topic 100 Topic 37  Topic 38   Topic 3       Topic 6   Topic 17 
##  [1,] "mine"    "turkey"  "india"    "israel"      "japan"   "north"  
##  [2,] "make"    "erdogan" "indian"   "iran"        "abe"     "south"  
##  [3,] "gold"    "yet"     "say"      "isra"        "japanes" "korea"  
##  [4,] "world"   "say"     "state"    "palestinian" "new"     "park"   
##  [5,] "high"    "includ"  "year"     "arab"        "say"     "korean" 
##  [6,] "year"    "turkish" "delhi"    "west"        "tokyo"   "nuclear"
##  [7,] "wast"    "smoke"   "run"      "middl"       "minist"  "kim"    
##  [8,] "miner"   "may"     "congress" "east"        "countri" "regim"  
##  [9,] "steel"   "world"   "now"      "iranian"     "prime"   "state"  
## [10,] "littl"   "troubl"  "yet"      "state"       "now"     "test"
As one can see from the most common topics, economics, finance, business, and politics top the list.
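One way to rank topics by how common they are is to average each topic's share across all articles and sort. The snippet below is only a minimal sketch of that idea on made-up data: `toy.theta` and `topic.means` are hypothetical names, and the ranking actually used for the table above may have been computed differently.

```r
# Toy stand-in for the article-by-topic matrix (the real theta has one row
# per article and 100 topic columns); values here are random, not real data
set.seed(1)
toy.theta = matrix(runif(5 * 4), nrow = 5, ncol = 4)
toy.theta = toy.theta / rowSums(toy.theta)  # each article's topic shares sum to 1
colnames(toy.theta) = paste(1:4)

# Average share of each topic across articles, most common topic first
topic.means = sort(colMeans(toy.theta), decreasing = TRUE)
topic.means
```

Because every row sums to 1, the averaged shares also sum to 1, so each value can be read as the fraction of the whole corpus attributed to that topic.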
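# Article metadata; its first three columns are bound to the topic proportions in theta
econ.3 = read.csv("/users/sweiss/google drive/economistdata.csv")
# Topic columns are named "1" to "100"
respons.vars = paste(1:100)
theta.econ = cbind(econ.3[, 1:3], theta)
# Average each topic's proportion over all articles published on the same date
date.theta.agg = aggregate(x = theta.econ[respons.vars], by = theta.econ["Date"], 
    FUN = mean)
date.theta.agg$Month <- factor(date.theta.agg$Date, levels = date.theta.agg$Date[!duplicated(date.theta.agg$Date)])
# Column 1 is Date, so topic k sits in column k + 1 of date.theta.agg
date.theta.agg[, 1] = as.Date(date.theta.agg[, 1])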
library(scales)

library(ggplot2)
ggplot(data = date.theta.agg, aes(x = as.Date(Date), y = date.theta.agg[, 68])) + 
    geom_line() + geom_smooth() + scale_x_date(breaks = "months", labels = date_format("%b-%Y")) + 
    ggtitle("'Syrian Civil War' Topic include words: syria,rebel,war,assad,regim") + 
    xlab("Date (By Month)") + ylab("Average Percentage of Articles Attributed to Factor")
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
Figure: 'Syrian Civil War' topic, average share of articles by date.
The Syrian Civil War topic 'peaked' in May/June. Since then, the percentage of articles attributed to this topic has declined. Once it became clear that the USA, France, and the UK were not going to intervene militarily, fewer articles were written on the topic.
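The peak above is read off the plot by eye. As a rough sketch, the same aggregate-by-date pattern can also be used to find the peak date numerically. The data frame `toy` and its values are invented purely for illustration, and `peak.date` is a hypothetical name.

```r
# Hypothetical toy data: 6 articles over 3 dates, 2 topics named "1" and "2"
toy = data.frame(
    Date = rep(c("2013-05-04", "2013-05-11", "2013-05-18"), each = 2),
    `1` = c(0.10, 0.30, 0.50, 0.70, 0.20, 0.20),
    `2` = c(0.90, 0.70, 0.50, 0.30, 0.80, 0.80),
    check.names = FALSE
)

# Same pattern as the aggregation above: mean topic share per date
toy.agg = aggregate(x = toy[c("1", "2")], by = toy["Date"], FUN = mean)
toy.agg$Date = as.Date(toy.agg$Date)

# Date on which topic "1" has its highest average share
peak.date = toy.agg$Date[which.max(toy.agg[["1"]])]
peak.date
```

Here topic "1" averages 0.2, 0.6, and 0.2 on the three dates, so the middle date is returned. On the real data the raw per-date means are noisy, which is why the plots also overlay a smoother.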
ggplot(data = date.theta.agg, aes(x = as.Date(Date), y = date.theta.agg[, 32])) + 
    geom_line() + geom_smooth() + scale_x_date(breaks = "months", labels = date_format("%b-%Y")) + 
    ggtitle("'Euro Crisis' Topic include words: euro,zone,spain,european,bail") + 
    xlab("Date (By Month)") + ylab("Average Percentage of Articles Attributed to Factor")
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
Figure: 'Euro Crisis' topic, average share of articles by date.
The Euro Crisis topic was with us the entire year, and it looks like it could continue into next year too!
ggplot(data = date.theta.agg, aes(x = as.Date(Date), y = date.theta.agg[, 29])) + 
    geom_line() + geom_smooth() + scale_x_date(breaks = "months", labels = date_format("%b-%Y")) + 
    ggtitle("'Protests' Topic include words: protest,street,govern,call,police") + 
    xlab("Date (By Month)") + ylab("Average Percentage of Articles Attributed to Factor")
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
Figure: 'Protests' topic, average share of articles by date.
The Protests topic was very high in June/July because of the Brazilian and Turkish protests. It rose sharply again in the final weeks of the year, driven by protests in Thailand and even (gasp!) Singapore.
ggplot(data = date.theta.agg, aes(x = as.Date(Date), y = date.theta.agg[, 61])) + 
    geom_line() + geom_smooth() + scale_x_date(breaks = "months", labels = date_format("%b-%Y")) + 
    ggtitle("'Obama Care' Topic include words: insur,health,care,plan,obamacar") + 
    xlab("Date (By Month)") + ylab("Average Percentage of Articles Attributed to Factor")
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
Figure: 'Obama Care' topic, average share of articles by date.
The Obama Care topic increased at the end of the year due to the botched introduction of the healthcare website.
ggplot(data = date.theta.agg, aes(x = as.Date(Date), y = date.theta.agg[, 96])) + 
    geom_line() + geom_smooth() + scale_x_date(breaks = "months", labels = date_format("%b-%Y")) + 
    ggtitle("'Internet' Topic include words: use,online,data,internet,social") + 
    xlab("Date (By Month)") + ylab("Average Percentage of Articles Attributed to Factor")
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
Figure: 'Internet' topic, average share of articles by date.
Finally, the Internet topic showed a strong decline over the past year despite events like the Twitter IPO. Perhaps this is an indication that social networks and the internet are becoming so ubiquitous that they are simply not news anymore.
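A decline like this can also be summarised with a single number by fitting a straight line to the per-date averages. This is a swapped-in technique, not what the plots above do (they use a loess smoother), and the series below is invented for illustration. `toy.trend` and `slope.per.week` are hypothetical names.

```r
# Hypothetical toy series: average topic share on 10 consecutive weekly dates,
# falling by exactly 0.002 per week
toy.trend = data.frame(
    Date = as.Date("2013-01-05") + 7 * (0:9),
    share = 0.030 - 0.002 * (0:9)
)

# Linear trend of share against time; as.numeric(Date) counts days,
# so the fitted slope is multiplied by 7 to express it per week
fit = lm(share ~ as.numeric(Date), data = toy.trend)
slope.per.week = unname(coef(fit)[2]) * 7
slope.per.week
```

A negative slope indicates a falling share. On real, noisy data it would be worth checking the slope's standard error before calling a decline 'strong'.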