Saturday, January 25, 2014

Economist Year in Review: Part 2

I like how The Economist focuses on geopolitical issues, and I want to capture that in my analysis. I therefore labelled each article with the countries mentioned in its text, by name or by demonym. For example, if either “United States” or “American” appears in an article, that article is labelled “United States”. An article can, of course, carry several country labels.
I then removed all country names and demonyms from the text. I did this because I wanted to run LDA again, and leaving the country names in the text would make the topics partly encode the country labels, a dependency I don't want.
So, for this part of the analysis I have two matrices. The first is the LDA output, a 3440 by 100 matrix that describes each article as a mixture of 100 topics (fitted on the text with country names and demonyms removed). The second is a 3440 by 193 matrix that records which countries are mentioned in each article.
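The post does not show the labelling step. As a minimal sketch, assuming simple whole-word regular-expression matching, the article-by-country indicator matrix and the cleaned text could be built like this. The dictionary `country.terms` and the three sample articles are hypothetical stand-ins for the real 193-country list and the 3440 articles. Whole-word matching matters: a plain substring search for "Niger" would also fire on "Nigeria" and "Nigerian".

```r
# Hypothetical dictionary: each country maps to the strings (name plus
# demonyms) that count as a mention. Illustrative entries only.
country.terms <- list(
  "United States" = c("United States", "American"),
  "China"         = c("China", "Chinese"),
  "Syria"         = c("Syria", "Syrian")
)

# Hypothetical sample articles
articles <- c(
  "American officials met Chinese diplomats in Geneva.",
  "Syrian rebels fought on as the United States hesitated.",
  "Bond yields fell sharply this week."
)

# One regex per country matching any of its terms as whole words
patterns <- sapply(country.terms, function(words)
  paste0("\\b(", paste(words, collapse = "|"), ")\\b"))

# Article-by-country indicator matrix: 1 if any term appears, else 0
country.mat <- sapply(patterns, function(p) as.integer(grepl(p, articles, perl = TRUE)))
rownames(country.mat) <- paste0("article", seq_along(articles))

# Strip every country term from the text before running LDA,
# then collapse the leftover double spaces
clean.text <- articles
for (p in patterns) clean.text <- gsub(p, "", clean.text, perl = TRUE)
clean.text <- trimws(gsub("\\s+", " ", clean.text))

print(country.mat)
print(clean.text)
```

In practice the dictionary needs care: "Georgia", for instance, is also a US state.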
The question I want to address is: what are the main groups in international affairs? For example, we would expect Syria, Iran, and the US to form one cluster, and China, Japan, and the US to form another.
To answer this, I ran k-means clustering on the article-by-country matrix to group articles by the countries they mention. Then, to understand what each group discusses, I averaged the topic proportions of the articles in each cluster.
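The two-step procedure (cluster on countries, then average topics within each cluster) can be sketched on synthetic data. Here `country.mat` and `theta` are small made-up stand-ins for the real 3440 by 193 and 3440 by 100 matrices, and `rowsum()` is used in place of a per-cluster `colMeans()` loop. The `set.seed()` call and `nstart = 10` are additions for reproducibility, not part of the original analysis.

```r
set.seed(1)  # fixed seed so the illustration is reproducible

# Synthetic stand-ins (assumptions): 60 articles, 4 countries, 5 topics
n.articles <- 60
# 0/1 indicators: articles 1-20 mention Syria and Iran, 21-40 China and Japan, 41-60 none
country.mat <- cbind(
  Syria = rep(c(1, 0, 0), each = 20),
  Iran  = rep(c(1, 0, 0), each = 20),
  China = rep(c(0, 1, 0), each = 20),
  Japan = rep(c(0, 1, 0), each = 20)
)
# Random topic proportions; each row is normalised to sum to 1, like LDA output
theta <- matrix(runif(n.articles * 5), nrow = n.articles)
theta <- theta / rowSums(theta)
colnames(theta) <- paste("Topic", 1:5)

# Step 1: cluster articles by the countries they mention
km <- kmeans(country.mat, centers = 3, nstart = 10)

# Step 2: average topic proportions within each cluster.
# rowsum() adds rows by cluster label (sorted); dividing by cluster sizes gives means.
cluster.sizes <- as.vector(table(km$cluster))
topic.means <- rowsum(theta, km$cluster) / cluster.sizes

print(round(topic.means, 3))
# highest-weighted topic in each cluster
print(colnames(topic.means)[apply(topic.means, 1, which.max)])
```

Because every row of `theta` sums to 1, every row of `topic.means` does too, so each cluster's profile can be read as a topic mixture in the same way as a single article.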
First, let's see how many articles mention each country at least once.
load("/Users/sweiss/Google Drive/countrymatrix.rdata")
# country.mat is 0/1, so column sums = number of articles mentioning each country
country.sums = colSums(country.mat)
barplot(sort(country.sums, decreasing = TRUE)[1:50], las = 3, cex.names = 0.7, 
    ylab = "Number of articles mentioning the country name or demonym", 
    main = "Number of times a Country was Identified in an article (Economist 2013)")
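[Figure: bar plot "Number of times a Country was Identified in an article (Economist 2013)", showing the 50 most frequently mentioned countries]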
The United States is first, with the United Kingdom a distant second. Some UK bias is to be expected, since The Economist is a British newspaper.
Below are the clusters of articles based on the countries named, together with the topics most associated with each cluster. I chose 10 clusters using the elbow method (plot not shown).
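An elbow plot can be produced along these lines: run `kmeans()` for a range of k and plot the total within-cluster sum of squares (`tot.withinss`) against k. This sketch uses a synthetic 0/1 matrix with three planted groups of countries, so the elbow should appear at k = 3; the data, the range 1 to 8 and `nstart = 20` are assumptions for illustration, not the settings used for the Economist data.

```r
set.seed(42)  # reproducible illustration

# Hypothetical generator: n articles, each country mentioned with the given probability
make.group <- function(n, probs) {
  t(replicate(n, rbinom(length(probs), 1, probs)))
}

# Synthetic 150 x 6 0/1 matrix with three planted groups of countries
x <- rbind(
  make.group(50, c(0.9, 0.8, 0.05, 0.05, 0.05, 0.05)),
  make.group(50, c(0.05, 0.05, 0.9, 0.8, 0.05, 0.05)),
  make.group(50, c(0.05, 0.05, 0.05, 0.05, 0.9, 0.8))
)

# Total within-cluster sum of squares for k = 1..8
ks <- 1:8
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 20)$tot.withinss)

# Look for the "elbow": the k after which extra clusters reduce wss only slightly
plot(ks, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

Using several random starts per k (`nstart`) makes the curve smoother, since a single unlucky start can make a larger k look worse than a smaller one.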
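library(topicmodels)
# LDA model with 100 topics, fitted on the text with country names and demonyms removed
load("/users/sweiss/google drive/economistnocountryld100.Rda")
# theta: 3440 x 100 matrix of per-article topic proportions
theta = posterior(ld.100)$topics
# country.mat: 3440 x 193 article-by-country indicator matrix
load("/Users/sweiss/Google Drive/countrymatrix.rdata")

# k-means starts from random centres and no seed is set, so cluster numbers vary between runs
kmeans.10 <- kmeans(x = country.mat, centers = 10)
cluster.10 = kmeans.10$cluster

for (i in 1:10) {
    in.cluster = which(cluster.10 == i)
    # drop = FALSE keeps a matrix even if a cluster has a single article
    cluster.mat = country.mat[in.cluster, , drop = FALSE]
    # average topic proportions of the articles in this cluster, highest first
    theta.average.cluster = colMeans(theta[in.cluster, , drop = FALSE])
    top.topics.cluster = names(sort(theta.average.cluster, decreasing = TRUE))
    print(paste("Cluster", i))
    print("Top 10 Countries in Cluster")
    print(sort(colSums(cluster.mat), decreasing = TRUE)[1:10])
    print("Average Number of Countries in Article")
    print(mean(rowSums(cluster.mat)))
    # Note: this sums the 0/1 country indicators, so it prints the total number of
    # country mentions in the cluster, not the number of articles (length(in.cluster))
    print("Number of Articles in Cluster")
    print(sum(rowSums(cluster.mat)))
    # top 10 words for each of the cluster's 10 highest-weighted topics
    print(terms(ld.100, 10)[, as.numeric(top.topics.cluster)[1:10]])
}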
## [1] "Cluster 1"
## [1] "Top 10 Countries in Cluster"
## United Kingdom         France          Japan         Russia          Italy 
##            254             73             60             46             44 
##          Spain         Israel         Turkey    Netherlands         Canada 
##             34             33             33             32             31 
## [1] "Average Number of Countries in Article"
## [1] 1.224
## [1] "Number of Articles in Cluster"
## [1] 1567
##       Topic 33    Topic 19   Topic 97   Topic 79   Topic 66    
##  [1,] "rate"      "elect"    "cell"     "labour"   "minist"    
##  [2,] "price"     "parti"    "research" "cameron"  "govern"    
##  [3,] "economist" "vote"     "scienc"   "tori"     "prime"     
##  [4,] "interest"  "voter"    "work"     "parti"    "polit"     
##  [5,] "index"     "poll"     "human"    "britain"  "leader"    
##  [6,] "market"    "polit"    "brain"    "conserv"  "parliament"
##  [7,] "trade"     "campaign" "one"      "polit"    "parti"     
##  [8,] "commod"    "seat"     "cancer"   "miliband" "opposit"   
##  [9,] "job"       "win"      "might"    "mps"      "elect"     
## [10,] "balanc"    "candid"   "use"      "david"    "berlusconi"
##       Topic 40     Topic 16  Topic 26   Topic 58  Topic 53 
##  [1,] "presid"     "polic"   "protest"  "test"    "local"  
##  [2,] "polit"      "crime"   "govern"   "time"    "town"   
##  [3,] "elect"      "prison"  "street"   "ask"     "park"   
##  [4,] "power"      "crimin"  "erdogan"  "peopl"   "build"  
##  [5,] "leader"     "say"     "polic"    "experi"  "council"
##  [6,] "presidenti" "sentenc" "support"  "think"   "centr"  
##  [7,] "constitut"  "drug"    "call"     "word"    "place"  
##  [8,] "countri"    "murder"  "demonstr" "suggest" "peopl"  
##  [9,] "year"       "peopl"   "polit"    "person"  "plan"   
## [10,] "run"        "jail"    "opposit"  "relat"   "new"    
## [1] "Cluster 2"
## [1] "Top 10 Countries in Cluster"
##         France        Germany          Italy          Spain United Kingdom 
##            128            118            116            103             58 
##  United States         Greece    Netherlands        Ireland         Russia 
##             55             53             52             38             32 
## [1] "Average Number of Countries in Article"
## [1] 8.138
## [1] "Number of Articles in Cluster"
## [1] 1237
##       Topic 9    Topic 28   Topic 88  Topic 62   Topic 83  Topic 14 
##  [1,] "euro"     "european" "bank"    "bond"     "price"   "economi"
##  [2,] "zone"     "europ"    "financi" "debt"     "market"  "growth" 
##  [3,] "european" "union"    "loan"    "rate"     "cost"    "econom" 
##  [4,] "countri"  "commiss"  "lend"    "yield"    "rise"    "gdp"    
##  [5,] "bank"     "countri"  "capit"   "investor" "year"    "invest" 
##  [6,] "crisi"    "nation"   "deposit" "govern"   "demand"  "export" 
##  [7,] "bail"     "want"     "credit"  "interest" "low"     "account"
##  [8,] "imf"      "brussel"  "crisi"   "market"   "increas" "product"
##  [9,] "market"   "treati"   "borrow"  "borrow"   "like"    "year"   
## [10,] "economi"  "like"     "asset"   "default"  "high"    "busi"   
##       Topic 29   Topic 17   Topic 91   Topic 52   
##  [1,] "worker"   "firm"     "mrs"      "left"     
##  [2,] "job"      "market"   "merkel"   "holland"  
##  [3,] "work"     "industri" "left"     "right"    
##  [4,] "labour"   "product"  "parti"    "presid"   
##  [5,] "employ"   "big"      "coalit"   "polit"    
##  [6,] "wage"     "new"      "green"    "socialist"
##  [7,] "unemploy" "profit"   "centr"    "now"      
##  [8,] "skill"    "busi"     "govern"   "put"      
##  [9,] "pay"      "sale"     "democrat" "yet"      
## [10,] "factori"  "compani"  "social"   "fran"     
## [1] "Cluster 3"
## [1] "Top 10 Countries in Cluster"
##  United States United Kingdom         Canada         France         Russia 
##            686            115             49             46             37 
##          Japan    Afghanistan      Australia          Spain        Georgia 
##             34             27             24             23             21 
## [1] "Average Number of Countries in Article"
## [1] 2.257
## [1] "Number of Articles in Cluster"
## [1] 1548
##       Topic 87     Topic 72 Topic 19   Topic 90   Topic 63   Topic 47    
##  [1,] "republican" "court"  "elect"    "insur"    "fund"     "state"     
##  [2,] "obama"      "law"    "parti"    "health"   "investor" "feder"     
##  [3,] "democrat"   "case"   "vote"     "will"     "return"   "california"
##  [4,] "senat"      "legal"  "voter"    "plan"     "invest"   "governor"  
##  [5,] "polit"      "judg"   "poll"     "care"     "asset"    "year"      
##  [6,] "congress"   "rule"   "polit"    "feder"    "equiti"   "say"       
##  [7,] "hous"       "right"  "campaign" "obamacar" "share"    "one"       
##  [8,] "bill"       "lawyer" "seat"     "mani"     "manag"    "texa"      
##  [9,] "america"    "suprem" "win"      "exchang"  "profit"   "back"      
## [10,] "barack"     "justic" "candid"   "state"    "money"    "also"      
##       Topic 97   Topic 64    Topic 62   Topic 88 
##  [1,] "cell"     "compani"   "bond"     "bank"   
##  [2,] "research" "firm"      "debt"     "financi"
##  [3,] "scienc"   "busi"      "rate"     "loan"   
##  [4,] "work"     "deal"      "yield"    "lend"   
##  [5,] "human"    "share"     "investor" "capit"  
##  [6,] "brain"    "billion"   "govern"   "deposit"
##  [7,] "one"      "sharehold" "interest" "credit" 
##  [8,] "cancer"   "buy"       "market"   "crisi"  
##  [9,] "might"    "stake"     "borrow"   "borrow" 
## [10,] "use"      "privat"    "default"  "asset"  
## [1] "Cluster 4"
## [1] "Top 10 Countries in Cluster"
##        Germany  United States United Kingdom         France          China 
##            278            112             94             70             44 
##    Netherlands    Switzerland         Russia          Japan         Greece 
##             32             29             27             26             21 
## [1] "Average Number of Countries in Article"
## [1] 4.324
## [1] "Number of Articles in Cluster"
## [1] 1202
##       Topic 91   Topic 28   Topic 9    Topic 88  Topic 17   Topic 64   
##  [1,] "mrs"      "european" "euro"     "bank"    "firm"     "compani"  
##  [2,] "merkel"   "europ"    "zone"     "financi" "market"   "firm"     
##  [3,] "left"     "union"    "european" "loan"    "industri" "busi"     
##  [4,] "parti"    "commiss"  "countri"  "lend"    "product"  "deal"     
##  [5,] "coalit"   "countri"  "bank"     "capit"   "big"      "share"    
##  [6,] "green"    "nation"   "crisi"    "deposit" "new"      "billion"  
##  [7,] "centr"    "want"     "bail"     "credit"  "profit"   "sharehold"
##  [8,] "govern"   "brussel"  "imf"      "crisi"   "busi"     "buy"      
##  [9,] "democrat" "treati"   "market"   "borrow"  "sale"     "stake"    
## [10,] "social"   "like"     "economi"  "asset"   "compani"  "privat"   
##       Topic 76  Topic 66     Topic 29   Topic 15  
##  [1,] "billion" "minist"     "worker"   "investig"
##  [2,] "year"    "govern"     "job"      "claim"   
##  [3,] "will"    "prime"      "work"     "case"    
##  [4,] "cost"    "polit"      "labour"   "alleg"   
##  [5,] "also"    "leader"     "employ"   "charg"   
##  [6,] "worth"   "parliament" "wage"     "report"  
##  [7,] "total"   "parti"      "unemploy" "trial"   
##  [8,] "last"    "opposit"    "skill"    "said"    
##  [9,] "make"    "elect"      "pay"      "former"  
## [10,] "estim"   "berlusconi" "factori"  "accus"   
## [1] "Cluster 5"
## [1] "Top 10 Countries in Cluster"
##         Syria United States          Iraq          Iran        Israel 
##           121            88            80            73            66 
##        Turkey       Lebanon        Russia         Egypt  Saudi Arabia 
##            57            42            40            38            37 
## [1] "Average Number of Countries in Article"
## [1] 7.39
## [1] "Number of Articles in Cluster"
## [1] 1005
##       Topic 46  Topic 48     Topic 81      Topic 92   Topic 21   
##  [1,] "rebel"   "america"    "muslim"      "forc"     "attack"   
##  [2,] "assad"   "obama"      "ian"         "armi"     "kill"     
##  [3,] "regim"   "presid"     "islam"       "defenc"   "war"      
##  [4,] "war"     "nuclear"    "islamist"    "militari" "bomb"     
##  [5,] "govern"  "polici"     "brother"     "arm"      "group"    
##  [6,] "forc"    "relat"      "now"         "secur"    "terrorist"
##  [7,] "north"   "intern"     "morsi"       "general"  "dead"     
##  [8,] "arm"     "weapon"     "brotherhood" "war"      "violenc"  
##  [9,] "group"   "washington" "back"        "soldier"  "terror"   
## [10,] "western" "foreign"    "includ"      "troop"    "drone"    
##       Topic 89      Topic 26   Topic 66     Topic 37    Topic 7  
##  [1,] "white"       "protest"  "minist"     "deal"      "polit"  
##  [2,] "black"       "govern"   "govern"     "trade"     "putin"  
##  [3,] "palestinian" "street"   "prime"      "talk"      "anti"   
##  [4,] "king"        "erdogan"  "polit"      "negoti"    "now"    
##  [5,] "relat"       "polic"    "leader"     "agreement" "also"   
##  [6,] "arab"        "support"  "parliament" "agre"      "may"    
##  [7,] "say"         "call"     "parti"      "side"      "kremlin"
##  [8,] "west"        "demonstr" "opposit"    "free"      "soviet" 
##  [9,] "state"       "polit"    "elect"      "sign"      "western"
## [10,] "two"         "opposit"  "berlusconi" "two"       "power"  
## [1] "Cluster 6"
## [1] "Top 10 Countries in Cluster"
##          Niger        Nigeria  United States         France United Kingdom 
##             55             52             20             14             13 
##           Mali   South Africa          China          Ghana       Cameroon 
##             12             12             10              9              7 
## [1] "Average Number of Countries in Article"
## [1] 7.182
## [1] "Number of Articles in Cluster"
## [1] 395
##       Topic 60  Topic 46  Topic 21    Topic 22  Topic 40     Topic 5     
##  [1,] "africa"  "rebel"   "attack"    "money"   "presid"     "food"      
##  [2,] "ship"    "assad"   "kill"      "pay"     "polit"      "farm"      
##  [3,] "african" "regim"   "war"       "servic"  "elect"      "farmer"    
##  [4,] "port"    "war"     "bomb"      "save"    "power"      "product"   
##  [5,] "region"  "govern"  "group"     "charg"   "leader"     "say"       
##  [6,] "contain" "forc"    "terrorist" "cost"    "presidenti" "produc"    
##  [7,] "intern"  "north"   "dead"      "card"    "constitut"  "agricultur"
##  [8,] "world"   "arm"     "violenc"   "fee"     "countri"    "meat"      
##  [9,] "dubai"   "group"   "terror"    "payment" "year"       "rice"      
## [10,] "countri" "western" "drone"     "account" "run"        "year"      
##       Topic 92   Topic 88  Topic 49  Topic 81     
##  [1,] "forc"     "bank"    "immigr"  "muslim"     
##  [2,] "armi"     "financi" "border"  "ian"        
##  [3,] "defenc"   "loan"    "migrant" "islam"      
##  [4,] "militari" "lend"    "mani"    "islamist"   
##  [5,] "arm"      "capit"   "peopl"   "brother"    
##  [6,] "secur"    "deposit" "say"     "now"        
##  [7,] "general"  "credit"  "illeg"   "morsi"      
##  [8,] "war"      "crisi"   "year"    "brotherhood"
##  [9,] "soldier"  "borrow"  "countri" "back"       
## [10,] "troop"    "asset"   "issu"    "includ"     
## [1] "Cluster 7"
## [1] "Top 10 Countries in Cluster"
##          China  United States          Japan United Kingdom         Russia 
##            409            187             91             60             40 
##         France      Australia         Canada        Vietnam         Taiwan 
##             39             29             24             24             23 
## [1] "Average Number of Countries in Article"
## [1] 3.399
## [1] "Number of Articles in Cluster"
## [1] 1390
##       Topic 85    Topic 24 Topic 17   Topic 95    Topic 4      Topic 14 
##  [1,] "offici"    "south"  "firm"     "project"   "parti"      "economi"
##  [2,] "beij"      "north"  "market"   "mine"      "polit"      "growth" 
##  [3,] "said"      "korea"  "industri" "water"     "power"      "econom" 
##  [4,] "recent"    "asia"   "product"  "govern"    "leader"     "gdp"    
##  [5,] "report"    "island" "big"      "build"     "nation"     "invest" 
##  [6,] "one"       "region" "new"      "river"     "politician" "export" 
##  [7,] "govern"    "relat"  "profit"   "say"       "member"     "account"
##  [8,] "ministri"  "east"   "busi"     "construct" "congress"   "product"
##  [9,] "public"    "sea"    "sale"     "one"       "support"    "year"   
## [10,] "communist" "also"   "compani"  "plan"      "constitut"  "busi"   
##       Topic 86 Topic 88  Topic 48     Topic 74 
##  [1,] "open"   "bank"    "america"    "foreign"
##  [2,] "also"   "financi" "obama"      "govern" 
##  [3,] "will"   "loan"    "presid"     "countri"
##  [4,] "mani"   "lend"    "nuclear"    "local"  
##  [5,] "hong"   "capit"   "polici"     "make"   
##  [6,] "can"    "deposit" "relat"      "control"
##  [7,] "anoth"  "credit"  "intern"     "abroad" 
##  [8,] "kong"   "crisi"   "weapon"     "intern" 
##  [9,] "argu"   "borrow"  "washington" "last"   
## [10,] "one"    "asset"   "foreign"    "may"    
## [1] "Cluster 8"
## [1] "Top 10 Countries in Cluster"
##  United States         Brazil         Mexico          Spain          Chile 
##            124             80             78             23             21 
##          China      Venezuela      Argentina United Kingdom       Colombia 
##             18             16             15             15             14 
## [1] "Average Number of Countries in Article"
## [1] 5.233
## [1] "Number of Articles in Cluster"
## [1] 675
##       Topic 11  Topic 40     Topic 17   Topic 16  Topic 87     Topic 82
##  [1,] "countri" "presid"     "firm"     "polic"   "republican" "reform"
##  [2,] "world"   "polit"      "market"   "crime"   "obama"      "govern"
##  [3,] "global"  "elect"      "industri" "prison"  "democrat"   "will"  
##  [4,] "america" "power"      "product"  "crimin"  "senat"      "polici"
##  [5,] "develop" "leader"     "big"      "say"     "polit"      "chang" 
##  [6,] "rich"    "presidenti" "new"      "sentenc" "congress"   "system"
##  [7,] "emerg"   "constitut"  "profit"   "drug"    "hous"       "plan"  
##  [8,] "intern"  "countri"    "busi"     "murder"  "bill"       "polit" 
##  [9,] "latin"   "year"       "sale"     "peopl"   "america"    "public"
## [10,] "accord"  "run"        "compani"  "jail"    "barack"     "need"  
##       Topic 14  Topic 37    Topic 75  Topic 47    
##  [1,] "economi" "deal"      "number"  "state"     
##  [2,] "growth"  "trade"     "america" "feder"     
##  [3,] "econom"  "talk"      "sinc"    "california"
##  [4,] "gdp"     "negoti"    "time"    "governor"  
##  [5,] "invest"  "agreement" "less"    "year"      
##  [6,] "export"  "agre"      "rate"    "say"       
##  [7,] "account" "side"      "year"    "one"       
##  [8,] "product" "free"      "declin"  "texa"      
##  [9,] "year"    "sign"      "rise"    "back"      
## [10,] "busi"    "two"       "increas" "also"      
## [1] "Cluster 9"
## [1] "Top 10 Countries in Cluster"
##          India          China  United States United Kingdom          Japan 
##            292            123            114             78             51 
##       Pakistan         Russia         Brazil      Australia      Indonesia 
##             43             39             33             32             31 
## [1] "Average Number of Countries in Article"
## [1] 5.014
## [1] "Number of Articles in Cluster"
## [1] 1464
##       Topic 4      Topic 19   Topic 2   Topic 17   Topic 24 Topic 66    
##  [1,] "parti"      "elect"    "govern"  "firm"     "south"  "minist"    
##  [2,] "polit"      "parti"    "nation"  "market"   "north"  "govern"    
##  [3,] "power"      "vote"     "peopl"   "industri" "korea"  "prime"     
##  [4,] "leader"     "voter"    "ethnic"  "product"  "asia"   "polit"     
##  [5,] "nation"     "poll"     "local"   "big"      "island" "leader"    
##  [6,] "politician" "polit"    "villag"  "new"      "region" "parliament"
##  [7,] "member"     "campaign" "one"     "profit"   "relat"  "parti"     
##  [8,] "congress"   "seat"     "just"    "busi"     "east"   "opposit"   
##  [9,] "support"    "win"      "group"   "sale"     "sea"    "elect"     
## [10,] "constitut"  "candid"   "countri" "compani"  "also"   "berlusconi"
##       Topic 21    Topic 95    Topic 11  Topic 39 
##  [1,] "attack"    "project"   "countri" "one"    
##  [2,] "kill"      "mine"      "world"   "world"  
##  [3,] "war"       "water"     "global"  "argu"   
##  [4,] "bomb"      "govern"    "america" "blog"   
##  [5,] "group"     "build"     "develop" "histori"
##  [6,] "terrorist" "river"     "rich"    "long"   
##  [7,] "dead"      "say"       "emerg"   "great"  
##  [8,] "violenc"   "construct" "intern"  "view"   
##  [9,] "terror"    "one"       "latin"   "point"  
## [10,] "drone"     "plan"      "accord"  "much"   
## [1] "Cluster 10"
## [1] "Top 10 Countries in Cluster"
##     Tanzania        Kenya       Rwanda       Uganda South Africa 
##           17           16           15           15           14 
##        China        Niger        India      Nigeria       Angola 
##           13           12           11           11           10 
## [1] "Average Number of Countries in Article"
## [1] 11.87
## [1] "Number of Articles in Cluster"
## [1] 273
##       Topic 60  Topic 49  Topic 46  Topic 35 Topic 13 Topic 2   Topic 92  
##  [1,] "africa"  "immigr"  "rebel"   "peopl"  "store"  "govern"  "forc"    
##  [2,] "ship"    "border"  "assad"   "mani"   "retail" "nation"  "armi"    
##  [3,] "african" "migrant" "regim"   "fire"   "shop"   "peopl"   "defenc"  
##  [4,] "port"    "mani"    "war"     "now"    "sale"   "ethnic"  "militari"
##  [5,] "region"  "peopl"   "govern"  "caus"   "chain"  "local"   "arm"     
##  [6,] "contain" "say"     "forc"    "need"   "sell"   "villag"  "secur"   
##  [7,] "intern"  "illeg"   "north"   "damag"  "buy"    "one"     "general" 
##  [8,] "world"   "year"    "arm"     "also"   "custom" "just"    "war"     
##  [9,] "dubai"   "countri" "group"   "help"   "can"    "group"   "soldier" 
## [10,] "countri" "issu"    "western" "miss"   "good"   "countri" "troop"   
##       Topic 14  Topic 95    Topic 63  
##  [1,] "economi" "project"   "fund"    
##  [2,] "growth"  "mine"      "investor"
##  [3,] "econom"  "water"     "return"  
##  [4,] "gdp"     "govern"    "invest"  
##  [5,] "invest"  "build"     "asset"   
##  [6,] "export"  "river"     "equiti"  
##  [7,] "account" "say"       "share"   
##  [8,] "product" "construct" "manag"   
##  [9,] "year"    "one"       "profit"  
## [10,] "busi"    "plan"      "money"
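In the run shown above, the easiest clusters to interpret are 2 (Euro Zone), 5 (Middle East conflict), 7 (China and East Asia), and 8 (Latin America). Cluster 4 is also European but centres on Germany and Angela Merkel, and cluster 9 covers politics in and around India. Clusters 6 and 10 are African: West Africa (Niger, Nigeria, Mali) and East Africa (Tanzania, Kenya, Rwanda, Uganda) respectively. Clusters 1 and 3 have the lowest average number of countries per article (1.2 and 2.3), so they are mostly intranational articles: cluster 3 is American domestic politics and law, while cluster 1 appears to be a catch-all of single-country stories in which British politics is the most prominent theme. Because k-means starts from random centres and no seed was set, the cluster numbers will differ from run to run.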
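One caveat about the output above: the "Number of Articles in Cluster" line prints `sum(rowSums(...))`, which is the total number of country mentions in the cluster rather than the number of articles. Dividing it by the average number of countries per article recovers the article count (for cluster 1, 1567 / 1.224 ≈ 1280), and the ten recovered counts add up to the 3440 articles. The toy example below, with a made-up 4 by 3 matrix and hypothetical cluster labels, shows the difference.

```r
# Made-up 4 x 3 article-by-country indicator matrix (illustrative only)
country.mat <- rbind(
  c(1, 1, 0),
  c(1, 0, 0),
  c(1, 1, 1),
  c(0, 1, 0)
)
colnames(country.mat) <- c("United States", "China", "Japan")
cluster <- c(1, 1, 2, 2)  # hypothetical k-means assignments

groups <- sort(unique(cluster))
# number of articles in each cluster (what the label promises)
n.articles <- sapply(groups, function(i) sum(cluster == i))
# total country mentions in each cluster (what sum(rowSums(...)) computes)
n.mentions <- sapply(groups, function(i)
  sum(rowSums(country.mat[cluster == i, , drop = FALSE])))
# average number of countries per article in each cluster
avg.countries <- sapply(groups, function(i)
  mean(rowSums(country.mat[cluster == i, , drop = FALSE])))

# mentions / average recovers the article count
print(data.frame(cluster = groups, articles = n.articles,
                 mentions = n.mentions, recovered = n.mentions / avg.countries))
```

On the real data, `table(cluster.10)` gives the article counts per cluster directly.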
