Tuesday, October 7, 2014

Predicting Monthly Car Sales: The Residuals are the Story

I'll produce predictions of US car sales by manufacturer every month. Several blogs already do a great job of describing the industry and its sales; Autoblog's By the Numbers and Counting Cars are two worth mentioning.

Unlike their analyses, I'll focus on the residuals (the stuff I can't predict) to tell the story. To highlight the difference, I think it's instructive to look at what Autoblog mentioned.

The Autoblog article (link above) highlights Mitsubishi for increasing sales. However, my prediction for Mitsubishi sales is pretty much exactly what the sales were. In essence, given this model, we didn't learn much. On the other hand, Land Rover and Jaguar had the largest residuals (in percent terms), and Land Rover and Acura had the largest deviance (Residual / Variance). I think these results are more telling because we didn't predict them correctly; something might have changed.

I'll publish this on a new blog: datAutomotive.

Here is a Shiny App for Car Sales, and below are graphs / tables of my current analysis and future predictions.







Brand          Predicted Values 9/14  Actual Values 9/14  log(Actual/Predicted)  Deviance
Acura                        488.036             576.333                  0.166     1.731
Audi                         632.252             621.542                 -0.017     0.326
BMW                        1,061.329           1,066.083                  0.004     0.032
Buick                        649.648             727.750                  0.114     0.571
Cadillac                     568.021             576.208                  0.014     0.089
Chevrolet                  6,284.594           6,411.375                  0.020     0.135
Chrysler                   1,058.392           1,199.208                  0.125     0.710
Dodge                      1,892.465           1,834.167                 -0.031     0.172
Ford                       7,374.669           7,177.542                 -0.027     0.285
GMC                        1,602.260           1,594.542                 -0.005     0.029
Honda                      4,434.780           4,349.625                 -0.019     0.174
Hyundai                    2,291.306           2,333.750                  0.018     0.271
Infiniti                     315.217             326.542                  0.035     0.234
Jaguar                        37.834              47.583                  0.229     0.908
Jeep                       2,491.378           2,301.292                 -0.079     0.853
Kia                        1,853.734           1,692.833                 -0.091     1.115
Land.Rover                   159.975             129.417                 -0.212     1.816
Lexus                      1,011.967             910.500                 -0.106     1.189
Lincoln                      285.903             302.375                  0.056     0.394
Mazda                      1,074.272             999.167                 -0.072     0.743
Mercedes.Benz              1,186.220           1,230.125                  0.036     0.364
Mini                         196.975             175.792                 -0.114     0.626
Mitsubishi                   231.582             231.583                0.00000   0.00002
Nissan                     4,176.995           3,963.250                 -0.053     0.516
Porsche                      156.496             150.292                 -0.040     0.408
Subaru                     1,727.597           1,729.875                  0.001     0.026
Toyota                     6,928.472           6,059.458                 -0.134     1.309
Volkswagen                 1,123.412           1,083.167                 -0.036     0.407
Volvo                        161.531             194.458                  0.186     1.070
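
A quick check of the log-ratio column, using four rows copied from the table above. The vector names are mine; the numbers are the table's. The signs come out matching log(actual / predicted), so a positive value means sales beat the prediction.

```r
# Four rows copied from the 9/14 table
predicted <- c(Acura = 488.036, Jaguar = 37.834, Land.Rover = 159.975, Mitsubishi = 231.582)
actual    <- c(Acura = 576.333, Jaguar = 47.583, Land.Rover = 129.417, Mitsubishi = 231.583)

# Residual in (approximate) percent terms: log of actual over predicted
log.ratio <- log(actual / predicted)
print(round(log.ratio, 3))

# Rank brands by the size of the miss, regardless of direction
print(names(sort(abs(log.ratio), decreasing = TRUE)))
```

Jaguar and Land Rover come out on top and Mitsubishi at the bottom, which is the story told in the text.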




Predicted Values for 10/14
Acura             521.878
Audi              590.125
BMW             1,051.492
Buick             666.165
Cadillac          533.468
Chevrolet       5,693.145
Chrysler        1,097.288
Dodge           1,493.426
Ford            6,694.012
GMC             1,572.525
Honda           4,049.582
Hyundai         1,998.896
Infiniti          262.475
Jaguar             40.422
Jeep            2,132.652
Kia             1,650.236
Land.Rover        160.884
Lexus             944.946
Lincoln           272.964
Mazda             894.234
Mercedes.Benz   1,184.991
Mini              183.743
Mitsubishi        193.982
Nissan          3,629.955
Porsche           158.540
Subaru          1,694.661
Toyota          5,895.316
Volkswagen        923.242
Volvo             153.691

Wednesday, August 6, 2014

Predicting Monthly Car Sales for Brands in US: First Step

I've set out to produce monthly forecasts of car sales by brand in the US. So far I've built a SUTSE (seemingly unrelated time series equations) dynamic linear model (code on Github) and created a Shiny app (http://sweiss.shinyapps.io/carvis/) as a prototype (no predictions yet).

Basically, I want to combine my interest in cars with my interest in stats.

My current roadblock is how slowly the code runs, but I intend to have a speedy implementation using RStan by next month.

If anyone has tips or suggestions, please comment!






Predictions for September:
Acura            13764.75
Audi             15929.24
BMW              26445.80
Buick            23725.69
Cadillac         16722.01
Chevrolet       204636.69
Chrysler         26336.27
Dodge            51082.20
Fiat              4408.64
Ford            228880.56
GMC              50361.17
Honda           143487.90
Hyundai          67112.10
Infiniti         10366.93
Jaguar            1162.72
Jeep             60577.05
Kia              53863.38
Land.Rover        4736.08
Lexus            30614.06
Lincoln           7766.23
Mazda            29544.19
Mercedes.Benz    28693.43
Mini              5455.48
Mitsubishi        6706.69
Nissan          125118.79
Porsche           3914.65
Ram              39194.54
Smart             1258.65
Subaru           46186.21
Toyota          213761.40
Volkswagen       31581.44
Volvo             4809.14






Wednesday, July 23, 2014

Mideast Graph 3: Slate Middle East Friendship

Slate recently published a great infographic about Middle East relationships. It shows the relationship (Friend, Enemy, or Complicated) between each pair of 13 countries / organizations. One drawback of the chart is that it doesn't show whether countries / organizations cluster together. This post will attempt to rectify that drawback.

In a previous post, I used a matrix decomposition of the Laplacian matrix and plotted the eigenvectors belonging to the two smallest eigenvalues against each other. The graphs below follow the same methodology, and you can find the code here. To give weights to the network matrix, I assigned values of 1, -1, and 0 for Friend, Enemy, and Complicated.
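
A minimal sketch of that idea on a made-up network of four actors (A to D), not the Slate data and not the linked code. One assumption to flag loudly: because the weights can be negative, the sketch uses a signed Laplacian, where the degree of each actor is the sum of the absolute weights. The linked code may build the Laplacian differently.

```r
# Hypothetical 4-actor network: 1 = Friend, -1 = Enemy, 0 = Complicated
actors <- c("A", "B", "C", "D")
adj <- matrix(0, 4, 4, dimnames = list(actors, actors))
adj["A", "B"] <- 1    # A and B are friends
adj["C", "D"] <- 1    # C and D are friends
adj["A", "C"] <- -1   # A and C are enemies
adj["B", "D"] <- -1   # B and D are enemies
adj <- adj + t(adj)   # make the matrix symmetric

# Signed Laplacian (an assumption): degrees use absolute weights
lap <- diag(rowSums(abs(adj))) - adj

# eigen() returns eigenvalues in decreasing order for symmetric matrices,
# so the last columns belong to the smallest eigenvalues
eig <- eigen(lap, symmetric = TRUE)
n <- nrow(lap)
coords <- eig$vectors[, c(n, n - 1)]
rownames(coords) <- actors
print(round(coords, 3))
```

In this toy case the first coordinate puts A and B on one side and C and D on the other, which is the kind of friend / enemy grouping the graphs are meant to reveal. Plotting `coords[, 1]` against `coords[, 2]` gives the two-dimensional picture.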

Below is the resulting graph, with red lines indicating Enemy and blue lines indicating Friend.




The above graph shows 3 possible clusters. Turkey, the Palestinian Authority, and Hamas are close to each other in the lower left corner. In the top right are the United States, Egypt, and Israel. Lower left shows Iraq, Syria, Hezbollah, and Iran.

Since both ISIS and Al-Qaida have no Friend relationships, and the goal was to see whether there are clusters of friends / enemies, I removed these organizations and redid the analysis. Below is the result.


Here we can see a much clearer distinction between relationship clusters. The United States, Israel, and Egypt form one, with Saudi Arabia as possibly a 'member' of that group. Iraq, Syria, Hezbollah, and Iran form a well-defined cluster, which makes sense as they are all dominated by Shias. Finally, there are Turkey, the Palestinian Authority, and Hamas. I was surprised to see Turkey so close to the Palestinian Authority and Hamas. But seeing as Turkey has been a supporter of the Palestinians both now and in the past, maybe I shouldn't be. It is also surprising that no Arab countries are close to the Palestinians now.

The Mideast conflict is sort of a 3-way standoff, with Shias in one corner, Sunnis friendly to the US / Israel in another, and Turkey trying to gain influence in a region it once controlled in the final corner.


Github

Thursday, May 22, 2014

didYouMean() Function: Using Google to correct errors in Strings

A function that takes a string as input and returns the "Did you mean..." or "Showing results for..." suggestion from google.com. Good for misspelled names or locations.


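library(RCurl)
## If on Windows, you might need:
## options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
didYouMean=function(input){
  # Google separates search terms with "+"
  input=gsub(" ", "+", input, fixed=TRUE)
  # The trailing "/" marks where the query ends inside the links of the returned page
  doc=getURL(paste("https://www.google.com/search?q=",input,"/", sep=""))

  # Locate the two kinds of suggestion text in the page
  dym=gregexpr("Did you mean",doc,fixed=TRUE)
  srf=gregexpr("Showing results for",doc,fixed=TRUE)

  # Pull the suggested query out of the first link that follows position `start`
  getSuggestion=function(start){
    doc2=substring(doc,start,start+1000)
    # "q=" is matched literally; in the original pattern "?q=" the leading "?"
    # is a regex quantifier rather than a literal question mark
    s1=gregexpr("q=",doc2,fixed=TRUE)
    s2=gregexpr("/&",doc2,fixed=TRUE)
    new.text=substring(doc2,s1[[1]][1]+2,s2[[1]][1]-1)
    gsub("+"," ",new.text,fixed=TRUE)
  }

  if(length(dym[[1]])>1){
    # "Did you mean" occurs more than once (condition kept from the original)
    return(getSuggestion(dym[[1]][1]))
  } else if(srf[[1]][1]!=-1){
    return(getSuggestion(srf[[1]][1]))
  } else {
    # No suggestion found: return the input unchanged
    return(gsub("+"," ",input,fixed=TRUE))
  }
}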

So didYouMean("gorecge washington") returns "george washington"


Works well with misspelled companies, nouns, or phrases. For example, say you're doing text analysis on Twitter and a customer raves about Carlsberg beer. The only problem is that he's enjoying the product while tweeting (something that happens only rarely, I'm sure) and wrote "clarsburg gprou". Not to worry!

> didYouMean("clarsburg gprou")
[1] "carlsberg group"

Or suppose you have a 3-phase plan for profits. This can help you get there!

> didYouMean("clletc nuderpants")
[1] "collect underpants"

Saturday, May 17, 2014

Modelling This Time is Different: Corrected

New PDF
New Code


I made two errors in my previous post.

The first is that I put a probability inside the utility function. Generally this is a no-no, because expected utility already weights outcomes by their probabilities: E[utility] = sum over i of P(outcome i)*U(outcome i). I therefore changed it to a simpler maximization problem (no Lagrange multipliers necessary) in which the individual maximizes E[Profits].

The second problem has to do with maximizing when the parameters are uncertain. Since I sampled from the joint posterior distribution of the unknown parameters, I had a number of draws from that distribution. In the previous analysis, I maximized over each pair of simulated draws individually, and then averaged over these maximized results to get what I thought was the optimal result. In general, this method does not give the optimal value. I should have maximized over all pairs simultaneously. Basically, I computed E[max over s of f(s,p)] instead of max over s of E[f(s,p)], where s is the decision and p is a pair of sampled parameters.
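
To make the difference concrete, here is a minimal sketch with a made-up profit function and three made-up parameter draws standing in for posterior samples. It is not the code from the analysis: `f`, `b.draws`, and the search interval are all hypothetical, and it reads the mistake as averaging the per-draw maximizers.

```r
# Hypothetical profit function: decision s, uncertain parameter b
f <- function(s, b) s * (1 - b * s)

# Stand-in for posterior draws of b (made-up values)
b.draws <- c(0.5, 1, 2)

# Per-draw approach: maximize for each draw separately, then average the decisions
s.each <- sapply(b.draws, function(b) {
  optimize(f, interval = c(0, 5), b = b, maximum = TRUE)$maximum
})
s.avg <- mean(s.each)

# Joint approach: maximize the average profit over all draws at once
expected.f <- function(s) mean(f(s, b.draws))
s.joint <- optimize(expected.f, interval = c(0, 5), maximum = TRUE)$maximum

print(c(s.avg = s.avg, s.joint = s.joint))
print(c(profit.at.s.avg = expected.f(s.avg), profit.at.s.joint = expected.f(s.joint)))
```

In this toy case the averaged per-draw decision is more aggressive than the joint one and earns a lower expected profit, which is why the two procedures should not be confused.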

In general, the results are superficially similar to my original analysis. Even if the results are largely the same, it's best to describe my mistakes upfront and avoid awkward questions later.

Monday, April 14, 2014

Modeling "This Time is Different" in R

***WHOOPS Made some mistakes in analysis. Update HERE***


Check out PDF for Equations
And Code to run yourself

Introduction:
“This Time Is Different” by Reinhart and Rogoff is an empirical history of financial crises and panics. It describes many economic events, such as inflation, bubbles, and defaults.

The theme that ties these events together is the idea that people have an unrealistic expectation of the future because they believe “This time is different”. With this new belief, they act in a way that causes the next crisis.

While the book dealt with the empirical investigation of the matter, it didn’t describe the theme itself in intricate detail. This post will attempt to fill that void by giving a mathematical interpretation of the theme.

There are 3 aspects of the model I’d like to capture:
1.     prior beliefs guide us and tell us how to make decisions
2.     decisions have an effect on the future
3.     people update their beliefs when new information becomes available

Economic Maximization Problem:
I came up with a simple driving example that has these properties. Suppose we’d like to maximize the speed of a car and minimize the probability of a crash (or, equivalently, maximize the probability of not crashing). However, the probability of crashing depends on the speed. This can be written as the utility maximization problem below, where we maximize subject to a constraint.

Maximize:

Utility(Speed,Probability of Not Crashing) = (Speed^a)*(Probability of not crashing ^(1-a))

Subject to:

Probability of Not Crashing = B*log(1/speed-1) + C

Where B and C are the slope and constant parameters of an inverse logistic function, and a is the tradeoff in the utility function between the two goods (Probability of not crashing and Speed).
(Look in PDF for optimal solution)
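
For readers who prefer code to algebra, here is a rough numerical sketch of the same kind of problem. Substituting the constraint into the utility function leaves a one-dimensional maximization over speed. The crash curve below is an assumed stand-in (a plain logistic curve with hypothetical B and C), not the exact constraint from the PDF; only a = .05 is taken from the post.

```r
a <- 0.05  # weight on speed, as in the post
B <- 1     # hypothetical slope
C <- 5     # hypothetical constant

# Assumed stand-in for the constraint: the chance of not crashing falls as speed rises
p.no.crash <- function(speed) 1 - plogis(B * speed - C)

# Substituting the constraint leaves a problem in speed alone
utility <- function(speed) speed^a * p.no.crash(speed)^(1 - a)

best <- optimize(utility, interval = c(0.01, 20), maximum = TRUE)
print(best$maximum)    # utility-maximizing speed
print(best$objective)  # utility at that speed
```

With these made-up values the chosen speed sits well below the point where crashes become likely, because a = .05 puts most of the weight on not crashing.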

Bayesian Updating Problem:

As described so far, the problem is a fairly straightforward maximization problem. However, in this model, the parameters B and C are unknown to the individual. While there is a true function relating speed to the probability of not crashing (that is, a true B and C), the individual doesn’t know it.

Bayesian statistics seems like a natural way to model the individual’s beliefs over these parameters. Given his prior belief over the parameters, he maximizes his utility and makes a decision in each time period. When new information becomes available, he updates his beliefs and uses them to make a new decision. In this sense, the system is dynamic and a time series can be produced.

Unfortunately, I don’t believe there is a closed-form solution for updating a Bayesian logistic regression. Therefore, I used RStan to simulate from the posterior distributions (code shown at end).
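
For intuition only, a single updating step can be sketched in base R with a grid approximation instead of RStan. This is a swapped-in technique, not the linked code. Everything here is hypothetical: one unknown slope b, a fixed intercept of -4, a prior of N(1, 0.5), and a trip at speed 3.

```r
# Grid of candidate values for the unknown slope b
# (hypothetical model: P(crash | speed) = plogis(-4 + b * speed))
b.grid <- seq(0, 3, length.out = 601)

# Prior belief b ~ N(1, 0.5), normalised over the grid
prior <- dnorm(b.grid, mean = 1, sd = 0.5)
prior <- prior / sum(prior)

# One Bayesian update: multiply the current belief by the Bernoulli
# likelihood of the new observation, then renormalise
update.belief <- function(belief, speed, crashed) {
  p.crash <- plogis(-4 + b.grid * speed)
  likelihood <- if (crashed) p.crash else 1 - p.crash
  posterior <- belief * likelihood
  posterior / sum(posterior)
}

post.safe  <- update.belief(prior, speed = 3, crashed = FALSE)
post.crash <- update.belief(prior, speed = 3, crashed = TRUE)

# Posterior mean of b under each belief
mean.b <- function(belief) sum(b.grid * belief)
print(c(prior = mean.b(prior), no.crash = mean.b(post.safe), crash = mean.b(post.crash)))
```

A crash pushes the belief about b up by more than a safe trip pulls it down, because under this prior a crash is the less expected outcome. Feeding each posterior back in as the next prior gives the period-by-period updating described above.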

I set the model parameters to a=.05, B=1, and C=100. The initial priors are C~N(100,5) and B~N(1,.05), where N(x,y) is a normal distribution with mean x and stdev y. Below is one example of a simulated time series.

 



As you can see, the individual learns the true parameters as he goes through time. When a crash occurs, he dramatically reduces his decision, and if a crash doesn’t occur, he slightly increases it. This can be seen as a smoothed-over “this time is different”, where the actor adjusts his decision optimistically until a negative event happens, at which point he changes his decision more abruptly.

The optimum decision is shown in the blue horizontal line, and while the actor doesn’t achieve it in this example, he will eventually (… I think) as this is the equilibrium.

Conclusion:
The purpose of this post was to describe a mathematical interpretation of “This Time is Different”. I did this by solving a maximization problem, with unknown parameters, and then updating those parameters using the Bayesian Framework.

While the example I used related only to speed and car crashes, I don’t think it’s much of a stretch to use this model to describe economic events. An example might be a bank that wants to find the optimal trade-off between its capital requirements and the risk of bankruptcy. (The smaller the capital requirements, the higher the expected profits and the higher the probability of bankruptcy.)




Saturday, January 25, 2014

Economist Year in Review: Part 3

Finally, I want to show a network graph of countries based on the correlations using LDA. I used qgraph() in R. Kind of interesting how Switzerland and Norway are outside the main European Cluster. 



Economist Year in Review: Part 2


I like how The Economist focuses on geopolitical issues, and I want to capture that in my analysis. I therefore labelled each article by the countries / demonyms included in the text (if either “United States” or “American” appeared in the text, the article was labelled “United States”; of course, an article can be labelled with many countries).
I then removed all instances of country names from the text. I did so because I wanted to use LDA again, and having country names in the text would create a dependency I don't want.
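
A minimal sketch of the labelling and removal steps on three made-up sentences. The lookup list `country.terms` and the example texts are hypothetical; my actual list covers 193 countries.

```r
# Hypothetical lookup: each country with the terms that identify it
country.terms <- list(
  "United States" = c("United States", "American"),
  "China"         = c("China", "Chinese")
)
articles <- c("American officials met Chinese diplomats.",
              "China reported faster growth.",
              "The weather was mild.")

# Articles-by-countries indicator matrix: 1 if any term for the country occurs
country.mat <- sapply(country.terms, function(terms) {
  pattern <- paste(terms, collapse = "|")
  as.integer(grepl(pattern, articles))
})
print(country.mat)

# Strip the country terms so they cannot drive the topic model
all.terms <- paste(unlist(country.terms), collapse = "|")
articles.clean <- gsub(all.terms, "", articles)
print(articles.clean)
```

One design note: a plain substring match like this would also count "Niger" inside "Nigeria". Wrapping each term in `\\b` word boundaries is one way to avoid that.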
So, for this part of the analysis I have two matrices. The first is the LDA matrix of 3440 articles by 100 topics, which describes each article as percentages of 100 topics (with the country names and demonyms removed from the text). The second matrix is 3440 by 193 and records which countries are mentioned in the text of each article.
The question I want to address is: what are the main groups of international affairs? For example, we expect Syria, Iran, and the US to form a cluster, and China, Japan, and the US to form a cluster as well.
To do this, I used K-means clustering on the articles-by-countries matrix to group articles based on the countries included in the text. Then, to understand what these groups discuss, I averaged the topics within each cluster.
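
The two steps can be sketched on toy data as follows. The six "articles", six "countries", and two "topics" are made up for illustration; the real matrices are 3440 by 193 and 3440 by 100.

```r
set.seed(1)  # kmeans() starts from random centers

# Hypothetical indicator matrix: 6 articles by 6 countries
country.mat <- rbind(c(1, 1, 1, 0, 0, 0),
                     c(1, 1, 0, 0, 0, 0),
                     c(1, 0, 1, 0, 0, 0),
                     c(0, 0, 0, 1, 1, 1),
                     c(0, 0, 0, 1, 1, 0),
                     c(0, 0, 0, 1, 0, 1))

# Hypothetical topic shares for the same articles (rows sum to 1)
theta <- rbind(c(0.8, 0.2), c(0.7, 0.3), c(0.9, 0.1),
               c(0.2, 0.8), c(0.1, 0.9), c(0.3, 0.7))
colnames(theta) <- c("Topic 1", "Topic 2")

# Step 1: group articles by the countries they mention
km <- kmeans(country.mat, centers = 2, nstart = 10)

# Step 2: average the topic shares of the articles in each cluster
topic.by.cluster <- t(sapply(1:2, function(i) {
  colMeans(theta[km$cluster == i, , drop = FALSE])
}))
print(km$cluster)
print(topic.by.cluster)
```

Each row of `topic.by.cluster` is the average topic mix of one cluster, which is what the per-cluster topic listings further down summarise.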
Let's first see in how many articles each country is mentioned at least once.
load("/Users/sweiss/Google Drive/countrymatrix.rdata")
country.sums = colSums(country.mat)
barplot(sort(country.sums, decreasing = TRUE)[1:50], las = 3, cex.names = 0.7, 
    ylab = "Number of Times a country name or Demonym occurred at least once in an article", 
    main = "Number of times a Country was Identified in an article (Economist 2013)")
[Plot: Number of times a Country was Identified in an article (Economist 2013)]
The USA is number 1, with the UK a distant second. Obviously there's some UK bias because this is an English newspaper.
Below are the clusters of articles based on the countries named and Topics most associated with those clusters. I chose 10 clusters using the elbow graph method (not shown).
library(topicmodels)
# Fitted LDA model (100 topics) and the articles-by-countries matrix
load("/users/sweiss/google drive/economistnocountryld100.Rda")
load("/Users/sweiss/Google Drive/countrymatrix.rdata")

# Article-by-topic proportions from the fitted model
theta = posterior(ld.100)$topics

# Group articles by the countries they mention.
# kmeans() starts from random centers, so cluster numbering can change between runs.
kmeans.10 <- kmeans(x = country.mat, centers = 10)
cluster.10 = kmeans.10$cluster

for (i in 1:10) {
    in.cluster = which(cluster.10 == i)
    # Topics ordered by their average share within this cluster
    theta.average.cluster = colMeans(theta[in.cluster, , drop = FALSE])
    top.topics.cluster = names(sort(theta.average.cluster, decreasing = TRUE))
    print(paste("Cluster", i))
    print("Top 10 Countries in Cluster")
    print(sort(colSums(country.mat[in.cluster, , drop = FALSE]), decreasing = TRUE)[1:10])
    print("Average Number of Countries in Article")
    print(mean(rowSums(country.mat[in.cluster, , drop = FALSE])))
    # Note: this sums the country mentions over the cluster's articles;
    # length(in.cluster) would give the article count itself.
    print("Number of Articles in Cluster")
    print(sum(rowSums(country.mat[in.cluster, , drop = FALSE])))
    # Top 10 terms of the 10 highest-ranked topics in this cluster
    print(terms(ld.100, 10)[, as.numeric(top.topics.cluster)[1:10]])
}
## [1] "Cluster 1"
## [1] "Top 10 Countries in Cluster"
## United Kingdom         France          Japan         Russia          Italy 
##            254             73             60             46             44 
##          Spain         Israel         Turkey    Netherlands         Canada 
##             34             33             33             32             31 
## [1] "Average Number of Countries in Article"
## [1] 1.224
## [1] "Number of Articles in Cluster"
## [1] 1567
##       Topic 33    Topic 19   Topic 97   Topic 79   Topic 66    
##  [1,] "rate"      "elect"    "cell"     "labour"   "minist"    
##  [2,] "price"     "parti"    "research" "cameron"  "govern"    
##  [3,] "economist" "vote"     "scienc"   "tori"     "prime"     
##  [4,] "interest"  "voter"    "work"     "parti"    "polit"     
##  [5,] "index"     "poll"     "human"    "britain"  "leader"    
##  [6,] "market"    "polit"    "brain"    "conserv"  "parliament"
##  [7,] "trade"     "campaign" "one"      "polit"    "parti"     
##  [8,] "commod"    "seat"     "cancer"   "miliband" "opposit"   
##  [9,] "job"       "win"      "might"    "mps"      "elect"     
## [10,] "balanc"    "candid"   "use"      "david"    "berlusconi"
##       Topic 40     Topic 16  Topic 26   Topic 58  Topic 53 
##  [1,] "presid"     "polic"   "protest"  "test"    "local"  
##  [2,] "polit"      "crime"   "govern"   "time"    "town"   
##  [3,] "elect"      "prison"  "street"   "ask"     "park"   
##  [4,] "power"      "crimin"  "erdogan"  "peopl"   "build"  
##  [5,] "leader"     "say"     "polic"    "experi"  "council"
##  [6,] "presidenti" "sentenc" "support"  "think"   "centr"  
##  [7,] "constitut"  "drug"    "call"     "word"    "place"  
##  [8,] "countri"    "murder"  "demonstr" "suggest" "peopl"  
##  [9,] "year"       "peopl"   "polit"    "person"  "plan"   
## [10,] "run"        "jail"    "opposit"  "relat"   "new"    
## [1] "Cluster 2"
## [1] "Top 10 Countries in Cluster"
##         France        Germany          Italy          Spain United Kingdom 
##            128            118            116            103             58 
##  United States         Greece    Netherlands        Ireland         Russia 
##             55             53             52             38             32 
## [1] "Average Number of Countries in Article"
## [1] 8.138
## [1] "Number of Articles in Cluster"
## [1] 1237
##       Topic 9    Topic 28   Topic 88  Topic 62   Topic 83  Topic 14 
##  [1,] "euro"     "european" "bank"    "bond"     "price"   "economi"
##  [2,] "zone"     "europ"    "financi" "debt"     "market"  "growth" 
##  [3,] "european" "union"    "loan"    "rate"     "cost"    "econom" 
##  [4,] "countri"  "commiss"  "lend"    "yield"    "rise"    "gdp"    
##  [5,] "bank"     "countri"  "capit"   "investor" "year"    "invest" 
##  [6,] "crisi"    "nation"   "deposit" "govern"   "demand"  "export" 
##  [7,] "bail"     "want"     "credit"  "interest" "low"     "account"
##  [8,] "imf"      "brussel"  "crisi"   "market"   "increas" "product"
##  [9,] "market"   "treati"   "borrow"  "borrow"   "like"    "year"   
## [10,] "economi"  "like"     "asset"   "default"  "high"    "busi"   
##       Topic 29   Topic 17   Topic 91   Topic 52   
##  [1,] "worker"   "firm"     "mrs"      "left"     
##  [2,] "job"      "market"   "merkel"   "holland"  
##  [3,] "work"     "industri" "left"     "right"    
##  [4,] "labour"   "product"  "parti"    "presid"   
##  [5,] "employ"   "big"      "coalit"   "polit"    
##  [6,] "wage"     "new"      "green"    "socialist"
##  [7,] "unemploy" "profit"   "centr"    "now"      
##  [8,] "skill"    "busi"     "govern"   "put"      
##  [9,] "pay"      "sale"     "democrat" "yet"      
## [10,] "factori"  "compani"  "social"   "fran"     
## [1] "Cluster 3"
## [1] "Top 10 Countries in Cluster"
##  United States United Kingdom         Canada         France         Russia 
##            686            115             49             46             37 
##          Japan    Afghanistan      Australia          Spain        Georgia 
##             34             27             24             23             21 
## [1] "Average Number of Countries in Article"
## [1] 2.257
## [1] "Number of Articles in Cluster"
## [1] 1548
##       Topic 87     Topic 72 Topic 19   Topic 90   Topic 63   Topic 47    
##  [1,] "republican" "court"  "elect"    "insur"    "fund"     "state"     
##  [2,] "obama"      "law"    "parti"    "health"   "investor" "feder"     
##  [3,] "democrat"   "case"   "vote"     "will"     "return"   "california"
##  [4,] "senat"      "legal"  "voter"    "plan"     "invest"   "governor"  
##  [5,] "polit"      "judg"   "poll"     "care"     "asset"    "year"      
##  [6,] "congress"   "rule"   "polit"    "feder"    "equiti"   "say"       
##  [7,] "hous"       "right"  "campaign" "obamacar" "share"    "one"       
##  [8,] "bill"       "lawyer" "seat"     "mani"     "manag"    "texa"      
##  [9,] "america"    "suprem" "win"      "exchang"  "profit"   "back"      
## [10,] "barack"     "justic" "candid"   "state"    "money"    "also"      
##       Topic 97   Topic 64    Topic 62   Topic 88 
##  [1,] "cell"     "compani"   "bond"     "bank"   
##  [2,] "research" "firm"      "debt"     "financi"
##  [3,] "scienc"   "busi"      "rate"     "loan"   
##  [4,] "work"     "deal"      "yield"    "lend"   
##  [5,] "human"    "share"     "investor" "capit"  
##  [6,] "brain"    "billion"   "govern"   "deposit"
##  [7,] "one"      "sharehold" "interest" "credit" 
##  [8,] "cancer"   "buy"       "market"   "crisi"  
##  [9,] "might"    "stake"     "borrow"   "borrow" 
## [10,] "use"      "privat"    "default"  "asset"  
## [1] "Cluster 4"
## [1] "Top 10 Countries in Cluster"
##        Germany  United States United Kingdom         France          China 
##            278            112             94             70             44 
##    Netherlands    Switzerland         Russia          Japan         Greece 
##             32             29             27             26             21 
## [1] "Average Number of Countries in Article"
## [1] 4.324
## [1] "Number of Articles in Cluster"
## [1] 1202
##       Topic 91   Topic 28   Topic 9    Topic 88  Topic 17   Topic 64   
##  [1,] "mrs"      "european" "euro"     "bank"    "firm"     "compani"  
##  [2,] "merkel"   "europ"    "zone"     "financi" "market"   "firm"     
##  [3,] "left"     "union"    "european" "loan"    "industri" "busi"     
##  [4,] "parti"    "commiss"  "countri"  "lend"    "product"  "deal"     
##  [5,] "coalit"   "countri"  "bank"     "capit"   "big"      "share"    
##  [6,] "green"    "nation"   "crisi"    "deposit" "new"      "billion"  
##  [7,] "centr"    "want"     "bail"     "credit"  "profit"   "sharehold"
##  [8,] "govern"   "brussel"  "imf"      "crisi"   "busi"     "buy"      
##  [9,] "democrat" "treati"   "market"   "borrow"  "sale"     "stake"    
## [10,] "social"   "like"     "economi"  "asset"   "compani"  "privat"   
##       Topic 76  Topic 66     Topic 29   Topic 15  
##  [1,] "billion" "minist"     "worker"   "investig"
##  [2,] "year"    "govern"     "job"      "claim"   
##  [3,] "will"    "prime"      "work"     "case"    
##  [4,] "cost"    "polit"      "labour"   "alleg"   
##  [5,] "also"    "leader"     "employ"   "charg"   
##  [6,] "worth"   "parliament" "wage"     "report"  
##  [7,] "total"   "parti"      "unemploy" "trial"   
##  [8,] "last"    "opposit"    "skill"    "said"    
##  [9,] "make"    "elect"      "pay"      "former"  
## [10,] "estim"   "berlusconi" "factori"  "accus"   
## [1] "Cluster 5"
## [1] "Top 10 Countries in Cluster"
##         Syria United States          Iraq          Iran        Israel 
##           121            88            80            73            66 
##        Turkey       Lebanon        Russia         Egypt  Saudi Arabia 
##            57            42            40            38            37 
## [1] "Average Number of Countries in Article"
## [1] 7.39
## [1] "Number of Articles in Cluster"
## [1] 1005
##       Topic 46  Topic 48     Topic 81      Topic 92   Topic 21   
##  [1,] "rebel"   "america"    "muslim"      "forc"     "attack"   
##  [2,] "assad"   "obama"      "ian"         "armi"     "kill"     
##  [3,] "regim"   "presid"     "islam"       "defenc"   "war"      
##  [4,] "war"     "nuclear"    "islamist"    "militari" "bomb"     
##  [5,] "govern"  "polici"     "brother"     "arm"      "group"    
##  [6,] "forc"    "relat"      "now"         "secur"    "terrorist"
##  [7,] "north"   "intern"     "morsi"       "general"  "dead"     
##  [8,] "arm"     "weapon"     "brotherhood" "war"      "violenc"  
##  [9,] "group"   "washington" "back"        "soldier"  "terror"   
## [10,] "western" "foreign"    "includ"      "troop"    "drone"    
##       Topic 89      Topic 26   Topic 66     Topic 37    Topic 7  
##  [1,] "white"       "protest"  "minist"     "deal"      "polit"  
##  [2,] "black"       "govern"   "govern"     "trade"     "putin"  
##  [3,] "palestinian" "street"   "prime"      "talk"      "anti"   
##  [4,] "king"        "erdogan"  "polit"      "negoti"    "now"    
##  [5,] "relat"       "polic"    "leader"     "agreement" "also"   
##  [6,] "arab"        "support"  "parliament" "agre"      "may"    
##  [7,] "say"         "call"     "parti"      "side"      "kremlin"
##  [8,] "west"        "demonstr" "opposit"    "free"      "soviet" 
##  [9,] "state"       "polit"    "elect"      "sign"      "western"
## [10,] "two"         "opposit"  "berlusconi" "two"       "power"  
## [1] "Cluster 6"
## [1] "Top 10 Countries in Cluster"
##          Niger        Nigeria  United States         France United Kingdom 
##             55             52             20             14             13 
##           Mali   South Africa          China          Ghana       Cameroon 
##             12             12             10              9              7 
## [1] "Average Number of Countries in Article"
## [1] 7.182
## [1] "Number of Articles in Cluster"
## [1] 395
##       Topic 60  Topic 46  Topic 21    Topic 22  Topic 40     Topic 5     
##  [1,] "africa"  "rebel"   "attack"    "money"   "presid"     "food"      
##  [2,] "ship"    "assad"   "kill"      "pay"     "polit"      "farm"      
##  [3,] "african" "regim"   "war"       "servic"  "elect"      "farmer"    
##  [4,] "port"    "war"     "bomb"      "save"    "power"      "product"   
##  [5,] "region"  "govern"  "group"     "charg"   "leader"     "say"       
##  [6,] "contain" "forc"    "terrorist" "cost"    "presidenti" "produc"    
##  [7,] "intern"  "north"   "dead"      "card"    "constitut"  "agricultur"
##  [8,] "world"   "arm"     "violenc"   "fee"     "countri"    "meat"      
##  [9,] "dubai"   "group"   "terror"    "payment" "year"       "rice"      
## [10,] "countri" "western" "drone"     "account" "run"        "year"      
##       Topic 92   Topic 88  Topic 49  Topic 81     
##  [1,] "forc"     "bank"    "immigr"  "muslim"     
##  [2,] "armi"     "financi" "border"  "ian"        
##  [3,] "defenc"   "loan"    "migrant" "islam"      
##  [4,] "militari" "lend"    "mani"    "islamist"   
##  [5,] "arm"      "capit"   "peopl"   "brother"    
##  [6,] "secur"    "deposit" "say"     "now"        
##  [7,] "general"  "credit"  "illeg"   "morsi"      
##  [8,] "war"      "crisi"   "year"    "brotherhood"
##  [9,] "soldier"  "borrow"  "countri" "back"       
## [10,] "troop"    "asset"   "issu"    "includ"     
## [1] "Cluster 7"
## [1] "Top 10 Countries in Cluster"
##          China  United States          Japan United Kingdom         Russia 
##            409            187             91             60             40 
##         France      Australia         Canada        Vietnam         Taiwan 
##             39             29             24             24             23 
## [1] "Average Number of Countries in Article"
## [1] 3.399
## [1] "Number of Articles in Cluster"
## [1] 1390
##       Topic 85    Topic 24 Topic 17   Topic 95    Topic 4      Topic 14 
##  [1,] "offici"    "south"  "firm"     "project"   "parti"      "economi"
##  [2,] "beij"      "north"  "market"   "mine"      "polit"      "growth" 
##  [3,] "said"      "korea"  "industri" "water"     "power"      "econom" 
##  [4,] "recent"    "asia"   "product"  "govern"    "leader"     "gdp"    
##  [5,] "report"    "island" "big"      "build"     "nation"     "invest" 
##  [6,] "one"       "region" "new"      "river"     "politician" "export" 
##  [7,] "govern"    "relat"  "profit"   "say"       "member"     "account"
##  [8,] "ministri"  "east"   "busi"     "construct" "congress"   "product"
##  [9,] "public"    "sea"    "sale"     "one"       "support"    "year"   
## [10,] "communist" "also"   "compani"  "plan"      "constitut"  "busi"   
##       Topic 86 Topic 88  Topic 48     Topic 74 
##  [1,] "open"   "bank"    "america"    "foreign"
##  [2,] "also"   "financi" "obama"      "govern" 
##  [3,] "will"   "loan"    "presid"     "countri"
##  [4,] "mani"   "lend"    "nuclear"    "local"  
##  [5,] "hong"   "capit"   "polici"     "make"   
##  [6,] "can"    "deposit" "relat"      "control"
##  [7,] "anoth"  "credit"  "intern"     "abroad" 
##  [8,] "kong"   "crisi"   "weapon"     "intern" 
##  [9,] "argu"   "borrow"  "washington" "last"   
## [10,] "one"    "asset"   "foreign"    "may"    
## [1] "Cluster 8"
## [1] "Top 10 Countries in Cluster"
##  United States         Brazil         Mexico          Spain          Chile 
##            124             80             78             23             21 
##          China      Venezuela      Argentina United Kingdom       Colombia 
##             18             16             15             15             14 
## [1] "Average Number of Countries in Article"
## [1] 5.233
## [1] "Number of Articles in Cluster"
## [1] 675
##       Topic 11  Topic 40     Topic 17   Topic 16  Topic 87     Topic 82
##  [1,] "countri" "presid"     "firm"     "polic"   "republican" "reform"
##  [2,] "world"   "polit"      "market"   "crime"   "obama"      "govern"
##  [3,] "global"  "elect"      "industri" "prison"  "democrat"   "will"  
##  [4,] "america" "power"      "product"  "crimin"  "senat"      "polici"
##  [5,] "develop" "leader"     "big"      "say"     "polit"      "chang" 
##  [6,] "rich"    "presidenti" "new"      "sentenc" "congress"   "system"
##  [7,] "emerg"   "constitut"  "profit"   "drug"    "hous"       "plan"  
##  [8,] "intern"  "countri"    "busi"     "murder"  "bill"       "polit" 
##  [9,] "latin"   "year"       "sale"     "peopl"   "america"    "public"
## [10,] "accord"  "run"        "compani"  "jail"    "barack"     "need"  
##       Topic 14  Topic 37    Topic 75  Topic 47    
##  [1,] "economi" "deal"      "number"  "state"     
##  [2,] "growth"  "trade"     "america" "feder"     
##  [3,] "econom"  "talk"      "sinc"    "california"
##  [4,] "gdp"     "negoti"    "time"    "governor"  
##  [5,] "invest"  "agreement" "less"    "year"      
##  [6,] "export"  "agre"      "rate"    "say"       
##  [7,] "account" "side"      "year"    "one"       
##  [8,] "product" "free"      "declin"  "texa"      
##  [9,] "year"    "sign"      "rise"    "back"      
## [10,] "busi"    "two"       "increas" "also"      
## [1] "Cluster 9"
## [1] "Top 10 Countries in Cluster"
##          India          China  United States United Kingdom          Japan 
##            292            123            114             78             51 
##       Pakistan         Russia         Brazil      Australia      Indonesia 
##             43             39             33             32             31 
## [1] "Average Number of Countries in Article"
## [1] 5.014
## [1] "Number of Articles in Cluster"
## [1] 1464
##       Topic 4      Topic 19   Topic 2   Topic 17   Topic 24 Topic 66    
##  [1,] "parti"      "elect"    "govern"  "firm"     "south"  "minist"    
##  [2,] "polit"      "parti"    "nation"  "market"   "north"  "govern"    
##  [3,] "power"      "vote"     "peopl"   "industri" "korea"  "prime"     
##  [4,] "leader"     "voter"    "ethnic"  "product"  "asia"   "polit"     
##  [5,] "nation"     "poll"     "local"   "big"      "island" "leader"    
##  [6,] "politician" "polit"    "villag"  "new"      "region" "parliament"
##  [7,] "member"     "campaign" "one"     "profit"   "relat"  "parti"     
##  [8,] "congress"   "seat"     "just"    "busi"     "east"   "opposit"   
##  [9,] "support"    "win"      "group"   "sale"     "sea"    "elect"     
## [10,] "constitut"  "candid"   "countri" "compani"  "also"   "berlusconi"
##       Topic 21    Topic 95    Topic 11  Topic 39 
##  [1,] "attack"    "project"   "countri" "one"    
##  [2,] "kill"      "mine"      "world"   "world"  
##  [3,] "war"       "water"     "global"  "argu"   
##  [4,] "bomb"      "govern"    "america" "blog"   
##  [5,] "group"     "build"     "develop" "histori"
##  [6,] "terrorist" "river"     "rich"    "long"   
##  [7,] "dead"      "say"       "emerg"   "great"  
##  [8,] "violenc"   "construct" "intern"  "view"   
##  [9,] "terror"    "one"       "latin"   "point"  
## [10,] "drone"     "plan"      "accord"  "much"   
## [1] "Cluster 10"
## [1] "Top 10 Countries in Cluster"
##     Tanzania        Kenya       Rwanda       Uganda South Africa 
##           17           16           15           15           14 
##        China        Niger        India      Nigeria       Angola 
##           13           12           11           11           10 
## [1] "Average Number of Countries in Article"
## [1] 11.87
## [1] "Number of Articles in Cluster"
## [1] 273
##       Topic 60  Topic 49  Topic 46  Topic 35 Topic 13 Topic 2   Topic 92  
##  [1,] "africa"  "immigr"  "rebel"   "peopl"  "store"  "govern"  "forc"    
##  [2,] "ship"    "border"  "assad"   "mani"   "retail" "nation"  "armi"    
##  [3,] "african" "migrant" "regim"   "fire"   "shop"   "peopl"   "defenc"  
##  [4,] "port"    "mani"    "war"     "now"    "sale"   "ethnic"  "militari"
##  [5,] "region"  "peopl"   "govern"  "caus"   "chain"  "local"   "arm"     
##  [6,] "contain" "say"     "forc"    "need"   "sell"   "villag"  "secur"   
##  [7,] "intern"  "illeg"   "north"   "damag"  "buy"    "one"     "general" 
##  [8,] "world"   "year"    "arm"     "also"   "custom" "just"    "war"     
##  [9,] "dubai"   "countri" "group"   "help"   "can"    "group"   "soldier" 
## [10,] "countri" "issu"    "western" "miss"   "good"   "countri" "troop"   
##       Topic 14  Topic 95    Topic 63  
##  [1,] "economi" "project"   "fund"    
##  [2,] "growth"  "mine"      "investor"
##  [3,] "econom"  "water"     "return"  
##  [4,] "gdp"     "govern"    "invest"  
##  [5,] "invest"  "build"     "asset"   
##  [6,] "export"  "river"     "equiti"  
##  [7,] "account" "say"       "share"   
##  [8,] "product" "construct" "manag"   
##  [9,] "year"    "one"       "profit"  
## [10,] "busi"    "plan"      "money"
The easiest clusters to interpret are numbers 1 (Super Powers), 4 (Euro Zone), 5 (Mideast Conflict), 7 (Asia), and 10 (South America). Cluster 3 seems to be about regional politics around India. Clusters 2 and 6 have a low average number of countries per article, which suggests they are made up mostly of intranational articles.
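
To make the "average number of countries per article" statistic concrete, here is a minimal sketch of how a per-cluster summary like the one printed above could be computed. The data frame `articles`, its columns `cluster` and `n_countries`, and the cutoff of 3 are all made up for illustration; they are not the data or the threshold behind the output above.

```r
# Toy data: one row per article, with its cluster label and the number of
# distinct countries it mentions (hypothetical values, not the real data).
articles <- data.frame(
  cluster     = c(1, 1, 2, 2, 2, 3),
  n_countries = c(6, 8, 1, 2, 3, 5)
)

# Average number of countries per article, by cluster.
avg_countries <- tapply(articles$n_countries, articles$cluster, mean)

# Number of articles in each cluster (same cluster order as tapply).
n_articles <- table(articles$cluster)

# Flag clusters whose average falls below an assumed cutoff of 3 as
# candidates for "intranational" coverage.
cutoff <- 3
summary_by_cluster <- data.frame(
  cluster       = names(avg_countries),
  avg_countries = as.numeric(avg_countries),
  n_articles    = as.integer(n_articles),
  intranational = as.numeric(avg_countries) < cutoff
)
print(summary_by_cluster)
```

A cutoff like this is only a rough heuristic. A cluster with a low average can still contain a few articles that mention many countries, so it is worth looking at the full distribution before labeling a cluster intranational.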