Saturday, November 2, 2013

Why are Palestinians Killed by Israeli Defense Forces?

I watched 5 Broken Cameras and was surprised to see Israeli Defense Forces (IDF) soldiers fire live ammunition at Palestinians protesting the security wall. One Palestinian protester in the documentary was shot dead. I wanted to know how often protesters throwing stones were killed by the IDF.

B'Tselem has a record of each fatality caused by the IDF in both the West Bank and the Gaza Strip since September 2000, with a short description of the event that took place. Some examples of these descriptions are shown below:

Killed while on his way to buy candy at a store next to her school.
Killed when Border Police came to his house to arrest him.
Killed in his house.
Killed during the arrest of his brother, who Wanted by Israel. Was not armed.
Wanted by Israel. Killed during an exchange of gunfire with soldiers who came to arrest him.

I'd like to classify these descriptions to see the reasons people are killed by the IDF. Doing so could give a better picture of the current conflict and the problems to overcome.

First, some descriptive statistics:
6711 total observations, all considered Palestinian citizens
517 females and 6194 males
Mean age is 25.38 years; the first and third quartiles are 19 and 29.
Above is an age pyramid of Palestinians killed. Men are predominantly the victims, and their age distribution is slightly right-skewed. Women's ages are distributed much more uniformly.

Overall, the demographic most likely to be killed by the IDF is males in their mid-twenties.
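The summary figures above (counts by sex, mean age, quartiles) are simple to compute once the records are loaded. Here is a minimal sketch in Python rather than the R I used. The `records` list is made-up illustration data, not the B'Tselem data, and `summarize` is a hypothetical helper name. Quartile conventions differ between tools, so the `method="inclusive"` choice is also an assumption.

```python
from collections import Counter
from statistics import mean, quantiles

# Hypothetical (sex, age) records for illustration only;
# these are NOT taken from the B'Tselem data.
records = [("M", 17), ("M", 19), ("M", 22), ("F", 24),
           ("M", 25), ("M", 29), ("F", 31), ("M", 40)]


def summarize(records):
    """Return the count, counts by sex, mean age, and age quartiles."""
    ages = [age for _, age in records]
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, _median, q3 = quantiles(ages, n=4, method="inclusive")
    return {
        "n": len(records),
        "by_sex": dict(Counter(sex for sex, _ in records)),
        "mean_age": mean(ages),
        "q1": q1,
        "q3": q3,
    }


summary = summarize(records)
print(summary)
```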

Below is a time series plot of Palestinians killed per month, split by location (Gaza Strip or West Bank).

However, 539 deaths had no description. Below is a time series of those deaths:

Missing descriptions appear to occur in periods with a high number of deaths.
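One way to check that impression is to count, for each month, the total deaths and the deaths with an empty description, then compare the two series. The sketch below assumes each record is a `(date, description)` pair with an ISO-style date string. The records shown are placeholders, and `monthly_counts` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical (date, description) records for illustration only.
# An empty string stands for a missing description.
records = [
    ("2002-03-08", "some description"),
    ("2002-03-08", ""),
    ("2002-03-12", ""),
    ("2002-04-02", "some description"),
    ("2005-06-01", "some description"),
]


def monthly_counts(records):
    """Count deaths per month and deaths lacking a description per month."""
    total, missing = Counter(), Counter()
    for date, description in records:
        month = date[:7]  # "YYYY-MM"
        total[month] += 1
        if not description.strip():
            missing[month] += 1
    return total, missing


total, missing = monthly_counts(records)
for month in sorted(total):
    print(month, total[month], missing[month])
```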

To classify the data I used Latent Dirichlet Allocation (LDA). This method assumes each text description is a mixture of topics. Each topic is a probability distribution over the set of words in all descriptions. For each description, the algorithm takes the words as given and produces a probability distribution over the set of topics. In effect, each description is expressed as a weighted combination of topics. The benefit of LDA is that multiple topics can apply to each description.
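To make the idea concrete, here is a small sketch of LDA fitted by collapsed Gibbs sampling, written in plain Python. This is not the code behind the analysis, which was done in R. The sampler, the crude tokenizer, and the values of `alpha`, `beta`, `n_iter`, and the choice of two topics are all assumptions for illustration. The toy corpus is the five example descriptions quoted earlier in this post.

```python
import random
import re


def tokenize(text):
    """Lowercase and keep alphabetic words longer than three letters (crude filter)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3]


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (doc_topic, topic_word, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    n_dk = [[0] * n_topics for _ in docs]            # topic counts per document
    n_kw = [[0] * n_words for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                             # total tokens per topic
    z = []                                           # topic assigned to each token
    for d, doc in enumerate(docs):                   # random initial assignment
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized probability of each topic given all other tokens.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    # Smoothed estimates: topic mixture per document, word distribution per topic.
    doc_topic = [[(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
                 for row, doc in zip(n_dk, docs)]
    topic_word = [[(c + beta) / (n_k[k] + n_words * beta) for c in n_kw[k]]
                  for k in range(n_topics)]
    return doc_topic, topic_word, vocab


descriptions = [
    "Killed while on his way to buy candy at a store next to her school.",
    "Killed when Border Police came to his house to arrest him.",
    "Killed in his house.",
    "Killed during the arrest of his brother, who Wanted by Israel. Was not armed.",
    "Wanted by Israel. Killed during an exchange of gunfire with soldiers who came to arrest him.",
]
docs = [tokenize(text) for text in descriptions]
doc_topic, topic_word, vocab = fit_lda(docs, n_topics=2)
for text, mixture in zip(descriptions, doc_topic):
    print([round(p, 2) for p in mixture], text[:40])
```

Each row of `doc_topic` is one description's distribution over topics, and each row of `topic_word` is one topic's distribution over the vocabulary. With only five short descriptions the topics themselves are not meaningful; the point is the shape of the output.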

Topics are displayed below in decreasing importance:

Topic 7, Topic 9, Topic 4, Topic 14, Topic 5, Topic 12, Topic 19, Topic 1, Topic 15, Topic 18
Topic 8, Topic 16, Topic 10, Topic 11, Topic 20, Topic 6, Topic 2, Topic 17, Topic 13, Topic 3
checkpoint, yuni, struck, throw, neighbor, assassin, "area", militari, along, later

Two topics I thought worth mentioning are Topics 11 and 19.

Topic 11 includes the terms "Wanted", "Arrest", "Israel", "Person", and "Undercover". This topic covers Palestinians wanted by Israel who were killed by IDF soldiers.

Above is the average proportion of each description attributed to this topic. One can see that it increased in 2006 in the West Bank while remaining relatively constant in Gaza. Because Israel has much more control over the West Bank than over the Gaza Strip, it makes sense that Israel attempts more arrests in the West Bank, and that more deaths result.
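The plotted quantity can be sketched as a grouped mean: take the share of each description that LDA assigns to the topic, then average those shares within each region and time period. The rows below are invented numbers for illustration. Grouping by year is an assumption, and `mean_proportion` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical rows: (year, region, share of the description assigned to one topic).
# These numbers are invented for illustration, not results from the analysis.
rows = [
    (2005, "West Bank", 0.10),
    (2005, "West Bank", 0.30),
    (2005, "Gaza Strip", 0.05),
    (2006, "West Bank", 0.40),
    (2006, "Gaza Strip", 0.05),
]


def mean_proportion(rows):
    """Average one topic's share within each (year, region) group."""
    groups = defaultdict(list)
    for year, region, proportion in rows:
        groups[(year, region)].append(proportion)
    return {key: sum(values) / len(values) for key, values in groups.items()}


averages = mean_proportion(rows)
for key in sorted(averages):
    print(key, round(averages[key], 3))
```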

Topic 19 discusses those who died during demonstrations, the very subject that brought my interest to this topic. It's interesting that there has been a steady increase in the West Bank while Gaza has seen a slight decrease. The Separation Barrier could be the reason for this divergence in protesting.

R Code