Identifying human rights-related news coverage of genocide

Scholars interested in the coverage of human rights issues in the media often identify articles by searching for the term "human rights". However, not all articles about human rights issues necessarily use that term. As part of ongoing research in the STAIR lab, we decided to look into this problem a bit more.

We selected a number of different human rights violations to examine. In this first post, I discuss our findings for genocide. Arguably one of the most serious crimes against humanity, genocide has occurred several times in the past few decades, with the targeting of Yazidis by the Islamic State merely the latest example. (Other recent cases have been Darfur in the early 2000s and Rwanda in 1994.)

For this analysis we focused on three major American newspapers: the New York Times, USA Today, and the Washington Post. We downloaded all articles containing the search term 'human rights' over the past 35 years (1981-2015), and did the same for the word stem 'genocid' (in order to capture genocide, genocidal, genocidaire, etc.). More than 125,000 articles met the first search criterion; almost 21,000 met the second. Of the 20,858 genocide articles, however, only 4,271 include the search term 'human rights'; nearly four times as many articles do not.
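To make the two search criteria and the resulting proportions concrete, here is a minimal sketch in Python. The regular expressions are my own assumptions about how the searches might be expressed (in particular, whether the hyphenated form 'human-rights' counts as a match); the article counts are the ones reported above.

```python
import re

# Assumed patterns: the stem 'genocid' at the start of a word, and the
# phrase 'human rights' with whitespace or a hyphen between the two words.
GENOCIDE_STEM = re.compile(r"\bgenocid\w*", re.IGNORECASE)
HUMAN_RIGHTS = re.compile(r"\bhuman[\s-]+rights\b", re.IGNORECASE)


def mentions_genocide(text):
    """True if the text contains any word starting with the stem 'genocid'."""
    return GENOCIDE_STEM.search(text) is not None


def mentions_human_rights(text):
    """True if the text contains the phrase 'human rights'."""
    return HUMAN_RIGHTS.search(text) is not None


# Counts reported in the post.
genocide_articles = 20858
with_human_rights = 4271
without_human_rights = genocide_articles - with_human_rights

# Share of genocide articles that use the term, and the ratio of
# articles without the term to articles with it.
share_with = with_human_rights / genocide_articles
ratio_without_to_with = without_human_rights / with_human_rights
```

On these counts, roughly one genocide article in five uses the term, and the ratio of articles without the term to articles with it is a little under four.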

Not all human rights news mentions human rights!

Still, it is conceivable that articles not mentioning human rights as such are in fact not about genocide as a human rights violation. For example, sometimes people are described as looking or acting like "a genocidal maniac" without having anything to do with an actual genocide.

My next step was to divide the genocide articles into two categories: those that do and those that do not mention human rights. We then used machine learning techniques on the latter sub-corpus in order to identify those articles that are about human rights without mentioning the term. Specifically, I constructed a corpus of human rights articles not mentioning genocide and a corpus of general (non-human rights) articles also not mentioning genocide, and asked the computer to learn the difference.
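The post does not depend on any particular learning algorithm, so the following is only an illustrative sketch of the general idea: train on human rights versus general articles, then apply the model to a genocide article that does not use the term. I use a small multinomial naive Bayes classifier here as a stand-in; it is not necessarily the technique we used, and the training sentences are invented toy examples, not items from our corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.doc_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        # Words never seen in training carry no information and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training data (invented): neither category mentions genocide.
train_texts = [
    "Activists documented torture and disappearances under the junta",
    "The tribunal will prosecute violations of international law",
    "The team won the championship game last night",
    "Stock prices rose as the market rallied",
]
train_labels = ["human_rights", "human_rights", "general", "general"]

model = NaiveBayes().fit(train_texts, train_labels)

# A genocide article that never uses the term 'human rights'.
label = model.predict(
    "Survivors urged the tribunal to prosecute those responsible for torture"
)
```

Training on articles that do not mention genocide at all keeps the model from simply learning the word itself, so its verdict on a genocide article has to rest on the surrounding vocabulary.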

The computer learned to distinguish the two categories quite well, scoring 93% accuracy and a Krippendorff's alpha of 0.86 on a held-out test set. This gives us confidence that the genocide articles it classified as being about human rights really are about human rights. Accordingly, we can add to our corpus of human rights articles about genocide almost 60% of those that do not mention human rights. In fact, looking through the articles classified by the computer suggests that the computer is being quite conservative, and likely we are still omitting quite a few human rights-related articles. Nonetheless, this process effectively triples our corpus of human rights-related media coverage of genocide.
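For readers unfamiliar with the two scores, here is a minimal sketch of how accuracy and Krippendorff's alpha can be computed when the classifier's labels are compared with the true labels on a test set. It treats the two sets of labels as two "coders", uses the nominal version of alpha, and assumes no missing labels and at least two categories in the data. The example labels are invented.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of items on which the two label lists agree."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def krippendorff_alpha_nominal(gold, predicted):
    """Krippendorff's alpha for two coders, nominal labels, no missing data."""
    n_values = 2 * len(gold)  # every item contributes two labels
    value_counts = Counter(gold) + Counter(predicted)
    disagreements = sum(g != p for g, p in zip(gold, predicted))
    # Each disagreeing item adds two off-diagonal cells to the coincidence matrix.
    observed = 2 * disagreements / n_values
    # Disagreement expected if labels were paired at random.
    expected = sum(
        value_counts[c] * value_counts[k]
        for c in value_counts
        for k in value_counts
        if c != k
    ) / (n_values * (n_values - 1))
    return 1.0 - observed / expected


# Invented example: ten test items, one of which is misclassified.
gold = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

acc = accuracy(gold, predicted)                        # 0.9
alpha = krippendorff_alpha_nominal(gold, predicted)    # about 0.808
```

Alpha corrects for agreement expected by chance, which is why it comes out lower than raw accuracy. With two equally frequent categories the expected disagreement is close to 0.5, so 93% accuracy would correspond to an alpha of about 1 - 0.07/0.5 = 0.86, which is consistent with the figures above if the test set was roughly balanced.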

As an additional verification step, I combined the genocide articles mentioning human rights with those that do not but were classified as human rights-related, and asked the computer to try to tell them apart (after removing the term human rights from the first group). This proved to be a very difficult task, with a Krippendorff's alpha on the held-out test set of just 0.32. Because the two groups are so hard to separate, the articles the computer labeled as human rights-related appear to resemble those that use the term explicitly, which supports our confidence in the computer's decisions on the classification of human rights-related articles.
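Removing the term before this test matters: if it were left in, the computer could separate the groups perfectly just by looking for it. A minimal sketch of that masking step follows; the pattern (including the hyphenated variant) and the function name are my own assumptions for illustration.

```python
import re

# Assumed pattern: 'human rights' with whitespace or a hyphen between the words.
TERM = re.compile(r"\bhuman[\s-]+rights\b", re.IGNORECASE)


def mask_term(text):
    """Remove every occurrence of the term and tidy up leftover whitespace."""
    masked = TERM.sub(" ", text)
    return re.sub(r"\s+", " ", masked).strip()


example = mask_term("Human rights groups condemned the genocide.")
```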

Mentioning human rights is not arbitrary!

Nevertheless, it is interesting to take a closer look at what might distinguish the two groups: What might make a journalist decide to mention human rights in an article about genocide? To examine this question more closely, we look specifically at the sentences in these articles that actually mention genocide, and identify words more likely to appear in sentences from either group of articles.
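As an illustration of this kind of comparison, the sketch below keeps only the sentences that contain the stem 'genocid' and then scores each word by a smoothed log-odds ratio between the two groups. The log-odds measure is one common choice and is an assumption on my part, as are the sentence splitter and the toy sentences; the sketch also does not filter out proper names.

```python
import math
import re
from collections import Counter

STEM = re.compile(r"genocid", re.IGNORECASE)


def genocide_sentences(article):
    """Split an article into sentences and keep those mentioning genocide."""
    sentences = re.split(r"(?<=[.!?])\s+", article)
    return [s for s in sentences if STEM.search(s)]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def log_odds_scores(sentences_a, sentences_b):
    """Positive scores lean toward group A, negative scores toward group B."""
    counts_a = Counter(t for s in sentences_a for t in tokenize(s))
    counts_b = Counter(t for s in sentences_b for t in tokenize(s))
    vocab = set(counts_a) | set(counts_b)
    # Add-one smoothing so words absent from one group get a finite score.
    total_a = sum(counts_a.values()) + len(vocab)
    total_b = sum(counts_b.values()) + len(vocab)
    return {
        w: math.log((counts_a[w] + 1) / total_a)
        - math.log((counts_b[w] + 1) / total_b)
        for w in vocab
    }


# Toy sentences (invented) standing in for the two groups of articles.
no_mention = [
    "Historians commemorate the genocide centennial.",
    "Historians are documenting the genocide.",
]
mention = [
    "The tribunal will prosecute genocide.",
    "Activists urged the tribunal to investigate genocide.",
]

scores = log_odds_scores(no_mention, mention)
most_no_mention = max(scores, key=scores.get)   # "historians"
most_mention = min(scores, key=scores.get)      # "tribunal"
```

Words that are frequent in both groups, such as 'genocide' itself, score close to zero and so drop out of both lists.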

The table below provides the list of words most strongly associated with each category.


No mention of human rights:
histories, oral, emergency, deception, techniques, sway, documenting, reveals, lie, fascism, fascist, remembrance, annihilation, centennial, recognizing, commemorate, surviving, propaganda, racist, demonstrators, millions, condemning, condemn, historians, catastrophe, notably, empire, accuses, racism, conspiracy, priest, complicity, proof, mention, acquitted, acted, starvation, oppression, shame, horrors, minorities, aggression, tragedy, participated, cultural, scale, editor, century, atrocity, killers

Human rights mentioned:
rights, abuses, gross, human, violations, advocates, disappearances, jurisdiction, activists, unwilling, penal, watch, prosecutions, universal, investigate, coined, responding, hoc, conventions, charter, violation, prosecute, human-rights, accountability, adviser, establishing, tribunals, conclusion, internationally, advocate, massive, commission, amounted, warnings, offenses, charging, expert, torture, treaties, determine, ratified, statute, ad, individuals, junta, large-scale, adopted, cases, lawsuit, unfolding


These two word lists suggest that articles that do not mention human rights are more likely to be about studying or remembering past genocides: in the first list, words like (oral) histories, documenting, remembrance, centennial, commemorate, historians, editor, and century all point in this direction.

In contrast, the second list contains more words focusing on the crime itself (and, relatedly, on prosecuting those who commit it): abuses, violation(s), advocates, jurisdiction, activists, penal, investigate, etc. In addition, although we cannot see this from the table, articles that do mention human rights contain, on average, almost twice as many occurrences of the word stem 'genocid' as do articles that do not.

It is important to note that the words listed above are specifically selected for being more prominent in one group or the other: if we look at the two groups of articles overall, they look quite similar, as the following two word clouds show:


[Word cloud: No mention of human rights]
[Word cloud: Human rights mentioned]


The word clouds reference largely the same genocides: Armenia, Rwanda, Bosnia, Darfur. The first word cloud does hint at two older genocides that are less prominent in the second one: 'nazi' (for the Holocaust) and 'khmer' (for Cambodia).

Finally, I looked at whether this same pattern emerges when we look for the mention of proper names more systematically (the word lists presented above deliberately excluded proper names). If we do so, the list for the articles that do not mention human rights now includes a number of names associated with the Holocaust ('holocaust', 'jews', 'elders' & 'zion', 'nuremburg', 'adolf', 'hitler'), Ukraine ('ukrainian' and 'stalin'), and Armenia ('turkey', 'armenian', 'ankara', 'ottoman'). In contrast, the other word list now features names associated with Rwanda ('alison' & 'des' & 'forges'), Guatemala ('efrain' & 'montt' & 'rios', 'guatemala', 'guatemalan', 'mayan'), and Chile ('pinochet', 'chilean').

In sum, media coverage of genocide is not always associated with the term 'human rights' (indeed, only about 1 in 3 of the genocide articles we identified as human rights-related mention that term). If we wish to understand human rights coverage in the media, therefore, searching on that term alone is not satisfactory. (On the other hand, simply selecting all uses of the word stem 'genocid' is no less problematic, as it will include many uses that are not germane to human rights.)

Moreover, and of equal importance, the inclusion of the term 'human rights' is not random. Instead, use of the term is associated with the occurrence and short-term aftermath of a genocide, while omission of the term is more common in articles reporting on longer-term effects and reactions. This pattern means that any attempt to derive systematic conclusions about media coverage from just the subset of articles mentioning human rights will likely give rise to unwarranted conclusions.




