Computational text analysis

Systematically measuring dehumanisation of Palestinians in the media

Israel’s war in Gaza has drawn heightened attention to dehumanization and, relatedly, dehumanizing rhetoric.

The STAIR lab project for Spring 2024 was to develop and apply a theoretically guided lexicon of dehumanizing words that permits a systematic analysis of dehumanizing rhetoric. By comparing the prevalence of dehumanizing words in the coverage of a particular group to the prevalence of that same language in a baseline category, we can identify the excess presence of dehumanizing language.
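The comparison described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the two-word lexicon, the tokenizer, and the function names (lexicon_rate, excess_rate) are hypothetical placeholders, not the project's actual lexicon or code, and "excess" is assumed here to mean a simple difference in per-token rates.

```python
import re

# Hypothetical toy lexicon for illustration only; not the project's lexicon.
TOY_LEXICON = {"vermin", "beasts"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_rate(documents, lexicon):
    """Share of all tokens in `documents` that appear in `lexicon`."""
    total = 0
    hits = 0
    for doc in documents:
        tokens = tokenize(doc)
        total += len(tokens)
        hits += sum(1 for tok in tokens if tok in lexicon)
    return hits / total if total else 0.0


def excess_rate(target_docs, baseline_docs, lexicon):
    """Assumed definition: target-group rate minus baseline rate."""
    return lexicon_rate(target_docs, lexicon) - lexicon_rate(baseline_docs, lexicon)


# Made-up example sentences, with no real group named.
target = ["They are vermin and beasts"]
baseline = ["People walked to the market"]
print(excess_rate(target, baseline, TOY_LEXICON))
```

A positive value means lexicon words are more frequent in coverage of the target group than in the baseline. A ratio of the two rates would be an equally plausible choice; the text above does not say which one the project uses.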

In this first paper applying the method, we analyze a corpus of British and Irish media coverage of the Israel-Palestine conflict since 2010. We find more dehumanization associated with Palestinians than with Israelis or people in general.

Word-level machine translation for bag-of-words text analysis

The quality of automated machine translation is rapidly approaching that of professional human translation. However, the best methods remain costly in terms of money, computational resources, and/or time, particularly when applied to large volumes of text. In contrast, word-level translation is both free and fast, simply mapping each word in a source language deterministically to a word in a target language.
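As a rough illustration of what a deterministic word-level mapping looks like in practice, the sketch below translates tokens with a plain dictionary lookup and then builds a bag-of-words count. The four-entry Spanish-to-English dictionary and the function names are hypothetical, and leaving unknown words untranslated is an assumption made here, not a documented choice of the method.

```python
import re
from collections import Counter

# Hypothetical toy dictionary for illustration; a real one would be far larger.
ES_EN = {"el": "the", "gato": "cat", "negro": "black", "duerme": "sleeps"}


def translate_tokens(text, dictionary):
    """Map each source-language token to its single target-language entry.

    Tokens missing from the dictionary are passed through unchanged
    (an assumption of this sketch).
    """
    tokens = re.findall(r"\w+", text.lower())
    return [dictionary.get(tok, tok) for tok in tokens]


def bag_of_words(text, dictionary):
    """Count translated tokens, ignoring word order."""
    return Counter(translate_tokens(text, dictionary))


print(translate_tokens("El gato negro duerme", ES_EN))
print(bag_of_words("El gato negro duerme", ES_EN))
```

The translated tokens keep the source-language word order ("cat black" rather than "black cat"), which would be a flaw in a fluent translation but does not affect bag-of-words methods, since they discard word order anyway.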

Initial work on generating good word-level machine translation was done with STAIR lab students during the 2019-2020 academic year. Over the course of the next few years, we gradually improved and expanded the method.

This paper demonstrates that high-quality word-level translation dictionaries can be generated cheaply and easily, and that they produce translations that can be used reliably as inputs into some of the most common automated text analysis methods.