The advantages of lexicon-based sentiment analysis in an age of machine learning.
We demonstrate the strong performance of lexicon-based sentiment analysis using MultiLexScaled, an approach that averages valences across a number of widely used general-purpose lexica. We validate it against benchmark datasets from a range of domains, comparing its performance with machine learning and LLM alternatives. In addition, we illustrate the value of identifying fine-grained sentiment levels: in an analysis of pre- and post-9/11 British press coverage of Muslims, binarized valence metrics lead to different (and erroneous) conclusions about the nature of the post-9/11 shock and about differences between broadsheet and tabloid coverage.
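To make the core idea concrete, here is a minimal sketch of averaging valences across several lexica and of what is lost when the result is binarized. It is not the MultiLexScaled implementation: the two toy lexica, their scores, and the function names (`lexicon_valence`, `averaged_valence`, `binarize`) are invented for illustration, and any scaling or calibration step the actual method applies is left out.

```python
from statistics import mean

# Toy lexica invented for this example; real general-purpose lexica
# contain thousands of entries and use different native scales.
TOY_LEXICA = {
    "lexicon_a": {"good": 1.0, "bad": -1.0, "terrible": -2.0},
    "lexicon_b": {"good": 0.5, "terrible": -1.0, "peaceful": 0.5},
}


def lexicon_valence(tokens, lexicon):
    """Mean valence of the tokens found in one lexicon (0.0 if none match)."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return mean(scores) if scores else 0.0


def averaged_valence(text, lexica=TOY_LEXICA):
    """Average the per-lexicon valences of a text (naive whitespace tokens)."""
    tokens = text.lower().split()
    return mean(lexicon_valence(tokens, lex) for lex in lexica.values())


def binarize(valence):
    """Collapse a fine-grained valence to its sign: 1, -1, or 0."""
    return 1 if valence > 0 else (-1 if valence < 0 else 0)


if __name__ == "__main__":
    for text in ("bad", "terrible"):
        v = averaged_valence(text)
        print(f"{text!r}: valence={v:+.2f}, binarized={binarize(v):+d}")
```

In this toy setup, "bad" and "terrible" receive different averaged valences but the same binarized label, which is the kind of information loss the fine-grained analysis is meant to avoid.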
STAIR students over the years have been instrumental in helping to test and develop the Python notebooks we have put on GitHub so that others can easily use the method. Check out GitHub to see more.