
ACH 2021 Talk: Detecting Latent Textual Bias with Topic Modeling and Sentiment Analysis

Jul 21
Bias detection is an emerging area of research for digital humanists, computational linguists, and information studies scholars, who point to biases inherent in our algorithms, software, tools, and platforms; yet we are only just beginning to examine how computational methods could be used to interrogate our primary textual sources. This project presents a method for bias detection that can be used at a study’s outset with little initial knowledge of the corpus, requires little pre-processing, and is both beginner-friendly and language-agnostic. I employed topic modeling at different scales, using the latent Dirichlet allocation (LDA) algorithm, to build a hierarchical topic model of 138 documents from three French-language chronicles of Ottoman Algeria. After developing models of 7, 9, and 11 topics based on their coherence scores, I generated additional smaller and larger models, eventually selecting the 4-, 7-, 11-, and 20-topic models as representative and arranging them into a hierarchy of topics that succinctly summarized the corpus’s general themes and granular subjects. Pairing topic modeling with sentiment analysis and targeted close reading of the documents most closely related to topics of interest (based on document-topic weights) uncovered the stories of lesser-known actors, as well as the biases inherent in the writing of their histories. The anti-Arab and/or anti-Turkish sentiments one might expect to observe were absent, but a latent anti-Semitic sentiment appeared in the 11- and 20-topic models, indicated by the sentiment analysis scores of topics related to Jewish people and verified through a close reading of related passages.
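The abstract does not name a specific toolchain, but the workflow it describes (fitting LDA models at several topic counts, comparing coherence scores, and ranking documents by topic weight to select candidates for sentiment scoring and close reading) can be sketched roughly as follows. The use of gensim, the chronicles/ corpus layout, and the placeholder sentiment step are assumptions for illustration, not the author's actual implementation.

```python
# A minimal sketch of the workflow described above, assuming gensim for LDA and
# coherence scoring. The talk does not specify its tooling, so the library
# choice, the chronicles/ file layout, and the sentiment step are illustrative.
from pathlib import Path

from gensim import corpora
from gensim.models import LdaModel, CoherenceModel
from gensim.utils import simple_preprocess

# Load and lightly tokenize the corpus (hypothetical file layout).
document_paths = sorted(Path("chronicles").glob("*.txt"))
docs = [simple_preprocess(p.read_text(encoding="utf-8")) for p in document_paths]

dictionary = corpora.Dictionary(docs)
bow_corpus = [dictionary.doc2bow(doc) for doc in docs]

# Fit models at several scales and compare coherence scores to choose
# representative sizes for the topic hierarchy.
models = {}
for k in [4, 7, 9, 11, 20]:
    lda = LdaModel(bow_corpus, num_topics=k, id2word=dictionary,
                   passes=10, random_state=42)
    coherence = CoherenceModel(model=lda, texts=docs, dictionary=dictionary,
                               coherence="c_v").get_coherence()
    models[k] = lda
    print(f"{k}-topic model, coherence = {coherence:.3f}")

# For a topic of interest in one model, rank documents by their weight on that
# topic; the top-weighted documents become candidates for sentiment scoring
# and targeted close reading.
lda_11 = models[11]
topic_id = 0  # placeholder: e.g. a topic whose top words concern Jewish communities
weights = []
for doc_idx, bow in enumerate(bow_corpus):
    doc_topics = dict(lda_11.get_document_topics(bow, minimum_probability=0.0))
    weights.append((doc_idx, doc_topics.get(topic_id, 0.0)))
weights.sort(key=lambda pair: pair[1], reverse=True)
top_documents = [document_paths[idx] for idx, _ in weights[:10]]
# A sentiment model suited to the French-language passages would then score
# top_documents before they are read closely.
```

Fitting a 9-topic model alongside the four sizes that made the final hierarchy mirrors the abstract's account of trying intermediate scales before settling on 4, 7, 11, and 20 topics.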
