This paper is available on arXiv under a CC 4.0 license.
Authors:
(1) Muhammed Yusuf Kocyigit, Boston University;
(2) Anietie Andy, University of Pennsylvania;
(3) Derry Wijaya, Boston University.
For toxicity measurements, we use a filtering method to remove words that might lose their toxic meaning over time. We use the conservative subset of HurtLex (Bassignana, Basile, and Patti 2018) and include all categories of toxic words in our analysis. Table 2 lists the words that are removed from the toxic word list. One interesting observation is that words relating to prostitution are generally removed in the early decades. A possible explanation is that, while these words may still have been taboo, they related to other words differently enough in that period to be filtered out. Another interesting observation is that the word fascist was filtered out during the 1930 and 1940 decades. These decades span the run-up to World War II, when nationalist sentiment was becoming mainstream in much of the world, which could also explain this change in meaning.
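This section does not spell out the filtering criterion, so the following is only a minimal sketch of one way such a filter could work, not the authors' implementation. It assumes a word is dropped for a decade when its nearest neighbors in that decade's embedding space overlap too little with its neighbors in a reference (modern) space. The function names, the neighbor count `k`, the threshold `min_overlap`, and the toy two-dimensional vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_neighbors(word, emb, k):
    """Return the k words closest to `word` in the embedding dict `emb`."""
    scores = [(cosine(emb[word], vec), other)
              for other, vec in emb.items() if other != word]
    scores.sort(reverse=True)
    return {other for _, other in scores[:k]}

def filter_toxic_words(toxic_words, decade_emb, reference_emb, k=2, min_overlap=0.5):
    """Split a lexicon into (kept, removed) for one decade.

    Assumption: a word is removed when the Jaccard overlap between its
    neighbors in the decade space and in the reference space is below
    `min_overlap`, or when it is missing from either space.
    """
    kept, removed = [], []
    for word in toxic_words:
        if word not in decade_emb or word not in reference_emb:
            removed.append(word)
            continue
        a = top_neighbors(word, decade_emb, k)
        b = top_neighbors(word, reference_emb, k)
        union = a | b
        overlap = len(a & b) / len(union) if union else 0.0
        (kept if overlap >= min_overlap else removed).append(word)
    return kept, removed

# Toy vectors (made up for illustration): axis 0 ~ "abusive", axis 1 ~ "political".
reference_emb = {
    "insult": [1.0, 0.1], "slur": [0.9, 0.2], "abuse": [0.95, 0.0],
    "party": [0.1, 1.0], "politics": [0.0, 0.9], "fascist": [0.8, 0.3],
}
# In the hypothetical earlier decade, "fascist" sits near the political words.
decade_emb = dict(reference_emb, fascist=[0.2, 0.9])

kept, removed = filter_toxic_words(["insult", "fascist"], decade_emb, reference_emb)
print("kept:", kept)
print("removed:", removed)
```

In this toy setup, the word whose neighborhood shifts from abusive to political terms is the one that gets filtered out, which mirrors the kind of removal described for Table 2. A real pipeline would use per-decade historical embeddings and a tuned threshold.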