Semantic Analysis of One Million #GamerGate Tweets
- Phillip R. Polefrone
This paper develops a methodology for describing the content of a controversy on a microblogging platform (Twitter) by measuring correlations among broad semantic categories. More than one million tweets were gathered in daily collections from November 2015 to June 2016 using Tweepy and the Twitter API; over 280,000 of these were not retweets and thus contained unique data. These tweets were grouped into bins of one thousand and analyzed with a Python implementation of Roget’s hierarchy of semantic categories under a “bag of categories” model, or a categorized bag of words. The linear correlation of each category with the “WOMAN” category was measured and compared with a control group. The categories that correlate with “WOMAN” in the test corpus include some noise, but as a whole they present a meaningful description of the conversation that is consistent with its known qualities. This result suggests that a more developed version of this methodology could detect conversational trends on social media platforms more easily and with less human labor than comparable methods.
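The binning and correlation steps described above can be illustrated with a minimal sketch. It is not the paper's implementation: the tiny `TOY_CATEGORIES` lexicon is a hypothetical stand-in for Roget's hierarchy, the function names are invented for illustration, and Pearson's r is assumed as the measure of linear correlation.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon mapping words to categories (an assumption;
# the paper uses a Python implementation of Roget's hierarchy instead).
TOY_CATEGORIES = {
    "she": "WOMAN", "her": "WOMAN", "woman": "WOMAN",
    "attack": "ATTACK", "harass": "ATTACK",
    "game": "AMUSEMENT", "play": "AMUSEMENT",
}


def bag_of_categories(tweets):
    """Count how often each category occurs across a group of tweets."""
    counts = Counter()
    for tweet in tweets:
        for token in tweet.lower().split():
            category = TOY_CATEGORIES.get(token)
            if category is not None:
                counts[category] += 1
    return counts


def make_bins(tweets, bin_size):
    """Split the tweet list into consecutive bins of at most bin_size."""
    return [tweets[i:i + bin_size] for i in range(0, len(tweets), bin_size)]


def pearson(xs, ys):
    """Pearson's r for two equal-length series; 0.0 if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlations_with(target, tweets, bin_size):
    """Correlate every other category's per-bin count with the target's."""
    bags = [bag_of_categories(b) for b in make_bins(tweets, bin_size)]
    categories = set().union(*bags) - {target}
    target_series = [bag[target] for bag in bags]
    return {
        category: pearson([bag[category] for bag in bags], target_series)
        for category in sorted(categories)
    }


# Six invented tweets in bins of two; the paper uses bins of one thousand.
sample = ["she attack", "her harass", "game play",
          "play game", "she game", "attack"]
result = correlations_with("WOMAN", sample, bin_size=2)
for category, r in result.items():
    print(f"{category}: {r:+.2f}")
```

Running the same function over a control corpus and comparing the two sets of coefficients would correspond to the comparison step; how the paper performs that comparison is not specified here, so it is left out of the sketch.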
The full working paper can be found on my website.