The Data Science Behind r/antiwork's Upvotes | HackerNoon
Briefly

This section outlines the methodology used to analyze posts from the r/antiwork subreddit: data collection, user categorization, and thematic periodization. Posts from January 2019 to July 2022 were retrieved through the PushShift API. Filters then excluded potentially biased or irrelevant comments, leaving over 11 million usable comments and nearly 285,000 posts. Users were divided into 'light' and 'heavy' categories based on their engagement levels, and most were light users. This split informs the analysis of who participates in discussions of anti-work sentiment.
We shaped the dataset by filtering out potentially biased comments, with the aim of keeping the final set representative and valid for our study.
We categorized users as 'light' or 'heavy' based on their engagement levels: light users made up the majority of posters, while heavy users accounted for a significant share of overall engagement.
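The light/heavy split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the summary does not give the cutoff the authors used, so `HEAVY_THRESHOLD`, the function names, and the toy author list are all hypothetical, and engagement is approximated here as a simple count of contributions per author.

```python
from collections import Counter

# Hypothetical cutoff: the source does not state the actual threshold.
HEAVY_THRESHOLD = 10


def categorize_users(authors, heavy_threshold=HEAVY_THRESHOLD):
    """Label each author 'light' or 'heavy' by number of contributions."""
    counts = Counter(authors)
    return {
        author: "heavy" if n >= heavy_threshold else "light"
        for author, n in counts.items()
    }


def engagement_share(authors, labels):
    """Fraction of all contributions that came from each category."""
    if not authors:
        return {"light": 0.0, "heavy": 0.0}
    totals = Counter(labels[a] for a in authors)
    return {cat: totals[cat] / len(authors) for cat in ("light", "heavy")}


# Toy data (not from the study): one entry per post or comment.
authors = ["a"] * 12 + ["b", "c", "c", "d"]
labels = categorize_users(authors)
shares = engagement_share(authors, labels)
```

In this toy input, three of the four authors are light users, yet the single heavy user produces most of the contributions, which mirrors the pattern the summary describes.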
Read at Hackernoon