
"Wire stories published by multiple outlets were treated as individual articles instead of collapsed, prioritizing news dissemination and reach over unique reporting. Generic news round-ups and recaps (e.g., "Weekend Report", "Top Stories", "News Roundup") were filtered from the event data. We then used the RoBERTa-base model to assign embeddings to each article headline, and employed these embeddings to cluster the output using HDBSCAN."
"A streamgraph shows article counts by topic, between 2020 and the present and clicking through shows a set of packed circles and tables that link to each article. On the classification of articles: Wire stories published by multiple outlets were treated as individual articles instead of collapsed, prioritizing news dissemination and reach over unique reporting. Generic news round-ups and recaps (e.g., "Weekend Report", "Top Stories", "News Roundup") were filtered from the event data."
The Trans News Initiative monitors news coverage of transgender communities from 2020 to the present. A streamgraph displays article counts by topic over time. Interactive elements reveal packed-circle visualizations and tables that link directly to each article. Classification rules treated wire stories syndicated across outlets as separate articles to reflect dissemination and reach, while generic news round-ups and recaps were filtered from event data. Headlines were encoded with the RoBERTa-base model to generate embeddings. HDBSCAN clustered those embeddings. An LLM labeled each cluster by producing an umbrella phrase derived from cluster headlines. The resulting system surfaces recurring themes and topical trends.
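The filtering, embedding, and clustering steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. The function names, the round-up regex (built only from the three examples named in the text), the mean-pooling of token states, and the `min_cluster_size` value are all assumptions. The source names only RoBERTa-base and HDBSCAN; the use of Hugging Face `transformers` and scikit-learn's `HDBSCAN` is likewise assumed.

```python
import re

# Hypothetical pattern covering only the round-up examples named in the text.
ROUNDUP_PATTERN = re.compile(
    r"\b(weekend report|top stories|news roundup)\b", re.IGNORECASE
)


def filter_roundups(headlines):
    """Drop generic round-ups and recaps.

    Duplicate headlines are kept on purpose: wire stories syndicated
    across outlets count as separate articles.
    """
    return [h for h in headlines if not ROUNDUP_PATTERN.search(h)]


def embed_headlines(headlines, model_name="roberta-base"):
    """Return one vector per headline from RoBERTa-base.

    Assumption: vectors are the attention-masked mean of the final
    hidden states; the source does not say how headlines were pooled.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    batch = tokenizer(
        list(headlines), padding=True, truncation=True, return_tensors="pt"
    )
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (n, tokens, hidden)

    # Average only over real tokens, ignoring padding positions.
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return pooled.numpy()


def cluster_embeddings(embeddings, min_cluster_size=5):
    """Cluster vectors with HDBSCAN; label -1 marks unclustered noise."""
    from sklearn.cluster import HDBSCAN  # requires scikit-learn >= 1.3

    return HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(embeddings)
```

In use, the headlines would pass through `filter_roundups`, then `embed_headlines`, then `cluster_embeddings`, and the headlines in each resulting cluster would be handed to an LLM to produce the umbrella label.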
Read at FlowingData