Behind the Guardian's analysis of 100 years of MPs' language on immigration
"The Guardian's Data Science and Data Projects teams, in collaboration with University College London, developed an in-house machine learning model to measure linguistic sentiment in debates in the Commons over the course of a century. Unlike off-the-shelf sentiment models, the Guardian's version distinguishes sentiment directed specifically at immigration from general emotionally charged language about any topic."
"The researchers first used a list of trigger terms manually designed and verified by experts on immigration history to identify speeches most likely to be about immigration. This process narrowed the data down to a manageable sample. To ensure the results were not biased by the choice of keywords, the team stress-tested their findings, running the analysis many times with different combinations of words."
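The keyword filter and its stress test can be sketched in outline. The trigger terms below are hypothetical stand-ins: the Guardian's expert-verified list is not reproduced in the article, and the subset size used for the robustness runs is likewise an assumption.

```python
import re
from itertools import combinations

# Hypothetical trigger terms; the expert-verified list is not published in the article.
TRIGGER_TERMS = ["immigration", "immigrant", "migrant", "asylum"]

def mentions_topic(speech, terms):
    """True if the speech contains any term as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")s?\b", re.I)
    return bool(pattern.search(speech))

def filter_speeches(speeches, terms=TRIGGER_TERMS):
    """Narrow the corpus to speeches likely to be about the topic."""
    return [s for s in speeches if mentions_topic(s, terms)]

def stress_test(speeches, terms=TRIGGER_TERMS, k=3):
    """Re-run the filter with every k-term subset of the keywords,
    to check whether the results are sensitive to the word choice."""
    return {subset: len(filter_speeches(speeches, list(subset)))
            for subset in combinations(terms, k)}
```

If the counts returned by `stress_test` stay broadly stable across subsets, that suggests the findings are not driven by any single keyword, which is the kind of robustness check the article describes.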
"A team of 12 people manually labelled more than 1,250 fragments of parliamentary speeches and contributions over a century that were up to five sentences each. Where the fragment was about immigration it was identified as such, and then classified as either positive, negative or neutral."
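The article does not say how disagreements among the 12 labellers were resolved. One common scheme is a majority vote over each fragment's annotations, sketched here with hypothetical labels and an assumed tie-breaking rule.

```python
from collections import Counter

# Labels follow the article's scheme: fragments about immigration are classed
# as positive, negative or neutral. The tie-break to "neutral" is an assumption,
# not something the article specifies.
def majority_label(annotations):
    """Return the most common label among annotators; ties fall back to 'neutral'."""
    counts = Counter(annotations)
    top_label, top_count = counts.most_common(1)[0]
    if sum(1 for c in counts.values() if c == top_count) > 1:
        return "neutral"
    return top_label
```

For example, `majority_label(["negative", "negative", "neutral"])` resolves to `"negative"`, while an even split falls back to `"neutral"`.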
The Guardian's Data Science and Data Projects teams, collaborating with University College London, developed a specialised machine learning model to measure sentiment about immigration in House of Commons debates spanning a century. Unlike generic sentiment analysis tools, the model distinguishes sentiment directed specifically at immigration from general emotional language about any topic. The methodology involved identifying immigration-related speeches with expert-verified trigger terms, then manually labelling more than 1,250 parliamentary fragments as positive, negative or neutral on immigration. The team stress-tested the findings across different keyword combinations to guard against selection bias, evaluated large language models and used them to expand the training dataset to more than 22,600 annotated fragments, and confirmed the model's accuracy with statistical testing.
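The summary mentions statistical testing of accuracy without naming the method. A standard choice would be a confidence interval on the model's agreement with held-out human labels, for instance a Wilson score interval; the evaluation figures below are invented purely for illustration.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """95% Wilson score confidence interval for a classifier's accuracy."""
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    margin = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - margin, centre + margin

# Hypothetical held-out evaluation: suppose the model agreed with human
# labels on 1,050 of 1,250 fragments (84% observed accuracy).
lo, hi = wilson_interval(1050, 1250)
```

A narrow interval well above chance on a held-out sample is the kind of evidence that would support the "robust accuracy" claim; the Wilson interval is preferred over the naive normal approximation near the ends of the [0, 1] range.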
Read at www.theguardian.com