Researchers have demonstrated an automated, algorithm-based system that identifies fake news stories about as accurately as humans do, and sometimes more accurately.
The system, which picks up telltale linguistic cues in fake news stories, could give news aggregators and social media sites such as Google News a new weapon in the fight against misinformation.
An automated solution could be an important tool for sites that are struggling to deal with an onslaught of fake news stories, often created to generate clicks or to manipulate public opinion, Rada Mihalcea, the University of Michigan professor behind the project, said in a statement.
The new system successfully found fakes up to 76 percent of the time, compared to a human success rate of 70 percent, according to the study to be presented on August 24 at the International Conference on Computational Linguistics in Santa Fe, New Mexico.
The researchers believe that their linguistic analysis approach could also be used to identify fake news articles that are too new to be debunked by cross-referencing their facts with other stories.
The approach examines quantifiable attributes such as grammatical structure, word choice, punctuation, and complexity.
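For readers curious what "quantifiable attributes" can look like in practice, here is a minimal Python sketch that turns a story into a handful of numbers. The function name `extract_features` and the specific features (type-token ratio, average word and sentence length, punctuation density) are illustrative assumptions, not the Michigan team's actual feature set. Grammatical-structure features would need a syntactic parser and are left out here.

```python
import re
import string


def extract_features(text: str) -> dict[str, float]:
    """Compute a few surface-level linguistic features from a news text.

    Hypothetical feature set for illustration only; it is not the
    researchers' published feature list.
    """
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)   # avoid division by zero on empty input
    n_sents = max(len(sentences), 1)
    punct = sum(1 for ch in text if ch in string.punctuation)
    return {
        # Word choice: vocabulary richness as distinct words / total words
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        # Complexity: average word length and average sentence length
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "avg_sentence_len": len(words) / n_sents,
        # Punctuation: overall density, plus exclamation marks specifically
        "punct_per_word": punct / n_words,
        "exclaims_per_word": text.count("!") / n_words,
    }
```

Calling `extract_features("Breaking news! Shocking claim. Read now!")` yields six words across three sentences, so `avg_sentence_len` is 2.0 and `punct_per_word` is 0.5. A vector like this, computed for every story, is the kind of input a classifier can learn from.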
For the study, Mihalcea's team created its own data, recruiting crowdsourced online workers who reverse-engineered verified genuine news stories into fakes.
This mirrors how most real-world fake news is created, Mihalcea said: by individuals who write stories quickly in return for payment.
Study participants were paid to turn short, actual news stories into similar but fake news items, mimicking the journalistic style of the articles.
At the end of the process, the research team had a dataset of 500 real and fake news stories.
They then fed these labelled pairs of stories to an algorithm that performed a linguistic analysis, teaching itself to distinguish between real and fake news.
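The training step can be illustrated with a deliberately simple classifier. The Python sketch below fits a logistic regression by gradient descent over numeric feature vectors labelled 1 for fake and 0 for real. The model choice, the function names (`train_logistic_regression`, `predict`, `accuracy`) and the hyperparameters are assumptions for illustration; the article does not say which algorithm the researchers used.

```python
import math


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights by batch gradient descent on the log-loss.

    X: list of feature vectors (lists of floats).
    y: labels, 1 = fake, 0 = real (hypothetical convention).
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = _sigmoid(z) - yi  # derivative of log-loss w.r.t. z
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradient over the batch and take one step
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Return 1 (fake) if the linear score is positive, else 0 (real)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if z > 0 else 0


def accuracy(w, b, X, y):
    """Fraction of stories whose predicted label matches the true label."""
    correct = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y))
    return correct / len(y)
```

In a real pipeline, each vector would come from a feature extractor run over one story, and `accuracy` measured on a separate, held-out set of stories would give a success rate comparable in kind to the 76 and 70 percent figures in the study.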
Finally, the team applied the algorithm to a dataset of real and fake news pulled directly from the web, where it achieved the 76 percent success rate.
The details of the new system and the dataset that the team used to build it could be used by news sites or other entities to build their own fake news detection systems, Mihalcea said.