American company Narrative Science has developed software that writes news articles based on data such as sports scores. Far from sounding clunky and artificial, the computer-generated results read quite naturally, testament to the ten years’ worth of research that has gone into the project.
Financial reports, real estate information, and election and polling data can also be used to generate readable stories in a matter of seconds. Initially the project focused on sports, where there is a clear tone, vocabulary and style. The stories the software generates are now so good that sports site Big Ten Network uses it to provide summaries as soon as a game concludes, such as for this Wisconsin vs. UNLV American football game:
“Wisconsin jumped out to an early lead and never looked back in a 51-17 win over UNLV on Thursday at Camp Randall Stadium.
“The Badgers scored 20 points in the first quarter on a Russell Wilson touchdown pass, a Montee Ball touchdown run and a James White touchdown run.
“Wisconsin’s offense dominated the Rebels’ defense. The Badgers racked up 499 total yards in the game including 258 yards passing and 251 yards on the ground.”
The software doesn’t simply slot data into pre-written sentences or follow a purely formulaic system, but uses historical data to provide context, much as a journalist would. Says the New York Times:
To generate story “angles,” explains Mr. Hammond of Narrative Science, the software learns concepts for articles like “individual effort,” “team effort,” “come from behind,” “back and forth,” “season high,” “player’s streak” and “rankings for team.” Then the software decides what element is most important for that game, and it becomes the lead of the article, he said. The data also determines vocabulary selection. A lopsided score may well be termed a “rout” rather than a “win.”
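As a rough illustration of the idea described in that excerpt, here is a minimal Python sketch of picking an "angle" and a data-driven word choice for a game summary. It is not Narrative Science's actual method: the `GameResult` fields, the `ROUT_MARGIN` threshold of 21 points, the lead-change cutoff and the function names are all assumptions made up for this example. Only the angle labels and the "rout" versus "win" distinction come from the article.

```python
from dataclasses import dataclass


@dataclass
class GameResult:
    """Hypothetical container for the data a game summary is built from."""
    winner: str
    loser: str
    winner_score: int
    loser_score: int
    lead_changes: int = 0
    winner_trailed_at_half: bool = False


# Assumed threshold: a margin this large is called a "rout" rather than a "win".
ROUT_MARGIN = 21


def choose_angle(game: GameResult) -> str:
    """Pick the single most important story angle using simple, assumed rules."""
    if game.winner_trailed_at_half:
        return "come from behind"
    if game.lead_changes >= 3:  # assumed cutoff for a see-saw game
        return "back and forth"
    return "team effort"


def result_phrase(game: GameResult) -> str:
    """Let the score margin drive vocabulary selection."""
    margin = game.winner_score - game.loser_score
    return "rout of" if margin >= ROUT_MARGIN else "win over"


def lead_sentence(game: GameResult) -> str:
    """Build a one-sentence lead from the score and the chosen vocabulary."""
    score = f"{game.winner_score}-{game.loser_score}"
    return f"{game.winner} posted a {score} {result_phrase(game)} {game.loser}."


if __name__ == "__main__":
    game = GameResult("Wisconsin", "UNLV", 51, 17)
    print(choose_angle(game))
    print(lead_sentence(game))
```

With the 51-17 score quoted earlier, the 34-point margin clears the assumed threshold, so the sketch chooses "rout" over "win". A real system would presumably weigh many more signals, including the historical data mentioned above.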
It has worked out well for the Big Ten Network, who have seen improved Google search rankings and a 40 percent increase in traffic last season compared with 2009.
Narrative Science is working with 20 companies at the moment. They are producing not only sports reports, but also “articles for medical journals, in which the software can compose an entire, unique article in about 60 seconds or so. What’s striking is that even language experts say you won’t know the difference between the software and an actual human, in terms of style, tone and usage.”
The company started as a joint research project between Northwestern University's schools of engineering and journalism. Its founders, Kris Hammond and Larry Birnbaum, are also co-directors of the Intelligent Information Laboratory (IIL) at Northwestern University, which holds a stake in the company. The IIL isn't only generating written news; it is also working on a project, News at Seven, to automate broadcast news:
News At Seven is a system that automatically generates a virtual news show. Totally autonomous, it collects, parses, edits and organizes news stories and then passes the formatted content to artificial anchors for presentation.
Should journalists be (even more) worried about their jobs? Narrative Science's CEO, Stuart Frankel, says that the company isn't replicating the kind of work that human journalists do, but supplementing it. At just $10 per article, computer-generated copy could provide enough breadth of coverage of sports or the financial markets that the resulting ad income could pay for the human journalists.
Topics such as sports, which attract passionate and loyal readers, have always subsidised the more expensive kinds of journalism such as general news and investigative reporting. If a news site can support its sports team by providing coverage of games that reporters just can’t get to, it could potentially increase the very ad revenue that it depends upon.
And whilst Narrative Science is aiming high, hoping to provide more complex journalism, it can only ever be as good as the data it is fed. Journalists are still needed for reporting, investigations, and gathering the information that underpins all journalism. A journalist still has to make a choice on which stories are covered and to what extent. And journalists are still needed to evaluate and make sense of multiple and contradictory sources.
Software may be useful for certain types of story which follow a clear and predictable format, but it certainly can’t look at a piece of breaking news and say, “Something just doesn’t feel right about this. I think I’ll investigate further…” For that, only a human will do.