ProPublica, a non-profit investigative journalism newsroom, is partnering with Google News Lab to build a database of media reports about hate crimes and hateful incidents. The reports will be collected from Google News, and the database will be compiled using machine learning.
The database, called Documenting Hate, comes amid a rise in hate crimes, the most recent being the neo-Nazi attack in Charlottesville, Virginia, in the US. Each entry lists the article's title, the date on which it was published, the publisher, the location, keywords and a brief summary of the incident. The data will be updated weekly. Pitch Interactive will help visualise the data, reports TechCrunch.
Alongside the titles of hate crime reports, the weekly dataset includes a list of the most prominent keywords for that week. The bolder and bigger a keyword appears, the more prevalent it is in the dataset. From 7 August to 13 August, Donald Trump was the most used keyword, and Donald J Trump also appeared among the top five keywords.
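As a rough illustration of how such a weekly keyword ranking could be derived from entries with the fields described above, here is a minimal Python sketch. The `Report` class, its field names, the `top_keywords` helper and the sample rows are all hypothetical and made up for this example; they are not the published Documenting Hate schema or data, and the project itself may compute prevalence differently.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple


@dataclass
class Report:
    """One hypothetical entry; field names are assumed, not the real schema."""
    title: str
    published: date
    publisher: str
    location: str
    keywords: List[str] = field(default_factory=list)
    summary: str = ""


def top_keywords(reports: List[Report], start: date, end: date,
                 n: int = 5) -> List[Tuple[str, int]]:
    """Count keyword occurrences across reports published in [start, end]."""
    counts: Counter = Counter()
    for report in reports:
        if start <= report.published <= end:
            counts.update(report.keywords)
    # most_common returns (keyword, count) pairs, highest count first
    return counts.most_common(n)


# Made-up sample rows, for illustration only.
sample = [
    Report("Example A", date(2017, 8, 8), "Publisher X", "City 1",
           ["vandalism", "graffiti"]),
    Report("Example B", date(2017, 8, 10), "Publisher Y", "City 2",
           ["vandalism", "assault"]),
    Report("Example C", date(2017, 8, 12), "Publisher X", "City 3",
           ["vandalism", "graffiti"]),
    Report("Example D", date(2017, 8, 20), "Publisher Z", "City 1",
           ["assault"]),  # falls outside the 7-13 August window
]

weekly = top_keywords(sample, date(2017, 8, 7), date(2017, 8, 13), n=2)
print(weekly)
```

A visualisation could then scale each keyword's font size and weight by its count, so that more prevalent keywords appear bigger and bolder.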
According to ProPublica, it also intends to build out the data set with the experiences of people who have been victims of hate crimes, since such crimes may otherwise go unreported. In the US, the FBI is required to report hate crimes, but local jurisdictions are under no such obligation, which increases the chances of a crime going unreported. ProPublica also intends to collect data from civil rights groups.
As reported by TechCrunch, Google News Lab Data Editor Simon Rogers said, “It is one of the first visualizations to use machine learning to generate its content using the Google Natural Language API, which analyses text and extracts information about people, places, and events.”
Google will open the data to contributors via GitHub, where all of it will be collected.