SGI Analyses Full Text Content Of English Edition Of Wikipedia

FP Archives • February 2, 2017, 23:39:43 IST

The results include visualisations of modern history captured in under a day utilising in-memory data-mining techniques.


SGI, the trusted leader in technical computing, has partnered with Kalev H. Leetaru of the University of Illinois to create the first-ever historical mapping and exploration, in time and space, of the full text of the English-language edition of Wikipedia. The results include visualisations of modern history captured in under a day using in-memory data-mining techniques. By loading the entire English-language edition of Wikipedia into an SGI UV 2000, Leetaru was able to show how Wikipedia’s view of the world unfolded over the past two centuries. Each reference in the text has been tied to a location, a year and a positive or negative sentiment.

While several previous projects have mapped Wikipedia entries using location metadata manually assigned by editors, those attempts accounted for only a tiny fraction of Wikipedia’s location information. This project unlocked the contents of the articles themselves, identifying every location and date in all four million pages, along with the connections among them, to create a massive network.

“This analysis allows the world to take a step back from the individual articles and text to gain a forest view of the tremendous knowledge captured in Wikipedia, not just a page by page tree view. We can watch how one of the largest collections of human knowledge has evolved and see what we could never see before, such as global sentiment at a certain time and place, or where there might be blind spots in the knowledge coverage,” said Franz Aman, chief marketing officer and head of strategy, SGI. “We love to use Google Earth because we can zoom out and get the big picture view. With SGI UV 2, we can apply the same concept to Big Data to get the big picture on our Big Data.”

From this analysis, Wikipedia is seen to have four periods of growth in its historical coverage: 1001-1500 (Middle Ages), 1501-1729 (Early Modern Period), 1730-2003 (Age of Enlightenment) and 2004-2011 (Wikipedia Era). Its continued growth appears to be focused on enhancing its coverage of historical events rather than on documenting more of the present. The average tone of Wikipedia’s coverage of each year closely matches major global events, with the most negative period in the last 1,000 years being the American Civil War, followed by World War II. The analysis also shows that the “copyright gap” that blanks out most of the twentieth century in digitised print collections is not a problem with Wikipedia, where there is steady exponential growth in coverage from 1924 to today.
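
The article does not describe how tone was scored, so the following is only a minimal sketch of the general idea of averaging tone per year. The word lists, the `tone` function and the `average_tone_by_year` helper are all hypothetical and are not taken from the project.

```python
from collections import defaultdict

# Hypothetical toy lexicon; the project's actual tone dictionary is not
# described in the article.
POSITIVE = {"peace", "victory", "prosperity"}
NEGATIVE = {"war", "famine", "massacre"}


def tone(sentence):
    """Return (positive hits - negative hits) / word count for one sentence."""
    words = [w.strip(".,;:!?\"'").lower() for w in sentence.split()]
    if not words:
        return 0.0
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    return score / len(words)


def average_tone_by_year(mentions):
    """Average the tone of all sentences that mention each year.

    `mentions` is an iterable of (year, sentence) pairs.
    """
    scores = defaultdict(list)
    for year, sentence in mentions:
        scores[year].append(tone(sentence))
    return {year: sum(vals) / len(vals) for year, vals in scores.items()}
```

Plotting the resulting dictionary against the year would give a rough tone timeline of the kind the analysis describes, under these simplified assumptions.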

“The one-way nature of connections in Wikipedia, the lack of links, and the uneven distribution of Infoboxes, all point to the limitations of metadata-based data mining of collections like Wikipedia,” said Leetaru. “With SGI UV 2, the large shared memory available allowed me to ask questions of the entire dataset in near-real time. With a huge amount of cache-coherent shared memory at my fingertips, I could simply write a few lines of code and run it across the entire dataset, asking whatever questions came to mind. This isn’t possible with a scale-out computing approach. It’s very similar to using a word processor instead of using a typewriter – I can conduct my research in a completely different way, focusing on the outcomes, not the algorithms.”

Loaded into the SGI UV 2000, the Big Brain computer, this massive dataset underwent full-text geo-coding and complete date-coding, using algorithms that identified every mention of every location and every date across the text of every entry on Wikipedia. More than 80 million locations and 42 million dates between 1000 AD and 2012 were extracted, averaging 19 locations and 11 dates per article (one every 44 words and one every 75 words, respectively). The connections between every date and every location were captured in a massive network representing Wikipedia’s view of history. With this instrumentation, Leetaru was able to perform near-real-time analysis over the entire dataset on the SGI UV 2, creating visual maps across space and time that show not only how history unfolded but also the overall tone of the world throughout the last thousand years, and to interactively test a wide array of theories and research questions, all in less than a day’s work.
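
As a rough illustration of the date-coding and network-building steps described above, the sketch below extracts years between 1000 and 2012 with a regular expression, looks up place names in a tiny hard-coded gazetteer, and counts how often each place and year appear in the same article. The gazetteer, the function names and the co-occurrence rule are assumptions made for illustration; the article does not specify the algorithms Leetaru used.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical toy gazetteer; real full-text geocoding is far more involved.
GAZETTEER = {"Paris", "London", "Gettysburg"}

# Four-digit years from 1000 to 2012, the range reported in the article.
YEAR_RE = re.compile(r"\b(1\d{3}|20(?:0\d|1[0-2]))\b")


def extract_years(text):
    """Return every year in 1000-2012 mentioned in the text, in order."""
    return [int(y) for y in YEAR_RE.findall(text)]


def extract_locations(text):
    """Return capitalised words that appear in the toy gazetteer."""
    return [w for w in re.findall(r"\b[A-Z][a-z]+\b", text) if w in GAZETTEER]


def build_network(articles):
    """Count (place, year) pairs that co-occur within the same article."""
    edges = Counter()
    for text in articles:
        years = set(extract_years(text))
        places = set(extract_locations(text))
        for place, year in product(places, years):
            edges[(place, year)] += 1
    return edges
```

Because everything is held in ordinary in-memory dictionaries, follow-up questions such as "which years are most connected to a given place" become a few lines of filtering over `edges`, which is the style of interactive querying the quote from Leetaru alludes to.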

The SGI UV 2 product family enables users to find answers to the world’s most difficult problems on a system as easy to administer as a workstation. Built on the Intel Xeon processor E5 family, running standard Linux and supporting a wide range of storage options, SGI UV 2 offers a complete, industry-standard solution for no-limit computing.

Starting with as few as 16 cores and 32 gigabytes of memory, SGI UV 2 can start small and expand seamlessly. This next-generation platform doubles the number of cores (up to 4096 cores) and quadruples the amount of coherent main memory (up to 64 terabytes) compared with the previous generation, all available for in-memory computing in a single-image system. SGI UV 2 can scale to eight petabytes of shared memory, and at a peak I/O rate of four terabytes per second (14 PB/hour) it could ingest the entire contents of the U.S. Library of Congress print collection in less than three seconds.
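
The throughput figures above can be checked with simple arithmetic. The sketch below assumes decimal units (1 PB = 1,000 TB); the function names are illustrative only.

```python
TB_PER_PB = 1000          # decimal units assumed
SECONDS_PER_HOUR = 3600


def pb_per_hour(tb_per_second):
    """Convert a rate in terabytes per second to petabytes per hour."""
    return tb_per_second * SECONDS_PER_HOUR / TB_PER_PB


def tb_ingested(tb_per_second, seconds):
    """Terabytes moved at a constant rate over the given number of seconds."""
    return tb_per_second * seconds


print(pb_per_hour(4))     # 14.4, which the article rounds to 14 PB/hour
print(tb_ingested(4, 3))  # 12, the data volume moved in three seconds
```

In other words, the "less than three seconds" claim implies a print collection of under 12 terabytes at the stated peak rate.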
