Firstpost

AI companies have consumed the entire internet to train their models and are now running out of data

FP Staff • April 3, 2024, 15:51:23 IST

In a bid to make each large language model (LLM) more powerful than the last, AI companies have used up almost all of the open internet and are running out of data. They may be forced to train their upcoming models on AI-generated data, which has its own problems.

AI companies may soon be forced to train their upcoming AI models on AI-generated data. Image Credit: Reuters

AI companies are facing a monumental challenge, one that would render pointless the billions of dollars that Big Tech is investing in them: they are running out of internet.

In the race to develop ever-larger and more advanced large language models, AI companies have practically consumed all of the open internet, and are now facing the imminent end of data, as reported by the Wall Street Journal.

This issue is pushing some firms to seek alternative sources of training data, such as publicly available video transcripts and AI-generated “synthetic data”. However, using AI-generated data to train AI models is a problem in itself, as it makes the models more likely to hallucinate.

Furthermore, discussions around synthetic data have raised serious concerns about the potential consequences of training AI models on AI-generated data. Experts believe that relying too heavily on such data leads to digital “inbreeding”, which could eventually cause the AI model to collapse on itself.

While companies like DatologyAI, founded by former Meta and Google DeepMind researcher Ari Morcos, are exploring methods to train expansive models with less data and fewer resources, most major players are experimenting with some rather unconventional and contentious approaches to sourcing training data.

OpenAI, for example, is considering training its GPT-5 model on transcriptions of publicly available YouTube videos, according to sources cited by the WSJ, even though the company is facing criticism for using such videos to train Sora and may face lawsuits from video creators.

Nevertheless, companies like OpenAI and Anthropic are planning to address this by developing superior synthetic data, although the specifics of their methodologies remain unclear.

Fears of AI companies running out of data have been circulating for quite some time now. Although some, like Epoch researcher Pablo Villalobos, estimate that AI could exhaust its usable training data in the coming years, there is a prevailing sentiment that significant breakthroughs could mitigate these concerns.

However, an alternative solution to this dilemma exists: AI companies could opt to refrain from pursuing larger and more advanced models, considering the environmental toll associated with their development, including significant energy consumption and the reliance on rare-earth minerals for computing chips.

(With inputs from agencies)

Copyright @ 2024. Firstpost - All Rights Reserved
