Solving AI’s biggest problem: Microsoft, Google, Meta are using fake data to train their AI models
FP Staff • May 3, 2024, 15:15:22 IST

Major AI companies are running out of high-quality organic data. As a result, they are relying more and more on synthetic, or fake, data to train their AI models. While it may seem to be a pretty cheeky workaround for a serious problem,

Google, Meta, and Microsoft have practically scoured the entire internet in search of data to train their AI models. Image Credit: Reuters, Reuters, AFP

AI companies have for quite some time been grappling with the challenge of obtaining high-quality data to train their systems, leading them to explore alternatives such as synthetic data.

Traditionally, AI systems relied on vast amounts of data extracted from various sources like articles, books, and online comments to understand user queries and generate responses. However, the availability of such high-quality data on the internet is limited, which has prompted AI firms to seek alternative solutions.

Synthetic data, essentially artificial data generated by AI systems, is emerging as a promising approach to address this issue. By leveraging their own AI models, tech companies are producing synthetic data to train future iterations of their systems. This method, dubbed an “infinite data generation engine” by Anthropic CEO Dario Amodei, aims to mitigate legal, ethical, and privacy concerns associated with traditional data acquisition methods.

Although synthetic data in computing is not a novel concept, the rise of generative AI has facilitated the creation of higher-quality synthetic data at scale. Major AI companies like Meta, Google, and Microsoft have started using synthetic data to develop advanced models, including chatbots and language processors.

For instance, Anthropic used synthetic data to power its chatbot, Claude, while Google DeepMind employed this method to train a model capable of solving complex geometry problems. Meanwhile, Microsoft has made its small language models, developed using synthetic data, publicly available.

The process of generating synthetic data involves setting specific parameters and prompts for AI models to create content. For example, researchers at Microsoft tasked an AI model with generating children’s stories using a predefined list of words. This approach allows for more precise control over the data used to train AI systems.
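The word-list idea described above can be illustrated with a minimal sketch. It only builds the prompts and checks a returned story; it does not call any model. The vocabulary, the prompt wording, and all function names here are hypothetical and are not taken from the Microsoft research.

```python
import random

# Hypothetical vocabulary; the article does not list the actual words used.
VOCABULARY = {
    "nouns": ["dog", "river", "lamp", "garden", "kite"],
    "verbs": ["jump", "share", "hide", "build", "sing"],
    "adjectives": ["tiny", "brave", "sleepy", "shiny", "quiet"],
}


def sample_required_words(rng):
    """Pick one word per category so that prompts vary from one to the next."""
    return [rng.choice(VOCABULARY[kind]) for kind in ("nouns", "verbs", "adjectives")]


def build_story_prompt(required_words):
    """Turn a list of required words into an instruction for a text generator."""
    word_list = ", ".join(required_words)
    return (
        "Write a short story for young children using simple words. "
        f"The story must include these words: {word_list}."
    )


def make_prompts(count, seed=0):
    """Create `count` prompts; the seed makes the sampling repeatable."""
    rng = random.Random(seed)
    return [build_story_prompt(sample_required_words(rng)) for _ in range(count)]


def contains_required_words(story, required_words):
    """Crude quality filter: accept a story only if every required word appears."""
    text = story.lower()
    return all(word in text for word in required_words)
```

Each prompt would then be sent to a text-generation model of the reader's choice, and stories that fail the filter would be discarded. Sampling the required words at random is one plausible way to keep the resulting dataset varied while still constraining its vocabulary.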

However, some AI experts have raised concerns about the risks associated with synthetic data. Researchers at prominent universities observed instances of “model collapse,” where AI models trained on synthetic data exhibited irreversible defects and produced nonsensical outputs. Additionally, there are concerns that synthetic data could exacerbate biases and toxicity in datasets.
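A toy simulation can give some intuition for why training on a model's own output might degrade it. The sketch below is not the researchers' experiment: it assumes the "model" is nothing more than a normal distribution fitted to a handful of numbers, and every name and parameter is chosen for illustration only.

```python
import random
import statistics


def simulate_recursive_training(generations=200, sample_size=10, seed=0):
    """Repeatedly fit a normal distribution to samples drawn from the previous fit.

    Returns the fitted standard deviation after each generation,
    starting with the original ("real") distribution.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 stands in for real data
    spread_history = [sigma]
    for _ in range(generations):
        # The current model produces a small batch of synthetic data...
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # ...and the next model is fitted to that synthetic batch alone.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        spread_history.append(sigma)
    return spread_history


if __name__ == "__main__":
    history = simulate_recursive_training()
    print("initial spread:", history[0])
    print("final spread:", history[-1])
```

In this setup, each generation sees only a finite sample of the previous one, so rare values tend to be lost and the fitted spread drifts toward zero over many rounds. Real language models are far more complex, so this should be read as an analogy rather than a reproduction of the reported results.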

Despite these challenges, proponents argue that synthetic data, when properly implemented, can yield accurate and reliable models. Nonetheless, there is no consensus on the best practices for generating synthetic data, highlighting the need for further research and development in this area.

Furthermore, there is a philosophical debate surrounding the reliance on synthetic data, with questions arising about the nature of machine intelligence and its potential divergence from human understanding. Stanford University professor Percy Liang emphasized the importance of incorporating real human intelligence into the data generation process, highlighting the complexity of creating synthetic data at scale.

While synthetic data can be a promising solution to the data quality dilemma faced by AI companies, its implementation requires careful consideration of ethical, technical, and philosophical implications. As the field continues to evolve, collaboration between AI researchers and domain experts will be crucial in harnessing the full potential of synthetic data for AI development.

(With inputs from agencies)
