Firstpost

Rapid development of AI is breaking the ability to assess models, with safety tests producing false results

FP Staff • April 11, 2024, 14:35:35 IST

The speed and frequency with which AI studios develop and launch new models are breaking the benchmarking and assessment tools meant to keep them in check. As a result, these tests are producing problematic results and clearing models that can’t be trusted.

The rapid advancement of AI is creating major challenges for the traditional methods used to assess AI models. This presents a dilemma for businesses and public bodies: how to keep AI in check effectively while navigating a fast-changing landscape.

According to a report by The Financial Times, some of the most commonly used criteria for assessing AI performance, such as accuracy and safety, are falling short as more sophisticated models enter the market.

According to experts in AI development, testing, and investment, these traditional tools are easily manipulated and too narrow in scope to accommodate the complexity of the latest AI systems.

The intense competition in the AI space, fueled by investments from venture capitalists and tech giants like Microsoft, Google, and Amazon, has rendered many older benchmarks obsolete.

With new AI models and updates being launched every month, the evaluation standards that we have now are quickly becoming obsolete.

As generative AI becomes a top investment priority for a majority of tech businesses, it is becoming increasingly important to ensure that the AI products we have are actually trustworthy.

Governments are also grappling with how to deploy the latest AI models and manage the risks associated with them, prompting initiatives such as bilateral arrangements on AI safety between countries.

Concerns are also arising about the integrity of public tests: a model’s training data may inadvertently include the exact questions used in evaluations, which undermines the reliability of benchmarks.
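
To make the contamination concern concrete, here is a minimal sketch of one common way such overlap can be checked: looking for long runs of identical words shared between benchmark questions and training text. This is an illustration only, not a method described in the report. The function names, the sample data, and the n-gram length are all assumptions chosen for the example.

```python
import re


def ngrams(text, n):
    """Return the set of word n-grams in text (lowercased, punctuation ignored)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def contamination_rate(benchmark_questions, training_docs, n=8):
    """Return (fraction of questions flagged, list of flagged questions).

    A question is flagged when it shares at least one n-gram with any
    training document. Questions shorter than n words are never flagged.
    """
    if not benchmark_questions:
        return 0.0, []
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    flagged = [q for q in benchmark_questions if ngrams(q, n) & train_grams]
    return len(flagged) / len(benchmark_questions), flagged


# Hypothetical data: one question appears verbatim in the training text.
training_docs = [
    "Forum post: what is the capital of France and when was it founded? Answer below."
]
questions = [
    "What is the capital of France and when was it founded?",
    "Which planet in the solar system has the most known moons?",
]

rate, flagged = contamination_rate(questions, training_docs, n=5)
print(f"{rate:.0%} of questions overlap with training text")
```

A check like this only catches near-verbatim copies. Paraphrased or translated test questions would pass unnoticed, which is part of why public benchmarks are hard to trust on their own.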

In response, startups have emerged that focus solely on developing new approaches to evaluating emerging AI models. Some platforms offer bespoke tests set by individual users, providing a more direct reflection of user preferences. However, while such approaches may benefit individual users, they may have limited utility for companies with specific AI model requirements.

Ultimately, businesses are advised to conduct internal testing and human evaluation to supplement traditional benchmarks, recognizing that the selection of AI models is as much an art as it is a science. As AI continues to evolve, adapting evaluation methods to ensure accuracy and reliability remains paramount for effectively harnessing the potential of this transformative technology.

(With inputs from agencies)

Copyright @ 2024. Firstpost - All Rights Reserved
