Firstpost
TikTok’s parent co ByteDance launched new web scraper, 'steals' data from the web 25X faster than OpenAI
FP Staff • October 4, 2024, 14:05:51 IST

ByteDance’s scraping frenzy suggests the company is working on a new large language model. Reports from earlier this year indicate that ByteDance was behind in the generative AI race and even used OpenAI’s models to help build its own, violating OpenAI’s terms of service.

ByteDance, the parent company of TikTok, is stepping up its efforts in the race to train generative AI models with the launch of a new web-scraping tool. Dubbed Bytespider, the bot was reportedly introduced in April and has already become one of the most aggressive web scrapers in operation.

Research from bot management company Kasada and bot monitoring firm Dark Visitors revealed that ByteDance’s Bytespider scrapes web data 25 times faster than GPTBot, OpenAI’s web scraper for its ChatGPT platform, and 3,000 times faster than ClaudeBot, the scraper Anthropic uses for its Claude platform.

A scraping frenzy
Since its debut, Bytespider’s activity has only increased, with noticeable spikes in scraping over the past six weeks, according to a report by Fortune.

It appears ByteDance is trying to quickly gather as much data as possible to catch up with other tech giants like Google, Meta, and OpenAI, all of which use web scrapers to collect vast amounts of online data to train their large language and multimodal models (LLMs or LMMs).

However, ByteDance’s scraper, like those used by other AI companies, does not honour robots.txt, the file that website owners publish to tell crawlers which parts of a site they should not collect data from.

Though robots.txt isn’t legally enforceable, the disregard for it has stirred controversy as web scraping is often seen as infringing on copyright, particularly when used to train AI models.
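
For readers unfamiliar with how robots.txt works, here is a minimal sketch in Python of the check a well-behaved crawler is expected to run before fetching a page. The rules and the example.com URLs below are hypothetical and chosen only for illustration; they are not taken from any real site or from the research cited above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve: it bans Bytespider
# from the whole site and keeps every other crawler out of /private/.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Bytespider", "GPTBot"):
        for path in ("/articles/1", "/private/report"):
            url = "https://example.com" + path
            print(agent, path, is_allowed(ROBOTS_TXT, agent, url))
```

The check runs entirely on the crawler's side. Nothing in the protocol stops a scraper from skipping it, which is why compliance is voluntary.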

As generative AI tools rely heavily on web data to function, scraping has become a contentious issue, with many individuals and organisations arguing that their work is being copied without compensation. The practice has been around for decades, primarily for search engines, but the rise of AI has introduced new legal and ethical concerns.

ByteDance’s AI push
ByteDance’s aggressive scraping efforts come at a time when the company is under scrutiny, particularly in the US. President Joe Biden has signed legislation requiring ByteDance to either sell TikTok or shut it down, citing national security concerns.

Despite this, ByteDance seems determined to advance its AI capabilities.

ByteDance’s scraping frenzy suggests the company is working on a new large language model. Reports from earlier this year indicate that ByteDance was behind in the generative AI race and even relied on OpenAI to help build its own model, a move that violated OpenAI’s terms of service.

In early 2023, ByteDance launched Doubao, a chat-based LLM, but the model’s development was completed before the more recent data collection efforts.

One potential application for ByteDance’s new LLM is improving TikTok’s search functionality. TikTok recently updated its search feature to focus on keywords for ads, allowing advertisers to target trending words in real time. With a more robust AI model trained on up-to-date web data, TikTok could further enhance its search capabilities, creating a more competitive environment for advertisers currently relying on Google.

The rapid data collection and AI advancements suggest that ByteDance is eager not only to catch up but also to reshape the landscape of search and AI, especially given TikTok’s massive user base. If successful, these efforts could make TikTok’s search environment highly appealing to advertisers looking to reach larger audiences through precise, data-driven keywords and trends.

Copyright @ 2024. Firstpost - All Rights Reserved
