Firstpost
TikTok’s parent co ByteDance launched new web scraper, 'steals' data from the web 25X faster than OpenAI

FP Staff • October 4, 2024, 14:05:51 IST

ByteDance’s scraping frenzy suggests the company is working on a new large language model. Reports from earlier this year indicate that ByteDance was behind in the generative AI race and even used OpenAI’s models to help build its own, in violation of OpenAI’s terms of service.

ByteDance, the parent company of TikTok, is stepping up its efforts in the race to train generative AI models with the launch of a new web-scraping tool. Dubbed Bytespider, the bot was reportedly introduced in April and has already become one of the most aggressive web scrapers in operation.

Research from bot management company Kasada and bot monitoring firm Dark Visitors revealed that ByteDance’s Bytespider scrapes web data 25 times faster than GPTBot, OpenAI’s web scraper for its ChatGPT platform. It also scrapes 3,000 times faster than ClaudeBot, the scraper Anthropic uses for its Claude platform.

A scraping frenzy
Since its debut, Bytespider’s activity has only increased, with noticeable spikes in scraping over the past six weeks, as per a report by Fortune.

It appears ByteDance is trying to quickly gather as much data as possible to catch up with other tech giants like Google, Meta, and OpenAI, all of which use web scrapers to collect vast amounts of online data to train their large language and multimodal models (LLMs or LMMs).

However, ByteDance’s scraper, like those used by other AI companies, does not adhere to robots.txt, the file a website publishes to tell scrapers which of its pages they should not access.

Though robots.txt isn’t legally enforceable, the disregard for it has stirred controversy as web scraping is often seen as infringing on copyright, particularly when used to train AI models.
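
For readers unfamiliar with the mechanism, the following is a minimal sketch of how a scraper that chooses to honour robots.txt would check a site's rules before fetching a page. It uses Python's standard `urllib.robotparser` module. The robots.txt contents, the `example.com` URLs, and the helper name `is_allowed` are assumptions made for illustration; they do not describe any real site or how any company's scraper actually works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (illustrative rules only).
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    for agent in ("Bytespider", "GPTBot", "ClaudeBot"):
        print(agent, is_allowed(agent, page))
```

The check is purely advisory: nothing in the file stops a scraper that skips this step, which is why sites that want to block such bots typically rely on server-side measures instead.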

As generative AI tools rely heavily on web data to function, scraping has become a contentious issue, with many individuals and organisations arguing that their work is being copied without compensation. The practice has been around for decades, primarily for search engines, but the rise of AI has introduced new legal and ethical concerns.

ByteDance’s AI push
ByteDance’s aggressive scraping efforts come at a time when the company is under scrutiny, particularly in the US. President Joe Biden has signed legislation requiring ByteDance to either sell TikTok or shut it down, citing national security concerns.

Despite this, ByteDance seems determined to advance its AI capabilities.

ByteDance’s scraping frenzy suggests the company is working on a new large language model. Reports from earlier this year indicate that ByteDance was behind in the generative AI race and even relied on OpenAI to help build its own model, a move that violated OpenAI’s terms of service.

In early 2023, ByteDance launched Doubao, a chat-based LLM, but the model’s development was completed before the more recent data collection efforts.

One potential application for ByteDance’s new LLM is improving TikTok’s search functionality. TikTok recently updated its search feature to focus on keywords for ads, allowing advertisers to target trending words in real time. With a more robust AI model trained on up-to-date web data, TikTok could further enhance its search capabilities and compete more directly for advertisers who currently rely on Google.

The rapid data collection and AI advancements suggest that ByteDance is eager to not only catch up but potentially reshape the landscape of search and AI, especially within the context of TikTok’s massive user base. If successful, these efforts could make TikTok’s search environment highly appealing to advertisers looking to reach larger audiences through precise, data-driven keywords and trends.
