Firstpost

Apple has finally launched MM1, its multimodal AI model for text and image generation

FP Staff • March 20, 2024, 13:38:37 IST

Apple has finally launched MM1, its multimodal AI model for text and image generation. The AI model can also perform in-context predictions thanks to its large-scale multimodal pre-training.

Apple was long due to launch its AI model. Image Credit: Reuters

After months of rumours and speculation about Apple's upcoming AI projects and multimodal AI models, the company's researchers have developed a family of multimodal large language models called MM1, which can process and generate both text and visual data, according to a research paper presented last week.

The study at Apple’s research labs aimed to build performant multimodal large language models (MLLMs) through careful ablation of various architectural components, data sources, and training procedures.

The researchers found that image resolution and the capacity of the visual encoder had the highest impact on model performance, while the specific method of combining visual and text data mattered less.

They also discovered that a careful mix of different data types was crucial, with interleaved image-text documents helping with few-shot learning, traditional captioned images boosting zero-shot performance, and including text-only data maintaining strong language understanding capabilities.

MM1 can perform in-context predictions thanks to its large-scale multimodal pre-training. This allows MM1 to count objects and follow custom formatting, refer to parts of the images and perform OCR, demonstrate common sense and word knowledge about everyday objects, and perform basic math functions.

Based on these insights, the team developed the MM1 model family, ranging from three billion to 30 billion parameters, including dense and mixture-of-experts variants. After scaling up training, MM1 achieved state-of-the-art results on various multimodal benchmarks during pre-training.

Following further instruction tuning on a curated dataset of 1 million examples, the final MM1 models demonstrated competitive performance across 12 multimodal tasks, such as visual question answering and captioning. Notably, MM1 could perform multi-image reasoning and few-shot learning, critical capabilities enabled by the team’s careful multimodal pre-training approach.

This paper builds upon previous research into areas like CLIP for learning visual representations from natural language supervision, and autoregressive models like GPT for text generation. However, it is one of the first detailed studies focused specifically on large-scale multimodal pre-training.

The researchers hope their insights will accelerate progress in the field. The paper comes as Apple is reportedly in talks to integrate Google’s Gemini generative AI models into upcoming iPhone software.
