Background

OpenAI Introduces 'o3' And 'o4-mini': The First AI Models That Can 'Think With Images'


The AI arms race, fueled by tech companies' obsession with Large Language Models, continues to accelerate with no signs of slowing down.

Since launching ChatGPT in late 2022, OpenAI has been accelerating the evolution of its models—expanding far beyond text to include images, voice, and even video. It's a bold push to lead the charge in the fast-moving world of generative AI, where the competition is fierce and ever-growing.

Rivals like Google, Anthropic, and Elon Musk’s xAI are all in the race, each vying for dominance in this high-stakes arena of machine intelligence.

In order to stay ahead, OpenAI has introduced 'o3' as its most advanced model yet, alongside a smaller model called 'o4-mini'.

What makes these two models unique is that they can "think with images," understanding whiteboards, sketches, and diagrams, even when they're low quality.

Read: OpenAI Reveals The o3 Reasoning AI Model That Should Think Better Than The o1

The first is o3, the company's main new reasoning model.

Following in the footsteps of its predecessor, o1, this new model can also solve complex problems, deliberating over its answers in multiple steps. But o3 particularly shines in areas like programming, business strategy, consulting, and creative ideation.

"OpenAI o3 is our most powerful reasoning model that pushes the frontier across coding, math, science, visual perception, and more," said OpenAI in a post on its website.

It sets a new state of the art (SOTA) on benchmarks including Codeforces, SWE-bench (without building a custom model-specific scaffold), and MMMU. It's ideal for complex queries requiring multi-faceted analysis, whose answers may not be immediately obvious.

According to evaluations by independent experts, o3 makes 20% fewer major errors than OpenAI’s earlier o1 model when tackling complex, real-world tasks.

Early testers praised its analytical depth, calling it a strong thought partner capable of both generating and rigorously assessing fresh hypotheses.

Its strengths are especially notable in domains like biology, mathematics, and engineering, where precision and insight truly matter.

But what makes o3 particularly unique is that it "performs especially strongly at visual tasks like analyzing images, charts, and graphics."

The models can also rotate, zoom, and apply other image-manipulation tools as part of their reasoning.

As for o4-mini, this is a compact model designed for fast, cost-efficient reasoning, delivering exceptional performance relative to its size.

It particularly excels in mathematics, programming, and visual reasoning tasks, making it an ideal choice for lightweight yet powerful AI applications.

Impressively, o4-mini currently holds the title of top-performing benchmarked model on AIME 2024 and 2025. With access to a Python interpreter, it reaches a 99.5% pass@1 rate and 100% consensus@8 on AIME 2025. While access to tools like interpreters does lower the difficulty of the exam, the results still underscore just how effectively o4-mini can leverage external tools to boost its reasoning abilities.

Notably, o3 also demonstrates similar gains under the same conditions, achieving 98.4% pass@1 and full consensus as well.
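For readers unfamiliar with the terminology: pass@1 measures the fraction of individual sampled answers that are correct, while consensus@k takes a majority vote over k samples and checks whether that vote is correct. A minimal sketch of how such metrics could be computed for one problem (the sample answers below are illustrative, not OpenAI's actual data):

```python
from collections import Counter

def pass_at_1(samples, correct_answer):
    """Fraction of individual sampled answers that are correct."""
    return sum(s == correct_answer for s in samples) / len(samples)

def consensus_at_k(samples, correct_answer, k):
    """Majority vote over the first k samples: 1.0 if the most common
    answer among them matches the correct answer, else 0.0."""
    vote, _ = Counter(samples[:k]).most_common(1)[0]
    return 1.0 if vote == correct_answer else 0.0

# Illustrative example: 8 sampled answers to one AIME-style problem
samples = [204, 204, 210, 204, 204, 204, 199, 204]
print(pass_at_1(samples, 204))          # 0.75
print(consensus_at_k(samples, 204, 8))  # 1.0
```

Note how consensus voting can reach 100% even when individual samples occasionally miss, which is why the two headline numbers can differ.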

"In expert evaluations, o4-mini also outperforms its predecessor, o3‑mini, on non-STEM tasks as well as domains like data science. Thanks to its efficiency, o4-mini supports significantly higher usage limits than o3, making it a strong high-volume, high-throughput option for questions that benefit from reasoning," OpenAI said.

OpenAI still believes that large-scale reinforcement learning exhibits the same "more compute = better performance" trend observed in pretraining.

This is why the company pushed an additional order of magnitude in both training compute and inference-time reasoning, and still saw clear performance gains, validating that the models' performance continues to improve the more they're allowed to think. At equal latency and cost with o1, o3 delivers higher performance in ChatGPT, and the company has validated that performance should keep climbing when the model is allowed to think longer.

This is also why OpenAI believes that these two models should further pave the way toward agentic tool use.

"These models are trained to reason about how to solve problems, choosing when and how to use tools to produce detailed and thoughtful answers in the right output formats quickly—typically in under a minute," said OpenAI.

This flexible, strategic approach allows the models to tackle tasks that require access to up-to-date information beyond their built-in knowledge, as well as extended reasoning, synthesis, and output generation across modalities.

Published: 16/04/2025