The AI arms race, fueled by tech companies' obsession with Large Language Models, continues to accelerate with no signs of slowing down.
Since launching ChatGPT in late 2022, OpenAI has been accelerating the evolution of its models—expanding far beyond text to include images, voice, and even video. It's a bold push to lead the charge in the fast-moving world of generative AI, where the competition is fierce and ever-growing.
Rivals like Google, Anthropic, and Elon Musk’s xAI are all in the race, each vying for dominance in this high-stakes arena of machine intelligence.
To stay ahead, OpenAI has introduced 'o3', its most advanced reasoning model yet, alongside a smaller model called 'o4-mini'.
What makes these two models unique is that they can "think with images," understanding whiteboards, sketches, and diagrams, even when they're low quality.
Read: OpenAI Reveals The o3 Reasoning AI Model That Should Think Better Than The o1
Introducing OpenAI o3 and o4-mini—our smartest and most capable models to date.
For the first time, our reasoning models can agentically use and combine every tool within ChatGPT, including web search, Python, image analysis, file interpretation, and image generation. pic.twitter.com/rDaqV0x0wE— OpenAI (@OpenAI) April 16, 2025
The first is o3, the company's main new reasoning model.
Following in the footsteps of its predecessor, o1, this new version can also solve complex problems by deliberating over its answers in multiple steps. But o3 particularly shines in areas like programming, business strategy, consulting, and creative ideation.
"OpenAI o3 is our most powerful reasoning model that pushes the frontier across coding, math, science, visual perception, and more," said OpenAI in a post on its website.
It sets a new SOTA on benchmarks including Codeforces, SWE-bench (without building a custom model-specific scaffold), and MMMU. It’s ideal for complex queries requiring multi-faceted analysis and whose answers may not be immediately obvious.
According to evaluations by independent experts, o3 makes 20% fewer major errors than OpenAI’s earlier o1 model when tackling complex, real-world tasks.
Early testers praised its analytical depth, calling it a strong thought partner capable of both generating and rigorously assessing fresh hypotheses.
Its strengths are especially notable in domains like biology, mathematics, and engineering, where precision and insight truly matter.
But what makes o3 particularly unique is that it "performs especially strongly at visual tasks like analyzing images, charts, and graphics."
The models can also rotate, zoom and use other image-editing tools.
OpenAI o3 is a powerful model across multiple domains, setting a new standard for coding, math, science, and visual reasoning tasks.
o4-mini is a remarkably smart model for its speed and cost-efficiency. This allows it to support significantly higher usage limits than o3, making…— OpenAI (@OpenAI) April 16, 2025
As for o4-mini, it is a compact model designed for fast, cost-efficient reasoning, delivering exceptional performance relative to its size.
It particularly excels in mathematics, programming, and visual reasoning tasks, making it an ideal choice for lightweight yet powerful AI applications.
Impressively, o4-mini currently holds the title of top-performing benchmarked model on AIME 2024 and 2025. With access to a Python interpreter, it reaches a 99.5% pass@1 rate and 100% consensus@8 on AIME 2025. While the use of tools like interpreters does lower the barrier of the exam, the results still underscore just how effectively o4-mini can leverage external tools to boost its reasoning abilities.
Notably, o3 also demonstrates similar gains under the same conditions, achieving 98.4% pass@1 and full consensus as well.
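The two metrics quoted above can be sketched in a few lines of Python. This is a simplified illustration of how such scores are commonly computed (exact-match answers, simple majority vote), not OpenAI's actual evaluation harness:

```python
from collections import Counter

def pass_at_1(samples: list[str], answer: str) -> float:
    # Expected accuracy of a single sampled answer: the fraction
    # of sampled answers that match the reference answer.
    return sum(s == answer for s in samples) / len(samples)

def consensus_at_k(samples: list[str], answer: str, k: int = 8) -> bool:
    # Majority vote over the first k samples; the model scores the
    # problem correct if its most common answer is the reference.
    majority, _count = Counter(samples[:k]).most_common(1)[0]
    return majority == answer

# Toy example: 8 sampled answers to one AIME-style problem
samples = ["204", "204", "197", "204", "204", "204", "63", "204"]
print(pass_at_1(samples, "204"))       # 0.75
print(consensus_at_k(samples, "204"))  # True
```

A 99.5% pass@1 therefore means a single sampled answer is almost always right; 100% consensus@8 means the majority vote over eight samples never misses.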
OpenAI o3 and o4-mini are our first models to integrate uploaded images directly into their chain of thought.
That means they don’t just see an image—they think with it. https://t.co/hSJkzeuNQR— OpenAI (@OpenAI) April 16, 2025
"In expert evaluations, o4-mini also outperforms its predecessor, o3‑mini, on non-STEM tasks as well as domains like data science. Thanks to its efficiency, o4-mini supports significantly higher usage limits than o3, making it a strong high-volume, high-throughput option for questions that benefit from reasoning," OpenAI said.
OpenAI still believes that large-scale reinforcement learning follows the same "more compute = better performance" trend observed in pretraining.
This is why the company pushed an additional order of magnitude in both training compute and inference-time reasoning, and still saw clear performance gains, validating that the models' performance continues to improve the more they're allowed to think. At equal latency and cost to OpenAI o1, o3 delivers higher performance in ChatGPT, and the company verified that letting the model think longer keeps pushing its performance higher.
Both OpenAI o3 and o4-mini are also available to developers today via the Chat Completions API and Responses API.
The Responses API supports reasoning summaries, the ability to preserve reasoning tokens around function calls for better performance, and will soon support built-in…— OpenAI (@OpenAI) April 16, 2025
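For developers, a call to the new models through the OpenAI Python SDK might look like the sketch below. The model names follow the announcement; the prompt is hypothetical, and you should check the API docs for the identifiers actually exposed to your account:

```python
def build_request(prompt: str, model: str = "o4-mini") -> dict:
    # Payload shape for the Chat Completions endpoint: a model name
    # plus a list of chat messages.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# To actually send it (requires the `openai` package and an API key):
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(**build_request("Summarize this chart"))
# print(resp.choices[0].message.content)
```

The Responses API accepts requests in a similar shape via `client.responses.create(...)`, with the reasoning-summary features described in the tweet above.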
This is also why OpenAI believes that these two models pave the way toward more agentic tool use.
"These models are trained to reason about how to solve problems, choosing when and how to use tools to produce detailed and thoughtful answers in the right output formats quickly—typically in under a minute," said OpenAI.
This flexible, strategic approach allows the models to tackle tasks that require access to up-to-date information beyond the models' built-in knowledge, extended reasoning, synthesis, and output generation across modalities.
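The tool-choosing behavior itself is learned, but from the developer side, tools are declared to the API as JSON schemas the model can decide to invoke. The `web_search` tool below is a hypothetical example for illustration (in ChatGPT, these tools are built in); the declaration format is the Chat Completions `tools` schema:

```python
# Hypothetical tool declaration: a web-search function the model may
# choose to call when a question needs up-to-date information.
web_search_tool = {
    "type": "function",
    "function": {
        "name": "web_search",  # hypothetical tool name
        "description": "Search the web for up-to-date information.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
            },
            "required": ["query"],
        },
    },
}

# Passed alongside a request, e.g.:
# client.chat.completions.create(model="o3", messages=[...],
#                                tools=[web_search_tool])
```

The model then reasons about whether and when to call the tool, and the developer executes the call and feeds the result back, which is the loop "agentic tool use" refers to.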
OpenAI o3 and o4-minihttps://t.co/giS4K1yNh9
— OpenAI (@OpenAI) April 16, 2025