Background

OpenAI Introduces 'Flex,' A Cheaper AI Model 'Ideal For Non-Production Or Lower-Priority Tasks'


In AI, the prevailing assumption is that bigger is always better.

But not everyone needs a full-fledged AI with all of its abilities and features. Sometimes, people just want an AI to help them solve simple problems and finish small tasks. Not to mention that not everyone has the budget, or reliable resources.

What began as a race for supremacy in Large Language Models—sparked by the hype surrounding OpenAI’s ChatGPT—has since evolved into a relentless push to develop increasingly powerful AI systems.

At the same time, companies are also introducing smaller, lightweight versions of their flagship models.

In OpenAI’s case, these are distinguished by the "mini" suffix.

To cater to users willing to trade performance and speed for lower cost, OpenAI also offers what it calls 'Flex': a streamlined option for those who need AI without the heavy demands.

On a page on its website, OpenAI explains:

"Flex processing provides significantly lower costs for Chat Completions or Responses requests in exchange for slower response times and occasional resource unavailability. It is ideal for non-production or lower-priority tasks such as model evaluations, data enrichment, or asynchronous workloads."

OpenAI offers two distinct options to cater to varying needs.

First, the mini models, such as GPT-4.1 mini and GPT-4o mini, are designed to provide a balance between performance and efficiency. They maintain a high level of intelligence while being more affordable and faster than their full-sized counterparts.

For instance, GPT-4.1 mini matches or exceeds GPT-4o in intelligence evaluations, with reduced latency and cost. These models are ideal for applications where computational resources are limited but quality cannot be compromised, such as customer service bots or mobile applications.

On the other hand, Flex processing is an API option that offers a 50% discount on AI model usage in exchange for slower response times and occasional resource unavailability.

This mode is particularly suitable for non-urgent or background tasks, such as data enrichment, model evaluations, or asynchronous workloads.

Flex processing is currently available in beta for OpenAI’s o3 and o4-mini models.

While it provides significant cost savings, it may not be ideal for time-sensitive applications due to its slower processing speed and potential resource constraints.
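Because Flex requests can occasionally fail when capacity is short, background jobs typically wrap them in a retry loop. Below is a minimal, self-contained Python sketch of that pattern; `ResourceUnavailable` and `with_retries` are hypothetical names for illustration, not part of OpenAI's SDK.

```python
import random
import time


class ResourceUnavailable(Exception):
    """Stand-in for the API's 'resource unavailable' error (illustrative name)."""


def with_retries(request_fn, max_attempts=5, base_delay=1.0):
    """Call request_fn, retrying with exponential backoff when capacity is short.

    request_fn is any callable that raises ResourceUnavailable on a
    temporary capacity error and returns a result on success.
    """
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except ResourceUnavailable:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Back off 1s, 2s, 4s, ... plus a little jitter to avoid
            # many clients retrying in lockstep.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

For a lower-priority batch job, losing a few seconds to backoff is an acceptable trade for the discounted rate.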

Read: https://www.eyerys.com/articles/news/openai-introduces-cost-effective-gpt-4o-mini-intelligence-too-cheap-meter

This Flex processing is essentially an API setting users can toggle to get cheaper access to OpenAI’s o3 and o4-mini models.

They can add a parameter to their API request, and costs drop by half. And because it's "flexible," OpenAI prioritizes standard, full-priced requests, so a Flex job might take longer or face occasional resource errors.
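In practice, opting in amounts to adding `service_tier: "flex"` to an otherwise ordinary Chat Completions request. The sketch below builds such a payload; `build_flex_request` is a hypothetical helper, and the exact parameter name and model identifiers should be verified against OpenAI's current API reference.

```python
def build_flex_request(model: str, prompt: str) -> dict:
    """Build a Chat Completions payload that opts into Flex pricing."""
    return {
        "model": model,  # Flex is in beta for "o3" and "o4-mini"
        "messages": [{"role": "user", "content": prompt}],
        "service_tier": "flex",  # the one-parameter toggle for the discount
    }


payload = build_flex_request("o4-mini", "Summarize this dataset.")
# POST this payload to the Chat Completions endpoint with your API key.
# Since Flex responses are slower, a longer client timeout than the
# default is usually advisable.
```

Everything else about the request, the model name, the messages, the response shape, stays the same; only the service tier changes.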

In summary, if users are developing applications that require consistent performance with limited resources, the mini models are a suitable choice.

However, if they're handling tasks that can tolerate slower responses, Flex processing offers a cost-effective solution.

Published: 
17/04/2025