Background

'QwQ-32B' From Alibaba Is Said To Outperform DeepSeek-R1 And Surpass OpenAI's o1

Alibaba Qwen QwQ-32B

How high one can climb depends entirely on their willingness to ascend.

For a while, things were quiet—until everything changed. After OpenAI introduced ChatGPT, a battle erupted among tech companies. Those who saw the potential in commercializing Large Language Models rushed to develop and launch their own AI, rolling out upgrades at an astonishing pace.

In the world of chatbots, progress moves at lightning speed.

Initially, the AI arms race was centered in the West, led by tech giants like OpenAI, Google, Meta, and Apple, along with rising players such as Perplexity and Anthropic. But as the trend gained momentum, the East wasn’t about to be left behind. Seizing the opportunity, companies across China and beyond jumped on the bandwagon, eager to stake their claim.

As competition intensified, AI development shifted toward reasoning models.

OpenAI, for example, introduced the o1, designed to spend more time thinking before responding. Not long after, China’s DeepSeek introduced DeepSeek-R1, a rival with enhanced reasoning abilities and fewer restrictions.

Now, Alibaba has entered the fray.

After catching up rapidly with Qwen-2.5, it has unveiled its first-ever reasoning model, QwQ-32B, a bold move that signals an even fiercer battle ahead.

According to Alibaba, QwQ-32B uses scaled reinforcement learning (RL) to enhance the model "beyond conventional pretraining and post-training methods."

Following DeepSeek-R1, which also uses RL, combined with cold-start data and multi-stage training, to achieve state-of-the-art performance, QwQ-32B is a reasoning model capable of deep thinking when answering complex questions.

But uniquely, QwQ-32B is a model with only 32 billion parameters.

In comparison, DeepSeek-R1 boasts 671 billion parameters (with 37 billion activated per forward pass). Since Large Language Models have long followed a "bigger is better" paradigm, QwQ-32B underscores the effectiveness of RL when applied to robust models pretrained on extensive world knowledge.
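The comparison is less lopsided than the headline numbers suggest: DeepSeek-R1 is a mixture-of-experts (MoE) model that activates only a subset of its weights for each token, while QwQ-32B is a dense model that uses all of its parameters every time. A minimal sketch of the arithmetic, using only the figures quoted above (the helper function here is purely illustrative, not any real library API):

```python
def active_params(total_billions, activated_billions=None):
    """Parameters (in billions) that participate in each forward pass.

    Dense models use all of their weights per token; MoE models route
    each token through only a fraction of them.
    """
    if activated_billions is None:
        return total_billions  # dense: everything is active
    return activated_billions  # MoE: only the routed experts are active

qwq_active = active_params(32)       # QwQ-32B, dense: all 32B
r1_active = active_params(671, 37)   # DeepSeek-R1, MoE: 37B of 671B

print(f"QwQ-32B active parameters per token: {qwq_active}B")
print(f"DeepSeek-R1 active parameters per token: {r1_active}B (of 671B total)")
print(f"Per-token ratio: {r1_active / qwq_active:.2f}x")
```

So per token, R1 brings roughly 1.16x as many active parameters to bear as QwQ-32B, not 21x, which is part of why a well-trained 32B dense model can be competitive.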

And since QwQ-32B has agent-related capabilities integrated into its reasoning model, the AI is also able to "think critically while utilizing tools and adapting its reasoning based on environmental feedback."

"These advancements not only demonstrate the transformative potential of RL but also pave the way for further innovations in the pursuit of artificial general intelligence," the team said on the project's GitHub page.

"Through this journey, we have not only witnessed the immense potential of scaled RL but also recognized the untapped possibilities within pretrained language models. As we work towards developing the next generation of Qwen, we are confident that combining stronger foundation models with RL powered by scaled computational resources will propel us closer to achieving Artificial General Intelligence (AGI)," the team added.

It's worth noting, though, that QwQ-32B competes directly with OpenAI's o1-mini, not the full o1. And at this time, OpenAI has already launched o1's successor, which it calls o3.

Read: Paving The Roads To Artificial Intelligence: It's Either Us, Or Them

Published: 
06/03/2025