The First-Ever Commercial-Scale 'Diffusion-Based Large Language Models': Merging Diffusion With LLMs

26/02/2025

Large Language Models (LLMs) are advanced AI systems designed to understand and generate human-like text. Trained on vast datasets, they can perform a variety of tasks, from drafting emails to answering questions.

Their architecture typically relies on transformer models, which generate text sequentially, predicting each next word from the context of the previous ones.

Diffusion-based LLMs (dLLMs), on the other hand, apply diffusion to text generation. Traditionally, diffusion models have been prominent in generating images, audio, and video by learning statistical patterns in data.

However, research has explored their application in natural language processing.

Inception Labs has introduced the first-ever commercial-scale dLLMs, which significantly improve model speed, efficiency, and capabilities.

Inception, a Palo Alto-based company, was founded by Stanford computer science professor Stefano Ermon.

It all began when Ermon saw the advancements in AI and hypothesized that generating and modifying large blocks of text in parallel was possible with diffusion models.

After years of trying, Ermon and a student of his achieved a major breakthrough, which they detailed in a research paper in 2024.

Stemming from this research, Ermon founded Inception and became its CEO. He was then joined by two former students, UCLA professor Aditya Grover and Cornell professor Volodymyr Kuleshov, who co-lead the company.

At Inception, the team found that dLLMs achieve up to 10 times faster inference speeds and 10 times lower inference costs while unlocking advanced capabilities in reasoning, controllable generation, and multi-modal data analysis.

This is achieved because dLLMs can leverage GPUs much more efficiently.

By commercializing dLLMs, Inception hopes that it can help enterprises deploy intelligent agents and real-time decision-making systems at scale, setting a new standard for AI performance.

According to Ermon:

"AI today is limited because the core algorithm underlying generation is very inefficient, which makes scaling the most powerful models to real-world applications challenging."

"Just as DeepSeek identified ways of reducing the costs of model training, we have developed approaches to make model inference vastly more efficient and accessible."

Inception Labs founders (left-right): Aditya Grover, Volodymyr Kuleshov, Stefano Ermon.

Unlike traditional LLMs, which generate text sequentially, the dLLM approach allows generative AI to produce entire blocks of text simultaneously. It uses the same technology as AI generators like Midjourney for images and OpenAI’s Sora for video.

With LLMs, “you cannot generate the second word until you’ve generated the first one, and you cannot generate the third one until you generate the first two,” Ermon said.

The ability of dLLMs to parallelize the process should therefore result in faster, more efficient generation and more precise control over output quality.
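
To make the contrast concrete, here is a minimal Python sketch. It is not Inception's actual method: it assumes a hypothetical `toy_denoiser` that stands in for a trained model and simply looks up a fixed target sentence, and the names `TARGET`, `toy_denoiser`, and `tokens_per_step` are illustrative. All it does is count how many model calls each decoding style needs.

```python
import random

MASK = "<mask>"
# Illustrative target sentence; a real model would not know the answer in advance.
TARGET = "diffusion models can refine many tokens in parallel".split()


def toy_denoiser(tokens, rng):
    """Hypothetical stand-in for a trained model.

    For every masked position it returns (predicted_token, confidence).
    The prediction is looked up from TARGET and the confidence is random.
    """
    return {i: (TARGET[i], rng.random()) for i, t in enumerate(tokens) if t == MASK}


def autoregressive_decode(n):
    """Sequential decoding: one model call per token, left to right."""
    out, calls = [], 0
    for i in range(n):
        out.append(TARGET[i])  # stand-in for "predict the next token given out"
        calls += 1
    return out, calls


def diffusion_style_decode(n, tokens_per_step, seed=0):
    """Masked-diffusion-style decoding: start fully masked, then in each step
    score all masked positions in one call and unmask the most confident ones."""
    rng = random.Random(seed)
    tokens = [MASK] * n
    calls = 0
    while MASK in tokens:
        preds = toy_denoiser(tokens, rng)  # one call covers every masked slot
        calls += 1
        best = sorted(preds, key=lambda i: preds[i][1], reverse=True)[:tokens_per_step]
        for i in best:
            tokens[i] = preds[i][0]
    return tokens, calls


ar_tokens, ar_calls = autoregressive_decode(len(TARGET))
dl_tokens, dl_calls = diffusion_style_decode(len(TARGET), tokens_per_step=4)
print(ar_calls, dl_calls)
```

With eight tokens and four tokens unmasked per step, the sequential loop makes eight model calls while the diffusion-style loop makes two. Real dLLMs have to balance the number of steps against output quality, which this toy does not capture.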

Because dLLMs require fewer resources to run, their efficiency also opens up possibilities for advanced reasoning.

At present, reasoning models such as OpenAI's o1 and o3 require a lot of computational power to think, whereas dLLMs can power agentic apps with less compute.

The efficiency of diffusion models means that they run quickly even on edge computing devices, bringing AI from data centers to consumer devices.

This in turn can unlock more of AI's potential for developers and enterprises alike.

Artificial Analysis, an independent AI measurement firm that has benchmarked Inception’s dLLMs, found that they are 10x faster than leading speed-optimized models like GPT-4o mini and Claude 3.5 Haiku.

A few months ago, we started Inception Labs, a new generative AI startup with a rockstar founding team.

At Inception, we are challenging the status quo for language generation. Our first results bring blazing fast speeds at 1000+ tokens/sec while matching the quality of leading… https://t.co/SeOZDyhDPX
— Aditya Grover (@adityagrover_) February 26, 2025
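
As a rough back-of-the-envelope illustration of what these figures mean for latency, the Python sketch below takes the quoted 1000 tokens/sec and the "10x faster" benchmark claim at face value. The 500-token response length is an arbitrary assumption, and the baseline speed is derived from the 10x claim rather than from any measured number.

```python
def generation_seconds(num_tokens, tokens_per_second):
    """Time to emit num_tokens at a steady throughput."""
    return num_tokens / tokens_per_second


RESPONSE_TOKENS = 500          # assumed response length, for illustration only
DLLM_TPS = 1000                # throughput quoted in the announcement
BASELINE_TPS = DLLM_TPS / 10   # implied by the "10x faster" claim, not measured

dllm_time = generation_seconds(RESPONSE_TOKENS, DLLM_TPS)
baseline_time = generation_seconds(RESPONSE_TOKENS, BASELINE_TPS)
print(f"dLLM: {dllm_time} s, baseline: {baseline_time} s")
```

Under these assumptions, the same response would take half a second instead of five seconds.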

Inception’s roadmap includes launching models with several other technological advantages provided by diffusion modeling:

dLLMs can provide advanced reasoning capabilities by leveraging their built-in error correction mechanisms to fix mistakes and hallucinations.
dLLMs can provide a unified framework for processing multimodal data, making them more performant on multimodal tasks.
dLLMs can deliver control over output structure, making them ideal for function calling and structured data generation.
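
The structured-output point can be illustrated with a toy infilling sketch in Python. It assumes a hypothetical `toy_fill` function in place of a real denoiser and a made-up JSON template. The only idea shown is that fixed tokens stay frozen while masked slots are filled, so the output keeps the required structure. How Inception's models implement this is not described here.

```python
import json

MASK = "<mask>"


def toy_fill(slot_name):
    """Hypothetical stand-in for a denoiser's prediction for one masked slot."""
    guesses = {"city": '"Palo Alto"', "year": "2025"}
    return guesses[slot_name]


def fill_template(template, slots):
    """Fill only the masked positions; every other token is left untouched.

    template: list of string tokens, some equal to MASK
    slots:    {token_index: slot_name} for each masked position
    """
    tokens = list(template)
    for index, name in slots.items():
        if tokens[index] != MASK:
            raise ValueError(f"position {index} is not a masked slot")
        tokens[index] = toy_fill(name)
    return "".join(tokens)


# The JSON skeleton is fixed up front; only the two values are generated.
template = ['{"city": ', MASK, ', "year": ', MASK, "}"]
slots = {1: "city", 3: "year"}

text = fill_template(template, slots)
parsed = json.loads(text)  # parses because the skeleton was never modified
print(parsed)
```

Because the braces, keys, and separators are never regenerated, the result always parses as JSON as long as each filled value is itself valid.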

Inception was founded by professors from Stanford, UCLA, and Cornell—pioneers in diffusion modeling and cornerstone AI technologies, including flash attention, decision transformers, and direct preference optimization.

The company’s engineering team includes veterans from DeepMind, Microsoft, Meta, OpenAI, and NVIDIA.