
The battleground in AI has quietly shifted in recent months from text and image generation to the far more demanding realm of video.
The race that began with OpenAI's launch of ChatGPT saw countless tech companies channel massive resources into ever-more-capable large language models. Yet in the specialized domain of video generation, a nimble startup has proven that focused expertise and deep technical mastery can outpace sheer scale and marketing buzz.
While heavyweights like Google and OpenAI continue to grab headlines, Runway, a video-first company, landed a decisive blow when it released its Gen-4.5 model.
It achieved 1,247 Elo points on the Artificial Analysis Text-to-Video leaderboard, surpassing heavily hyped offerings from larger players, including Google's Veo 3 and OpenAI's Sora 2.
On release, Gen-4.5 quickly became the world's top-rated video model, with Runway emphasizing breakthroughs in visual fidelity, motion realism, prompt adherence, and precise creative control that produce cinematic, highly realistic output.
Now Runway has rolled out image-to-video support for Gen-4.5, which takes things a step further.
Thank you all for the patience! We took our time to make sure it was done correctly. This has been the most requested feature. The best video model in the world. Now with image support. Enjoy! https://t.co/NuSh8K8WCC
— Cristóbal Valenzuela (@c_valenzuelab) January 21, 2026
Gen-4.5 builds directly on the foundation of earlier models like Gen-4, retaining their speed and efficiency while delivering substantial leaps through improved pre-training data efficiency and advanced post-training techniques.
It excels at complex physics: objects move with believable weight, momentum, and force, and the model handles realistic human motion, fluid dynamics, subtle textures like hair and fabric, and cause-and-effect relationships within scenes. It also supports detailed, sequenced instructions in a single prompt, letting creators orchestrate intricate camera choreography, scene composition, precise event timing, and atmospheric shifts.
Most relevant to this update, Gen-4.5 includes robust image-to-video capabilities alongside its strong text-to-video mode.
Users can animate a static reference image with a descriptive prompt, bringing portraits, environments, or compositions to life with consistent characters, coherent lighting, stable expressions, and dynamic storytelling.
This mode emphasizes world consistency, ensuring subjects, objects, and styles remain coherent across motion. Additional control features like keyframes and video-to-video are in the process of being rolled out to Gen-4.5, expanding its versatility for filmmakers, advertisers, and creators who need granular direction over every element.
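For developers, Runway also exposes its models through an API, so the image-to-video workflow can be scripted. The snippet below is a minimal sketch of such a request; the base URL, model identifier, and payload field names are assumptions for illustration and should be checked against Runway's current API documentation.

```python
import time
import requests

API_BASE = "https://api.dev.runwayml.com/v1"  # assumed base URL; verify against Runway's docs
API_KEY = "YOUR_RUNWAY_API_KEY"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}


def image_to_video(image_url: str, prompt: str, duration_s: int = 5) -> str:
    """Submit an image-to-video job and poll until a video URL is returned.

    Endpoint paths, model name, and payload fields here are illustrative
    assumptions, not confirmed API details.
    """
    payload = {
        "model": "gen4.5_turbo",      # hypothetical model identifier
        "promptImage": image_url,      # static reference image to animate
        "promptText": prompt,          # descriptive prompt driving the motion
        "duration": duration_s,        # clip length in seconds (typically 5-10)
        "ratio": "1280:720",           # 720p output, per the specs above
    }
    task = requests.post(f"{API_BASE}/image_to_video", json=payload, headers=HEADERS).json()

    # Poll the task until it succeeds or fails.
    while True:
        status = requests.get(f"{API_BASE}/tasks/{task['id']}", headers=HEADERS).json()
        if status["status"] == "SUCCEEDED":
            return status["output"][0]  # URL of the generated clip
        if status["status"] == "FAILED":
            raise RuntimeError(status.get("failure", "generation failed"))
        time.sleep(5)
```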
Videos generated with Gen-4.5 typically run at 720p resolution and 24 frames per second, with outputs often described as near-indistinguishable from real footage in many scenarios, especially for complex movements, full-body synchronization, and facial expressions.
Runway maintains accessible pricing across subscription tiers, making this frontier-level quality available to creators at various scales without the steep barriers some enterprise-focused rivals impose. Access rolls out gradually through Runway's web app, with text-to-video and image-to-video modes currently live and further inputs arriving soon.
Despite these strengths, Gen-4.5 isn't without limitations.
The new Gen 4.5 update feels like an incredible unlock, especially for narrative work and consistency across scenes. Whenever I test new models internally, I find that the best way to evaluate performance is to use them to build a story. Here’s one I’ve always wanted to see… pic.twitter.com/6KiLoNipHK
— Cristóbal Valenzuela (@c_valenzuelab) January 21, 2026
Audio generation remains absent for now, so videos are silent and sound must be added in post-production. Generation lengths are also constrained compared with competitors like Veo 3, Sora 2, and Grok Imagine, often around 5-10 seconds per clip, though some chaining is possible.
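The usual workaround for the clip-length cap is chaining: grab the final frame of one generation and feed it back as the reference image for the next. Here is a rough sketch of that step using OpenCV, assuming an image-to-video helper like the one sketched earlier; note that each segment is still a fresh generation, so small drifts in lighting or character detail can accumulate across seams.

```python
import cv2


def last_frame(video_path: str, out_path: str) -> str:
    """Save the final frame of a clip so it can seed the next generation."""
    cap = cv2.VideoCapture(video_path)
    frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_count - 1)  # jump to the last frame
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError(f"could not read last frame of {video_path}")
    cv2.imwrite(out_path, frame)
    return out_path
```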
And while prompt adherence is exceptional, intricate chain-reaction physics or extremely long sequences can still produce occasional artifacts, inconsistencies, or minor causal errors.
Credit-based pricing means heavy users burn through allowances quickly (around 25 credits per second), and full access to advanced controls or higher resolutions may depend on subscription level.
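To put that burn rate in perspective, a quick back-of-the-envelope calculation: the 25-credits-per-second figure comes from the pricing above, while the monthly allowance below is a placeholder to swap for your own plan's number.

```python
CREDITS_PER_SECOND = 25   # approximate Gen-4.5 rate cited above
MONTHLY_CREDITS = 2_250   # placeholder allowance; substitute your plan's actual figure

clip_seconds = 10
cost_per_clip = CREDITS_PER_SECOND * clip_seconds    # 250 credits for a 10-second clip
clips_per_month = MONTHLY_CREDITS // cost_per_clip   # about 9 clips on this hypothetical plan

print(f"A {clip_seconds}s clip costs {cost_per_clip} credits; "
      f"{MONTHLY_CREDITS} credits covers about {clips_per_month} clips.")
```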
Early rollout phases have meant not all features are immediately available to everyone, and the model's permissive creative freedom invites the same ethical debates around deepfakes and misuse that affect the broader field.
Regardless, in this escalating video AI race, Runway's Gen-4.5 demonstrates how specialization can deliver outsized impact, forcing even the biggest players to respond.
For creators seeking the current edge in realistic, controllable short-form video, especially from images, it's a compelling new benchmark.