Background

With 'Copilot Vision,' Microsoft's AI Can Now 'See' What Users Do on Windows


At first, large language models (LLMs) felt like a delightful novelty—chatbots that could pen essays, debug code, or flirt back in Shakespearean verse.

But beneath the charm was something far more disruptive. When OpenAI launched ChatGPT to the public in late 2022, the world didn’t just notice—it paused. That moment marked the beginning of a new era, triggering an AI arms race that sent tech giants and startups alike hurtling toward the future of cognitive automation.

Today, LLMs have evolved far beyond mere text generation.

They now possess the power to create and manipulate images, videos, audio, and more—all tailored to the user’s intent, and often with uncanny precision. They’re not just answering questions; they’re becoming full-fledged creative collaborators.

Microsoft, the tech titan that joined the LLM rivalry with Copilot, is already deeply embedding the technology across its ecosystem—from Windows and Office to Azure and GitHub—bringing the intelligence of LLMs directly into the tools millions use daily.

And this time, Microsoft is giving Copilot another useful superpower.

Microsoft has upgraded Copilot on Windows 10 and 11 so it doesn't just answer questions—it can now see what's on screen and guide users visually.

According to a blog post announcing the feature, users can trigger the AI using the glasses icon in the Copilot app. Users can then opt in to share one or two app windows or browser tabs, enabling the AI to provide real-time support and insights based directly on what's displayed.

Copilot Vision can view up to two apps simultaneously, allowing it to cross-reference information and deliver richer context. For example, users can have a travel itinerary open alongside their packing list—and Copilot can spot missing items or suggest improvements. This multitasking capability aims to reduce the need for tedious switching between windows.

The Highlights feature lets Copilot not only talk but also visually indicate UI elements.

When users ask "show me how," Copilot overlays guidance—pointing, highlighting, drawing attention—so they can follow along effortlessly.

It's especially helpful in apps like Photoshop, Word, or Settings, transforming vague instructions into clear, on-screen direction.

"This update brings Copilot even closer to being a true companion, with a deeper understanding of your goals and the ability to provide clear, step-by-step guidance to help you accomplish them," said Microsoft.

As for privacy, Microsoft said that Copilot Vision works only when users have explicitly allowed it to run—meaning there is no continuous monitoring. Users choose what to share, and logging is limited to Copilot's responses, not the screen content.

Microsoft further assures that shared information isn’t used to train models.

Initially available in the U.S. for Windows 10 and 11—and set to roll out soon across other non-European regions—Copilot Vision on Windows transforms Copilot into a virtual companion, offering an extra pair of eyes over the user’s digital workspace.

Whether users are juggling files, apps, or other on-screen content, Copilot can now analyze and assist whenever they choose to share what's visible.

This upgrade marks a significant leap from its earlier days.

Before, Copilot Vision lived only inside Microsoft Edge, capable of analyzing only web pages. Now, through the native Copilot app, everything users open on Windows—documents, creative software, games—is within its reach, as long as they opt in and select the windows to share.

Published: 
12/06/2025