
Sorry Apple and Google, Copilot Vision Proves Microsoft’s AI Game Is on a Whole Other Level

Copilot can now see what’s on your screen, verbally discuss it, and help you use related apps. Apple's and Google's desktop OSes don't have anything similar.

By Michael Muchmore, Contributor

(Credit: René Ramos, Microsoft; hiro-hideki/iStock via Getty Images)

Microsoft started adding generative Copilot AI features to Windows in September 2023, well before its OS competitors had anything similar. And although most of the company's recent, spiffy AI tools are exclusive to Copilot+ PCs, Copilot Vision works on every Windows 11 (and, shockingly, Windows 10) machine. This feature, which lives inside the Copilot app, lets the AI view whatever is on your screen and provide natural verbal assistance. The latest versions of ChromeOS and macOS can't match this capability, despite adding some AI features piecemeal.

Copilot Vision first appeared in the Microsoft Edge web browser, and you can read my mixed impressions of that iteration. While it's indeed helpful to converse with an AI about a website, Copilot Vision in Windows lets you do the same for any open app. Copilot Vision is also available within the Copilot app for Android and iOS, where it can chat about whatever you point your phone’s camera at, but I'm focusing on the Windows experience here. As of publication, the Copilot app for macOS lacks the Vision capabilities I describe here.  


Get Ready for Copilot Vision 

To use Copilot Vision, first make sure that Windows 11 is up to date. Go to Settings > Windows Update and click the Check for Updates button. To speed things along, you can toggle the “Get the latest updates as soon as they’re available” option. You also need the Copilot app, of course. If you don't have it, head to the Microsoft Store to install it.

(Credit: Microsoft/PCMag)

You can use Copilot without signing in to a Microsoft account for limited interactions, but you miss out on several features, including Copilot Vision. Signing in also enables AI image creation, Copilot Voice, interaction history, longer conversations, and settings syncing. 

Other requirements: Copilot Vision is available only in the US. A Microsoft blog post says it’s coming soon to more countries outside of Europe. Copilot is still not available in Europe because of Microsoft’s adherence to the region’s Digital Markets Act (DMA).
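
If you like to see prerequisites spelled out as a checklist, here is a minimal Python sketch of the requirements described in this article. It is purely illustrative: the `Setup` class, its fields, and the `missing_requirements` function are names I made up for this example, and nothing here queries Windows or any Microsoft API. You fill in the values yourself.

```python
from dataclasses import dataclass


@dataclass
class Setup:
    """Hypothetical description of a PC; these fields are assumptions, not a Microsoft API."""
    windows_version: int         # 10 or 11
    copilot_app_installed: bool  # the Copilot app from the Microsoft Store
    signed_in: bool              # signed in to a Microsoft account
    country: str                 # two-letter country code, e.g. "US"


def missing_requirements(setup: Setup) -> list[str]:
    """Return the prerequisites from this article that the setup does not meet."""
    missing = []
    if setup.windows_version not in (10, 11):
        missing.append("Windows 10 or 11")
    if not setup.copilot_app_installed:
        missing.append("Copilot app (install from the Microsoft Store)")
    if not setup.signed_in:
        missing.append("Microsoft account sign-in")
    if setup.country.upper() != "US":
        missing.append("US availability")
    return missing


if __name__ == "__main__":
    # Example values only: an up-to-date Windows 11 PC in the US that is not signed in.
    pc = Setup(windows_version=11, copilot_app_installed=True, signed_in=False, country="US")
    print(missing_requirements(pc))
```

An empty list means the setup meets every requirement listed here. The availability rule is the one most likely to change as Microsoft expands the feature to more countries.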


How to Use Copilot Vision 

Start by opening Copilot, either by pressing Windows Key-C or Alt-space bar (which opens the compact Copilot window). Alternatively, you can click the Copilot icon in your taskbar. If you have a Copilot+ PC, you can simply press the dedicated Copilot key on your keyboard. In the Copilot app’s window, you should see an icon that looks like a pair of eyeglasses to the left of the text-entry box at the bottom of the window.

(Credit: Microsoft/PCMag)

Once you click the eyeglasses, you see a list of all app windows currently running on your PC. Programs running with non-minimized windows take priority here, but if you have more than four open, you can scroll down to find them. 

(Credit: Microsoft/PCMag)

When you toggle one of the options for an app, a new element pops up at the bottom of Copilot’s window with highlighted eyeglasses and microphone icons. For tasks that involve more than one app, you have to press the initial eyeglasses icon again to add a second window for viewing—Copilot doesn’t let you enable two at the start. Your AI pal will then start talking with you, describing what’s in view on the screen. You can end the conversation at any time by clicking Stop or the X. 

(Credit: Microsoft/PCMag)

From then on, you can simply converse verbally with Copilot, asking what you need to know about the app in question. In testing, I asked Copilot how to get photos from File Explorer into Photoshop and then how to improve them in the photo app. Copilot has good knowledge of the process and the app (I found the same for Lightroom). Here’s a video showing my experience (pardon my weak webcam mic; Copilot’s voice is clear, loud, and better-spoken than mine). 

If you tell Copilot “Show me how,” you see a large pointer in the Copilot panel, which flies up and draws a box or circle around the relevant interface element. Microsoft calls these Highlights. In my experience, this feature didn’t always highlight the correct object, but here’s a case in which it got it right: 

After you stop your Copilot Vision session, you can see a transcript of the conversation in the Copilot app: 

(Credit: Microsoft/PCMag)

As with most AI tools, your results may vary, even with the same question or prompt. I got different responses when I asked for the same information multiple times. For example, it occasionally gave me instructions for Lightroom Classic rather than the newer version of Lightroom. Sometimes it paused, keeping silent for a few seconds, but this issue wasn’t severe enough to ruin the experience.

I like how generative AI tools (and Copilot in particular) let you tell them when they get something wrong. In such cases, they will recheck their information and correct themselves. So, when I responded that the instructions were for Lightroom Classic and I was using the newer Lightroom, Copilot apologized and gave the correct instructions. 


Neither ChromeOS Nor macOS Has an Answer

Windows' two big competitors don't offer anything that competes with Copilot Vision. Google gets somewhat closer with its Select to Search With Lens and Text Capture features, the latter of which lets you find information about or take limited actions on selected text in images. But those ChromeOS AI features don’t let you converse verbally with an AI about what you’re looking at on the screen to get interactive help. 

MacOS’s AI capabilities are limited to creating cartoon-like images, rewriting text, and summarizing emails and web pages; macOS Tahoe at least promises some improvements. Siri has become more conversational, but it can’t help you with what’s on your screen.  

As I’ve concluded after testing other Copilot features in Windows, Microsoft’s desktop OS comfortably leads competitors in AI features. And in the case of Copilot Vision, you don’t even need the latest hardware and software to take advantage of it. The same isn't true for the AI tools that ChromeOS and macOS do have. Of course, with the vast resources these companies are investing in AI, Microsoft's lead is far from safe. I eagerly await the competition ramping up.
