(Credit: Microsoft)
Microsoft's Copilot Vision AI can analyze, summarize, and answer questions about what’s on my screen. Thanks to Copilot’s vision and audio capabilities, I can then engage in a voice conversation with the AI about what I see. On a mobile device, I can use my camera to identify and explain landmarks, objects, and other items around me. On a Windows PC, Copilot Vision can view apps, documents, files, websites, and any other open window and analyze what it sees. Sounds cool, but how does this help with specific tasks? Here’s how I use Copilot Vision and what I've discovered.
You can use Copilot Vision on a PC running Windows 11 or Windows 10. It can also be used through Microsoft Edge, but this limits the feature to just your web pages. Instead, you can use Copilot Vision from the Copilot Windows app built into the operating system. To use the feature on a mobile device, you'll need the mobile app. For the best experience, you'll also want a Microsoft account, since the AI is limited without one. I have a Microsoft 365 Family subscription with Copilot, so the AI is automatically accessible.
Copilot Vision works via Copilot Voice, through which you’re able to carry on a back-and-forth conversation using natural language. You can choose among eight different voices, each with its own unique accent and style. You're also able to change the speed at which the voice speaks and check a transcript of the conversation afterwards.
Copilot Vision on My Phone
To start, I’ll go over how I use Copilot Vision in the Copilot app (iOS or Android) on my iPhone 16 Pro. My first step was to customize Copilot’s voice. For that, I opened the app, went to my profile screen, and selected Voice settings. I have a penchant for anything British, so I typically use the Wave voice, which speaks with a British male accent. I also keep the speed set to 1 since I don’t want the voice to speak too slowly or too quickly.
Back at the main screen, I can tap the eyeglasses icon to the right of the prompt to open my camera. I can then aim my phone at the object I want Copilot to analyze. At the same time, Copilot greets me, asking what’s new or how it can help me. If I’m in the mood, I may spend half a minute chit-chatting with the AI. Otherwise, I’ll get right to the point with my first request.
(Credit: PCMag / Microsoft)
Identify an Object
I can use the app to identify a nearby object. For example, I can point my camera at a top hat, ask the AI to identify it, and then ask it where I can buy it (after showing it the label). The AI tells me that this type of hat can be found on Amazon, eBay, and the retailer’s own website. After I ask it to provide specific options, Copilot displays links to the hat on Amazon and other sites and offers me buying tips, which I could see in the transcript after the conversation ended.
(Credit: PCMag / Microsoft)
Identify a Landmark
My wife and I traveled to Paris recently, and I needed help with some of the landmarks and buildings. In one situation, I used Copilot Vision for assistance with the famous Arc de Triomphe. Even though I was familiar with this iconic structure, I asked Copilot to identify it and provide me with its origin and history. I then followed up by asking if it was open on the day we were there and if a fee was required to go inside. Finally, the AI provided me with a website and phone number to get further details, which I could view in the transcript.
(Credit: PCMag / Microsoft)
Translate Another Language
While traveling abroad, my wife and I like to eat in authentic local restaurants. Sometimes, that means the menu is in another language, but that’s not a problem with Copilot Vision. As I can with Google Translate, I pointed the AI at the menu and asked it to translate the selections into English. After it gave me a general overview of the dishes, I asked it to translate each specific item. Even cooler, Copilot’s French pronunciations were spot on.
(Credit: PCMag / Microsoft)
Explain and Summarize Text
I also turned to Copilot for help with the technical manual for an uninterruptible power supply that I recently purchased. For this, I opened the manual to a specific page, pointed my phone at the text, and asked Copilot to summarize it. The AI provided a very brief and general description of what was on the page. Crucially, I was able to ask follow-up questions to zero in on specific sections of the page.
(Credit: PCMag / Microsoft)
Copilot Vision on My Windows PC
To use Copilot Vision on a Windows machine, make sure that the operating system and Copilot app are up to date. You'll also need to use the desktop app from the taskbar or Start menu, since the Copilot website is more for general conversations. As with the mobile app, my first move is to customize Copilot's settings.
From my account screen, I can go to the Voice mode section and select the voice I want. Again, I chose the British Wave voice. I also turned on the “Listen to ‘Hey, Copilot’ to start a conversation” setting so I can kick off conversations with my voice. Further down, I enabled Highlights and Quick View under Copilot Vision. The Highlights option lets me point to specific things on the screen with my mouse, while Quick View lets me trigger Copilot by pressing the Copilot key or Win-C on my keyboard.
(Credit: PCMag / Microsoft)
Summarize an Article
I can ask Copilot Vision to summarize an article on a web page. For this example, I chose a Wikipedia article on time travel. I opened the article in Chrome, though any browser should work. I then switched to the Copilot app and clicked the eyeglasses icon to the right of the prompt. A pop-up menu showed a list of all my open apps and windows; I picked the one for Chrome, which had the article.
(Credit: PCMag / Microsoft)
I asked Copilot to summarize the article, which it did. I then posed a few questions based on information in the article, such as “What are some of the earliest works about backwards time travel?” and “Can you explain how a wormhole might work to transport someone through time?” In each case, Copilot provided helpful answers, all of which I can view in the summary after the conversation is over.
(Credit: PCMag / Microsoft)
Edit Images
AI has done wonders for photo editing software. I have an image that has a bright spotlight in it, which I wanted to remove. To do it, I opened the photo in Adobe Photoshop Elements. After triggering Copilot Vision, I chose the Photoshop Elements window. Copilot can't make the edit for me, but it did help when I asked how to remove the light. In response, it explained where and how to use the Healing Brush tool in the program to eliminate the light. It also offered to guide me through the process step-by-step.
(Credit: PCMag / Microsoft)
Proofread a Document
I wanted to see if Copilot could proofread an article I had written in Microsoft Word. After activating Copilot Vision, I chose the document and asked the AI if it could read the article and alert me to any grammatical or spelling errors. In response, it caught all of the spelling errors and most, but not all, of the grammatical errors. Though it’s not an expert proofreader, it did give me a good head start on correcting and improving the document.
(Credit: PCMag / Microsoft)
View 2 Apps or Windows at the Same Time
You can also share two apps or files at a time, and Copilot Vision will cross-check them to answer your questions. In this instance, I shared my calendar and the New York Yankees team schedule to find dates when I’d be free to attend a game. I asked Copilot to find upcoming games between the Yankees and the Orioles and compare them with my calendar. In response, the AI suggested a couple of dates and even offered to find and book the necessary tickets.
(Credit: PCMag / Microsoft)


