(Credit: Jeffrey Hazelwood/PCMag Composite; Microsoft/Lenovo)
In my hands-on testing of the Windows Recall feature—which lets you search and page through your PC activities in a timeline, and take actions on the content with AI—I mentioned that much of its power comes from its ability to work in conjunction with Click to Do, another Copilot+ PC exclusive. Microsoft labels both as “Preview” features, but they’re available in the current Windows 11 release version, not just in beta builds.
In short, Click to Do can highlight text and images on your screen and offer quick actions based on them. The idea is to help you complete mundane tasks without having to think about what to do next or open multiple apps. And if you are still wary of Recall due to security concerns, the good news is that Click to Do works independently of it. Here’s how to get the most out of Click to Do.
Setting Up Microsoft Click to Do
As mentioned, Click to Do is exclusive to Copilot+ PCs, so you need one of the latest Windows computers with a built-in neural processing unit (NPU) to try it. The first step is to open the Settings app and enable Click to Do. You can find it in the Privacy & Security section or search for it directly.
Like Recall, Click to Do takes screenshots for analysis. But, unlike Recall, this happens only when you invoke the feature. Click to Do performs its analysis locally on your PC, without sending anything to Microsoft’s servers. Of course, it’s possible that one of the actions you choose could send information over the internet (for example, a web search based on your selection).
How to Initiate Click to Do
There are several ways to initiate Click to Do:
- Click a button within the Snipping Tool app
- Hold down the Windows key and click with the mouse
- Press Windows key-Q
- Right swipe on a touch screen
When you activate Click to Do via any of the above methods, the cursor turns into a blue circle, teardrop, or vertical line, depending on whether it's over the background, an image, or text. As you move it around the screen, items under it get an outline to show that you can interact with them. One thing I love about Click to Do is how it looks: It sort of liquefies screen elements and makes them wobbly to draw your attention. As with nearly all actions in Windows, you can back out of the special cursor with a tap of the Esc key.
I found differences between the options for starting Click to Do: When I used the Windows key and mouse method, the Click to Do cursor didn’t always appear, especially when the pointer was over Microsoft content like the Widgets panel. Windows key-Q always worked in my testing. When you choose this method, you get a search box at the top of the screen in addition to the modified cursor.
The touch screen method requires a little more clarification. You need to swipe in from the right edge of the screen. When I first read the instruction, I thought it meant swiping to the right on the screen. Once I got the gesture right, I saw the same search box you get with Windows key-Q. With the Snipping Tool, you see a new choice for Click to Do in its top toolbar.
The search bar that appears with some of the above methods lets you search for any text on the screen, whether it’s actual text or within an image. It doesn’t work for anything that’s not visible, such as text and images further down on a web page. That’s important to keep in mind since you can’t scroll content once you activate Click to Do.
The feature also can’t identify the content of images, as Copilot+’s Semantic Search tool can. For example, when I entered “mountain” and “bird” in the search box, it didn’t find the photo of the mountain that was on the screen. (For that, you can use Copilot Vision, which now works on any Windows 11 PC and in any app window, not just the Edge browser.)
What Can You Do With Click to Do?
After you see the wobbly, colorized Click to Do interface, you can simply click on an image or select text (whether it’s actual text or text in an image). You can also right-click on anything for a context menu of options. What you can do with Click to Do depends on what you click on. For example, if you click on some text in an image, you get choices for creating a bulleted list, copying the text, opening it in an app like Notepad or Word (or any app that accepts text), using the text for a web search, rewriting it with AI in a choice of styles, or summarizing it with AI.
I was able to open text in Word, summarize it (though this was slower than online AI tools), and create a bulleted list from the text of Ambrose Bierce’s Write It Right, which I found on the Project Gutenberg website.
The fact that Click to Do works only with what’s visible on the screen can be a shortcoming with text selection. It means you can’t use the Summarize option on a long document that extends past the visible view. I also found that I could sometimes select only a few paragraphs. If you need to summarize an entire web page, you’re better off with the Copilot sidebar in Edge.
If you use Click to Do on an image, you get choices such as Blur Background, Copy, Erase Objects, Open With, Save, Share (using any apps that support Windows’ share sheet), Remove Background, or Visual Search With Bing.
You can open an image with any app on your system, but the ones Click to Do shows automatically make sense. All the first-party options worked without a hitch. In testing, for example, I opened an image from a web page in Microsoft Paint to remove the background. Only the WhatsApp connection didn't work when I tried to send an image via it. Although WhatsApp accepts pasted images, something in the code seemed to be blocking communication with Click to Do. That’s too bad, since it would save me the trouble of taking a screenshot and opening WhatsApp (or another messaging app) separately.
Click to Do vs. AI Tools From Apple and Google
Apple doesn't offer an AI tool for macOS that gives you info about items on your screen and suggests actions based on them. Apple’s Visual Intelligence comes close, but it’s available only on iOS for now and requires you to take a screenshot first. It does let you take action on the contents of the screenshot, however.
Google’s Text Capture feature in ChromeOS comes close to Click to Do. With Google’s feature, you long-press the Launcher button to see highlighted text items (whether actual text or text in images) on the screen, but Windows’ Click to Do works with more than just text and offers additional actions.
Text Capture, however, does a better job in one specific use case: It can bring numbers from an image into a spreadsheet in a way that resembles their layout in the image. When I tried the same with Click to Do, all the numbers appeared across the top row. A bigger advantage of Click to Do over Text Capture is that you can choose any app on your system to open the content in question; ChromeOS restricts you to a limited set.
You could point to Click to Do's requirement for a Copilot+ PC as a downside. But Google’s Text Capture feature is available only on Chromebook Plus models, and Apple’s Visual Intelligence works only on the latest two generations of iPhones.
What About Copilot Vision?
Click to Do has some similarities with Copilot Vision. The latter works on all Windows 11 PCs, but it requires a trip to the cloud and doesn’t offer a menu of actions you can take. Still, you can chat with it about whatever is on your screen to get information. Unlike Click to Do, it can actually describe the contents of images. For example, it identified black-crowned night heron and dark-eyed junco birds in my photos. I plan to dive deeper into this feature in an upcoming article.
Better Than the Rest
The Click to Do feature isn’t perfect, as I’ve noted in places above, but it's the latest example of how Microsoft is taking the lead with AI helper tools in desktop OSes. I don’t expect Apple and Google to sit still, however, so keep an eye out for Visual Intelligence and Text Capture to evolve.


