Am I Talking to a Bot or a Human? OpenAI Eyes Wider Use of Voice Tech

(Photo by Dilara Irem Sancar/Anadolu via Getty Images)

Don't feel like calling in a dinner reservation? OpenAI is experimenting with ways to have AI do it for you. The ChatGPT developer is ready to bring its AI voice technology to third-party apps, which promises to unleash smarter digital assistants that can not only talk and listen to you in real time but also interact with the real world.

OpenAI's Realtime API aims to help third-party developers build speech-to-speech capabilities for their own apps. The API is based on OpenAI’s Advanced Voice mode, which is designed to hold human-like, natural conversations that outpace Apple's Siri or Amazon’s Alexa.

OpenAI now sees an opportunity to expand its voice capabilities to various third-party apps and services looking to move beyond traditional text-based questions and answers. For example, imagine talking to a customer service rep who sounds human but is actually an AI program. Or receiving language lessons from an AI "tutor" in an educational app.

With the API, OpenAI expects its technology to usher in an era of "agentic AI," where AI agents that can speak and see are commonplace, says Chief Product Officer Kevin Weil.

"2025 is gonna be the year that agentic systems finally hit the mainstream,” he said in a press briefing this week. “If we do it right, it takes us to a world where we actually get to spend more time on the human things that matter and less time staring at our phones.”

OpenAI demoed how the Realtime API—which uses its GPT-4o model—could change the way we use apps. For example, you could ask a mapping app to look up interesting restaurants and stores in a local city.

During the demo, the assistant was able to verbally respond to each question in about three to five seconds, a relatively low latency. You could also interrupt and redirect the assistant. But the standout feature was OpenAI calling in a food order on the user’s behalf.

“Great, thank you. I’d like to place an order for 400 chocolate strawberries, please,” the digital assistant told a shop. “Can you let me know when the delivery will arrive at the Cowell Theater at Fort Mason?” it added. “I’m super excited.”

Although the demo was impressive, it also wasn’t hard to imagine the same technology being abused. Might the Realtime API unleash a new era for robocalls or spam? (Google also tried this with Duplex a few years ago, with mixed results.)

“That’s one of the use cases we want to avoid,” said Olivier Godement, Head of Product, API at OpenAI, during the press briefing. The company has built safeguards into the API to flag and prevent malicious use cases, such as the AI pretending to be human. In addition, OpenAI vows to crack down on any third-party developers found violating the API's terms of service.

“The AI assistant is never taking actions,” Godement added. “The developer has to execute the action. The way it works is simply the [AI] model suggests a next step, and then it's up to the developer to verify.” In other words, the digital assistant can’t easily go rogue and violate the limits of the third-party app, just because the user asks.
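
For developers wondering what that split looks like in practice, here is a minimal sketch of the "model suggests, developer verifies" pattern Godement describes. It does not use OpenAI's actual API or schema: the `SuggestedAction` shape, the `place_order` helper, the allowlist, and the quantity limit are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SuggestedAction:
    """A next step proposed by the model (hypothetical shape, not OpenAI's schema)."""
    name: str
    arguments: dict


def place_order(item: str, quantity: int) -> str:
    # Stand-in for the app's real ordering logic.
    return f"ordered {quantity} x {item}"


# Only actions the developer has explicitly registered can ever run.
ALLOWED_ACTIONS: dict[str, Callable[..., str]] = {"place_order": place_order}
MAX_QUANTITY = 500  # illustrative app-level limit, chosen arbitrarily


def verify_and_execute(action: SuggestedAction) -> str:
    """Check the model's suggestion against app rules before acting on it."""
    handler = ALLOWED_ACTIONS.get(action.name)
    if handler is None:
        return f"rejected: unknown action '{action.name}'"
    if action.name == "place_order":
        item = action.arguments.get("item")
        quantity = action.arguments.get("quantity")
        if not isinstance(item, str) or not item:
            return "rejected: missing item"
        if not isinstance(quantity, int) or not 0 < quantity <= MAX_QUANTITY:
            return "rejected: quantity outside app limits"
        return handler(item=item, quantity=quantity)
    return f"rejected: no validation rule for '{action.name}'"


if __name__ == "__main__":
    suggestion = SuggestedAction(
        "place_order", {"item": "chocolate strawberries", "quantity": 400}
    )
    print(verify_and_execute(suggestion))
```

In this sketch the model's output is treated as data, not as a command: anything that is not on the allowlist, or that breaks an app-level rule, is turned down before any real-world action happens.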

Another safeguard is that the speech-to-speech capability can only talk in six different voices, which should prevent it from deepfaking anyone else’s persona. In a statement, OpenAI added: "Our policies also require developers to make it clear to their users that they are interacting with AI, unless it's obvious from the context."

This means the voice capability will stop short of identifying itself as an AI or bot during all interactions. So far, the company’s tests of the Advanced Voice mode haven’t shown the need for the safeguard, given that it could drag down the experience. But Godement added: “Whenever we see abuse, we will adjust our policies.”

How the Realtime API gets implemented will ultimately be up to third-party apps. The API has been designed to streamline building speech-to-speech experiences for apps at a low cost. OpenAI adds that it’s already serving 3 million third-party developers, including both small startups and large enterprises.

The Realtime API arrives today as a public beta, and it's expected to roll out to all developers in the coming days.

About Our Expert

Michael Kan

Principal Reporter
