
Speech Tech Sounds Better All the Time

Michael J. Miller, Former Editor in Chief


When you think of speech technology, odds are you think of HAL from 2001: A Space Odyssey or, more generously, R2-D2 and C-3PO from the Star Wars movies. Of course, that level of speech recognition is still a long way off.

But as I walked around the SpeechTek 2010 conference last week, I was reminded once again that while speech recognition isn't at that level yet, it's certainly good enough for a lot of applications today, and it's being readied for even more uses in the near future.

We often think of speech recognition in terms of applications like Dragon NaturallySpeaking, but the biggest use for speech recognition today seems to be in the call center, with interactive voice response (IVR) systems handling and routing simple queries, and analytics software helping companies search their records for patterns. If you've ever heard "this call may be recorded for quality purposes," odds are some software with speech recognition is involved. And most times I call my bank or credit card company, I go through an IVR system before I reach the right person. Indeed, the fact that the conference was held in conjunction with CRM Evolution, a conference aimed at call centers, indicates that such applications still dominate the market.

All sorts of firms produce products that serve different parts of the market, with Nuance (which makes Dragon), Microsoft (through its Tellme brand), and Loquendo probably the best-known providers of the engines themselves.

Both Nuance and Microsoft have widened their offerings in recent years through acquisitions.

Microsoft's Grant Shirk, director of industry solutions, says the company is focusing on a "cloud-based platform" for speech, with its products used by companies such as Fidelity, UPS, and Avis.

But in the long run, Microsoft views speech as part of a "natural UI" that will combine speech, touch, and gesture recognition. In a keynote at the conference, Microsoft Speech General Manager Zig Serafin talked about the transition to the "natural UI era."

For instance, Microsoft has integrated speech and gesture recognition into Kinect for the Xbox 360, and combined speech and touch features in Windows Phone 7. Shirk gave me a great demonstration of using speech on a Windows Phone 7 device. You just press and hold the center button on the bottom of the phone and say something like "Start Outlook." You can go into Bing and say "Find Italian restaurants near me." Or just say the name of an airline and a flight number to get the flight's status.

The Tellme technology is part of the speech recognition already embedded in Windows 7, but in many ways this technology is moving beyond simple dictation, Shirk said. For instance, he noted that a new Voice Mail Preview feature can transcribe voice messages and put the text into your e-mail inbox, without the person leaving the message even knowing it.

Improvements in the technology come in part from gathering more speech samples, and from increasing usage of existing products so that developers can collect more data. The company is also interested in building in semantic intent and context, so that the software does a better job of understanding what you mean, drawing on tools like the Powerset engine Microsoft recently acquired. The goal is to "stop transcribing and start understanding."

Nuance may be best known for Dragon NaturallySpeaking, still the best-selling dictation program and one that keeps improving. But it also makes a wide variety of products for other speech applications, often used in the telecommunications and financial services industries, including an on-demand offering it gained through its acquisition of BeVocal. The company says it has more than 4,000 deployments of customer care applications.

Laura Marino, senior director of product management, said the company is particularly looking at improving the grammar of a conversation, making software that is a "smart listener." She noted that such "adaptive grammar" makes it easier for the software to understand what someone means in a conversation.

Another area of research, she said, is "dialog strategies": making applications more conversational by asking questions and responding to the answers. The company also talked about "natural language" and is working to make the next generation of such applications closer to talking with a live agent. She noted that many people prefer going to an ATM rather than to a teller, so making such "self-service" applications work even better is a key focus.

Of course, the company is also looking to build on its wide variety of applications. Dena Skrbina, senior director of solutions marketing, told me that consumer speech applications were driving improvements in IVR speech, and vice versa. She said usage is broadening beyond customer care applications to full problem-solving solutions, including outbound messaging systems that notify customers of changes or alerts via SMS, e-mail, or voice calls.

For instance, the company offers phone companies a system that combines speech with visual displays, so customers can easily navigate their bills. One such system, Skrbina said, has sent more than 12 million alerts. The company counts MetroPCS and T-Mobile among its customers.

Personally, I've noticed a great deal of improvement in speech over the past few years, from dictation programs to mobile search to IVR solutions. And while no IVR system is as good as talking to a real live person, there are plenty of times when I'd rather get a quick answer over an IVR system than wait on hold for a live agent. Voice recognition has come a long way, but of course, it's still nowhere near as good as it looks in the movies.

Originally posted to Michael Miller's blog, Forward Thinking.

About Our Expert

Michael J. Miller

Former Editor in Chief

Michael J. Miller is chief information officer at Ziff Brothers Investments, a private investment firm. From 1991 to 2005, Miller was editor-in-chief of PC Magazine, responsible for the editorial direction, quality, and presentation of the world's largest computer publication. No investment advice is offered in this column. All duties are disclaimed. Miller works separately for a private investment firm which may at any time invest in companies whose products are discussed, and no disclosure of securities transactions will be made.

Until late 2006, Miller was the Chief Content Officer for Ziff Davis Media, responsible for overseeing the editorial positions of Ziff Davis's magazines, websites, and events. As Editorial Director for Ziff Davis Publishing since 1997, Miller took an active role in helping to identify new editorial needs in the marketplace and in shaping the editorial positioning of every Ziff Davis title. Under Miller's supervision, PC Magazine grew to have the largest readership of any technology publication in the world. PC Magazine evolved from its successful PCMagNet service on CompuServe to become one of the earliest and most successful web sites.

As an accomplished journalist, well versed in product testing and evaluating and writing about software issues, and as an experienced public speaker, Miller has become a leading commentator on the computer industry. He has participated as a speaker and panelist in industry conferences, has appeared on numerous business television and radio programs discussing technology issues, and is frequently quoted in major newspapers. His areas of special expertise include the Internet and its applications, desktop productivity tools, and the use of PCs in business applications. Prior to joining PC Magazine, Miller was editor-in-chief of InfoWorld, which he joined as executive editor in 1985. At InfoWorld, he was responsible for development of the magazine's comparative reviews and oversaw the establishment of the InfoWorld Test Center. Previously, he was the west coast bureau chief for Popular Computing, and senior editor for Building Design & Construction. Miller earned a BS in computer science from Rensselaer Polytechnic Institute in Troy, New York and an MS in journalism from the Medill School of Journalism at Northwestern University in Evanston, Illinois. He has received several awards for his writing and editing, including being named to Medill's Alumni Hall of Achievement.
