Voice Recognition Is Finally Starting to Make Sense


Voice recognition software has been around for a long time but, according to Computing magazine [1], the number of potential users far outnumbers the actual users.






One reason is that voice recognition has developed a bad reputation. There was too much hype, and too little delivery on that hype. The technology simply didn't work as advertised. And yet, there still appears to be a huge potential market out there for it.

Despite some high-profile failures and the relatively primitive uses to which the technology is being put to date, such as answering questions while checking flight status with an airline, there are some impressive efforts under way to perfect the technology and bring it front and center in your life and your business.

As noted on News.com [2], Toyota just signed a deal with a company called VoiceBox to develop a completely new type of voice technology that can comprehend conversational speech rather than memorized commands. More and more these days, the family automobile is becoming the staging area for entertainment technologies that require a lot of button pushing.

For example, drivers use cell phones, satellite radio, personal digital assistants, DVD players, iPods, and GPS-based navigational devices. The keypads, buttons, and click wheels all compete for the driver's attention. Voice technology could free the driver to keep his hands on the wheel. That's why VoiceBox is building a voice search capability for XM Satellite Radio, which has more than six million customers.

Starting later this year, VoiceBox technology will allow those subscribers to ask for traffic conditions, sports reports, stock quotes, and of course, regular programming.

VoiceBox is also working with Johnson Controls, one of the largest suppliers for auto-makers. They hope to bring the iPod under voice control this year.

VoiceBox was founded because traditional systems, such as the airline phone systems, contain a limited dictionary of commands with which you can say flight numbers and words like "Arrive" and "Depart." But you cannot ask those systems questions as simple as, "Could you find me a better seat?" In addition, the existing systems often don't even recognize the commands they are supposed to incorporate when certain users say them. They have trouble with different accents and with background noise.
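The limitation described above can be illustrated with a minimal sketch. It assumes a hypothetical two-word vocabulary plus flight numbers; the names `COMMANDS`, `FLIGHT_NUMBER`, and `parse_utterance` are invented for illustration and do not describe any airline's actual system. It works on text that has already been transcribed, so it says nothing about accents or background noise.

```python
import re

# Hypothetical fixed vocabulary (illustrative only, not a real airline system).
COMMANDS = {"arrive", "depart"}
FLIGHT_NUMBER = re.compile(r"^\d{1,4}$")


def parse_utterance(text):
    """Return (kind, value) for a single known word or flight number, else None."""
    words = text.lower().strip("?.! ").split()
    if len(words) != 1:
        # Anything longer than one token is outside the fixed dictionary.
        return None
    word = words[0]
    if word in COMMANDS:
        return ("command", word)
    if FLIGHT_NUMBER.match(word):
        return ("flight", word)
    return None


print(parse_utterance("Arrive"))
print(parse_utterance("1234"))
print(parse_utterance("Could you find me a better seat?"))
```

A conversational question falls straight through to `None` here, which is the gap a natural language system is meant to close.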

Bob Kennewick, who founded VoiceBox in 2001 with his brother Mike, is a computer science and economics professor at Harvard. His aim is to create a system, called a natural language system, which recognizes conversational speech and learns from the user.

The ideal consumer system would work like this: The driver could say, "Let me hear Cisco." In a car full of electronic gadgets, that could mean you want a quote on Cisco stock, you want to listen to the singer named Sisqo, or you want to hear Johnny Cash's song, "Cisco Clifton's Fillin' Station."

The VoiceBox system will ask you which choice you want and remember what you selected. So if you initially asked for a quote for Microsoft stock and then asked for Cisco, it would give you a quote for Cisco stock.
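The ask-then-remember behavior described above can be sketched as follows. This is only an illustration under stated assumptions, not VoiceBox's actual design: the `CATALOG` contents, the `Assistant` class, and the idea of tracking a single "last domain" are all hypothetical.

```python
# Hypothetical catalog: each spoken name maps to its possible meanings by domain.
CATALOG = {
    "cisco": {
        "stocks": "quote for Cisco stock",
        "music": "music matching 'Cisco'",
    },
    "microsoft": {
        "stocks": "quote for Microsoft stock",
    },
}


class Assistant:
    """Resolves ambiguous requests using the most recently used domain."""

    def __init__(self):
        self.last_domain = None  # remembered context, e.g. "stocks"

    def handle(self, name, choice=None):
        candidates = CATALOG.get(name.lower(), {})
        if not candidates:
            return "Sorry, I don't know that."
        if len(candidates) == 1:
            domain = next(iter(candidates))   # only one meaning, no ambiguity
        elif choice in candidates:
            domain = choice                   # the user answered the question
        elif self.last_domain in candidates:
            domain = self.last_domain         # reuse remembered context
        else:
            options = ", ".join(sorted(candidates))
            return "Which did you mean: " + options + "?"
        self.last_domain = domain
        return candidates[domain]


assistant = Assistant()
print(assistant.handle("Cisco"))      # no context yet, so the system asks
print(assistant.handle("Microsoft"))  # unambiguous, sets the context to stocks
print(assistant.handle("Cisco"))      # context now resolves the ambiguity
```

A real system would need a richer notion of context than one remembered domain, but the sketch shows why the second request for "Cisco" can be answered without asking again.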

Only a few companies are trying to solve the natural language problem at present, but the ones that get it right will find a large and eager market. For example, General Motors' OnStar system now has four million subscribers. Any time the driver needs help, all she needs to do is push a button, and help is on the way. But OnStar depends on a huge team of round-the-clock operators to answer calls.

GM already supplements the human operators with limited voice recognition. For example, if you say, "Call home," the cellular feature of the system will place a call to your home. The cost of full-time operators could be vastly reduced if OnStar had access to a natural language system.

Recognizing this and other needs, IBM, Microsoft, and others are putting big money into voice technology. Microsoft's system is already installed in Fiat cars to control mobile phones and music players. But that is still just a simple command system. IBM, which has been working on the problem for 30 years, has new software to run on the VoiceBox hardware. And Johnson Controls teamed up with VoiceBox to launch a product that uses voice commands to integrate and control Bluetooth-enabled phones, iPods, and PDAs.

In addition, IBM's latest version of its ViaVoice system will be introduced in Motorola's cable TV set-top boxes this year. According to Warren's Consumer Electronics Daily [3], this system, which will also be used in Motorola cell phones and XM Satellite Radio, is an incremental step toward natural voice recognition. With Motorola, IBM is also trying to develop a voice system that will allow computer users to manipulate content on the Web using voice commands alone.

However, voice technologies aren't just for entertainment. Seniors represent another big potential market for voice, because they may have poor vision and reduced dexterity due to arthritis. Companies like Sensory, Inc. have embedded their speech technologies in products that elders can control by voice, such as electronics, clocks, lighting, and other remote controls.

Sensory makes the RSC-4128 chip, which is the world leader in voice control hardware. According to a Business Wire [4] report, Sensory's sales were up 70 percent in 2005 as a result of that device.

And in other, more serious applications, ScanSoft's voice recognition system, known as Dragon NaturallySpeaking, is being used to cut diagnostic reporting times in half in cancer treatment trials at King's College in London. The system converts dictated speech to text at 160 words per minute.

Given this long-term trend, we offer the following six forecasts:

First, there will be some serious disappointments in the short term. Despite advances, voice recognition is still a long way from true natural language. An executive of ScanSoft recently admitted that the creation of true natural language is still 50 years off. To understand what this means, consider this illustration: National Public Radio [5] recently demonstrated the problems with Amtrak's reservation system on the air. During the reporter's attempt to schedule a trip, the system failed to recognize the word "schedule," one of its known commands, and took San Antonio to mean Hinton, West Virginia.

Second, despite those disappointments, workable speech software will remain the holy grail of computer interface technology, and a lot of companies will keep working to make it a reality. One reason is the money it could save. For example, every time a customer opts out of an automated call system and talks to a human instead, it costs the company from $5 to $15, according to Forrester Research. A workable voice system could save billions. So, in the next five to 10 years, expect to see a proliferation of usable, but not perfect, voice systems [6]. You'll see them first in automobiles, and then in everything from customer service to home entertainment. Expect to see usable voice commands for hand-held devices like BlackBerries within the next five years to alleviate the problem of tapping those tiny keys.

Third, as new versions are brought on line and competition intensifies, you'll see these systems grow incrementally more sophisticated and work more seamlessly. The market for a speech interface for a PC, for example, is estimated to be 50 million users. Philips unveiled its SpeechMagic upgrade for professionals late last year. It's essentially a dictation system for legal, medical, and other professional users, and it works in 23 languages to make document creation more efficient. But Bill Gates told CNET recently that error rates still hadn't reached the "magic threshold" that will make such systems better than a keyboard [7]. Nevertheless, these systems will become more and more common as the early adopters work with them, and companies push them to converge on an acceptable level of accuracy.

Fourth, the ecosystem of companies in the voice industry is now extremely complex, but as the best technologies prove themselves and the winners emerge, it will gradually shake itself out. Don't be surprised if Microsoft, IBM, and Intel are at the forefront, but we'll be watching newer companies with great ideas, such as VoiceBox, that have a chance to win big, too. In addition, we'll be keeping an eye on some companies abroad, such as Sakhr, which makes software that translates Arabic documents or presentations into English in minutes rather than the days it takes a human translator today.

Fifth, the mystifying diversity of systems and proprietary standards will gradually converge as organizations insist on uniformity, ease of use, and compatibility with existing networks. The convergence will likely be toward an Internet-protocol-based standard, as has happened with so many other applications, such as IP telephony. The VoiceXML Forum, an industry organization that has approved a dozen protocols so far, is already scrutinizing voice platforms. But the VoiceXML format will be challenged by a competing format backed by Microsoft, known as SALT. These standards will have to converge in order for customers to do such mundane tasks as getting the same bank balance whether they ask a Web site or their Palm Pilot.

Sixth, in the longer time frame of 20 to 50 years, true natural language systems will evolve in lockstep with technologies built for exponential amounts of data processing, whether through nanotech solutions or quantum computing. By that time, a new generation of people will come along who, far from finding talking to a machine awkward, will find it the most natural thing in the world. The familiar keyboard and mouse combination will be a rarity, as will the touchpads on many devices. Remote controls will disappear for music and entertainment devices, since they'll be capable of understanding conversational English. And numerous household functions will be carried out by voice command through a hand-held device from anywhere, such as adjusting the temperature of a home, starting a car, or setting a security system.

Reference List:
1. Computing, June 2, 2005, "Networking - It's Good to Talk to the Machine." © Copyright 2005 by VNU Business Publications, Inc. All rights reserved.
2. For information about new voice technology, visit the News.com website at: news.com.com/Talk+to+the+car+with+new+tech/2100-11389_3-6029403.html
3. Warren's Consumer Electronics Daily, January 25, 2006, "IBM Seeks to Expand Use of Speech Recognition Software." © Copyright 2006 Warren Publishing, Inc. All rights reserved.
4. Business Wire, December 15, 2005, "Speech Recognition Consumer Products Hit It Big in 2005." © Copyright 2005 by Business Wire. All rights reserved.
5. To listen to National Public Radio's broadcast about speech recognition, visit their website at: www.npr.org/templates/story/story.php?storyId=4933584
6. Customer Inter@ction Solutions, March 2005, "Saving Speech Recognition," by Michael Chavez. © Copyright 2005 by Technology Marketing Corporation. All rights reserved.
7. To access Bill Gates' interview with CNET, visit their website at: news.com.com/2102-1016_3-5868792.html?tag=st.util.print