[tt] NYT: Speak Up, a Computer Is Listening

Premise Checker <checker at panix.com> on Thu Aug 7 20:20:10 UTC 2008

Speak Up, a Computer Is Listening
New York Times, 8.8.7
http://www.nytimes.com/2008/08/07/technology/personaltech/07pogue.html

By DAVID POGUE

Of all the high-tech fantasies that sci-fi movies tantalize their
escapist audiences with, surely that bit about giving your computer
spoken orders is one of the most alluring. Ever since "Star Trek,"
we've dreamed of being able to say, "Computer, display all known
sources of dilithium crystals in the Kraxon Nebula!"

So far, the closest we can get is strapping on a headset and
dictating, using a program like Dragon NaturallySpeaking to do the
typing. This software is great for anyone who can't type or doesn't
like to. And it lets you speak the names of menu commands and
"click" links on a Web page.

But that's not the same as telling the computer what to do in
conversational English.

NaturallySpeaking 10, available Thursday, takes some baby steps in
the right direction. It doesn't turn your computer into the "Star
Trek" mainframe; it doesn't know what you mean by, for example,
"Make this document shorter and funnier." But in its timid,
conservative way, it takes voice control unmistakably closer to that
holy grail of computing.

NatSpeak's principal mission, though, is to type out, into any
Windows program, whatever you say. And in version 10, its maker,
Nuance, claims to have eked out yet another 20 percent accuracy
improvement.

I installed the program, donned the included headset and clicked
"Skip initial training." (In the early days of speech recognition,
you had to read a 45-minute sample script to train the program to
recognize your voice. Today, the software is so good, you can skip
the training altogether.)

As a quick test, I read aloud the first 1,000 words of
"Freakonomics" into Microsoft Word. Impressively enough, NatSpeak
effortlessly transcribed words like "Ku Klux Klan" and "Punic war."
It did, however, mistype seven easier words ("addition" instead of
"edition," for example, and "per trail" instead of "portrayal").
Accuracy tally with no training: 99.3 percent. Not too shabby.

Then I tried a second test: I read one of the five-minute training
scripts (a Kennedy speech), which is recommended for even better
initial accuracy. I again read the first 1,000 words of
"Freakonomics," and the program mistyped five words. Accuracy this
time: 99.5 percent.
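
For anyone checking the math, both accuracy figures are just correct
words divided by words read. Here is a minimal sketch in Python; the
function name word_accuracy is made up for illustration and has
nothing to do with how NatSpeak itself keeps score.

```python
def word_accuracy(total_words: int, wrong_words: int) -> float:
    """Return the percentage of words transcribed correctly."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100.0 * (total_words - wrong_words) / total_words


# Figures from the two tests above: 1,000 words read each time.
untrained = word_accuracy(1000, 7)  # no training: seven wordos
trained = word_accuracy(1000, 5)    # after the five-minute script: five wordos

print(f"{untrained:.1f} percent, {trained:.1f} percent")
# prints "99.3 percent, 99.5 percent"
```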

In both cases, the number of spelling mistakes was zero. People who
use NaturallySpeaking never make typos, only wordos.

As you correct the mistakes with your voice -- a speedy, streamlined
procedure -- the program learns. Whether you skip initial training
or not, accuracy inches toward perfection over time.

One way that Nuance has improved accuracy is by acknowledging, for
the first time, that not everyone speaks alike. Version 10
recognizes eight accents: general (none), Australian, British,
Indian, Great Lakes (Buffalo to Chicago), Southeast Asian, Southern
United States and Spanish. If you don't specify, the program will
identify you automatically.

Isn't that somehow politically incorrect? Should a software program
treat you differently depending on how you sound?

Ah, the heck with it. It's dictation software. A little stereotyping
can go a long way.

Speed is another virtue in version 10. The program still waits for a
pause in your talking before it types, so that it can use context to
choose, for example, the correct homonym (there/they're/their). But
that waiting period has been halved; text appears almost
instantaneously at each pause.

Beyond speed -- and here's where things start to get Star Trekky --
the program understands more "natural language" commands.

For example, italicizing something you've already typed, say, the
phrase "gas prices," used to require three separate commands. First,
"Select gas prices." Then, "Italicize that." Finally, to move your
insertion point back where you stopped, "Go to end of document."

In version 10, a single command does the trick: "italicize 'gas
prices.'" The program makes the change and returns to where you
stopped, all in a blink. The same trick also works with the verbs
"bold," "underline," "delete," "cut" and "copy." (Yes, "bold" is a
verb now.)

You can speak a series of new Search commands, beginning with
"Search computer for ...," "Search the Web for ...," "Search e-mail
for ..." and so on.

For example: "Search maps for Chinese restaurants near Hoboken." Or
"Search Wikipedia for Bay of Pigs." Or "Search images for Gwyneth
Paltrow." These shortcuts work 100 percent reliably and do truly
save you time and typing. Next version: more of them, please.
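
The Search commands all share one shape: "Search [somewhere] for
[something]." As a rough illustration of how a phrase like that could
be pulled apart once it has been transcribed, here is a small Python
sketch. The pattern and the function name parse_search_command are
assumptions made for this example; Nuance has not said how NatSpeak
actually parses these commands.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for "Search <target> for <query>".
# An optional "the" before the target covers "Search the Web for ...".
SEARCH_PATTERN = re.compile(
    r"^search\s+(?:the\s+)?(?P<target>.+?)\s+for\s+(?P<query>.+)$",
    re.IGNORECASE,
)


def parse_search_command(utterance: str) -> Optional[Tuple[str, str]]:
    """Split a transcribed search command into (target, query).

    Returns None when the text does not look like a search command.
    """
    match = SEARCH_PATTERN.match(utterance.strip())
    if match is None:
        return None
    return match.group("target").lower(), match.group("query")


print(parse_search_command("Search maps for Chinese restaurants near Hoboken"))
print(parse_search_command("Search Wikipedia for Bay of Pigs"))
print(parse_search_command("Italicize gas prices"))
```

The target is matched lazily, so it stops at the first "for"; a
query that itself contains "for" stays intact.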

And now, the NatSpeak Frequently Asked Questions:

"Does NaturallySpeaking work on a Mac?" Yes, but only when the Mac
is running Windows and you're using a U.S.B. headset adapter. It
works fantastically in Boot Camp and fast enough in VMware Fusion,
an emulator program.

Of course, it might be simpler just to buy MacSpeech Dictate, a Mac
program that uses the same Dragon recognition technology. The
current version is fast and accurate, but it lags behind NatSpeak in
features and power; it doesn't even let you make corrections by
voice, and therefore the accuracy never improves. But a 1.2 version,
with voice correction and voice spelling, is in testing now.

"Can I transcribe interviews with it?" No. NatSpeak knows only one
person's voice: yours. It also requires a clean audio signal, like
the one from a headset mike half an inch from your mouth.

"Can I dictate with a wireless Bluetooth earpiece?" Yes. In fact,
version 10 greatly expands the number of compatible earpiece models
(18 so far, listed at nuance.com). Accuracy may take a hit, though.

"Can I dictate into a pocket recorder and transcribe it later?" Yes.
The setup is more involved, though: only some recorders are
compatible, and you have to record 15 minutes of training.

"Doesn't Windows Vista come with speech recognition?" Yes, and it's
really good -- quite similar to NatSpeak, actually. But Nuance says
that, oddly enough, Vista has had virtually no effect on NatSpeak
sales.

I'm guessing that obscurity is part of the reason; most people
aren't even aware that Vista offers such a feature. Vista doesn't
come with the required headset, either. Nor does the Vista version
offer the same accuracy, features or power as NatSpeak, and it isn't
available in other languages (French, Italian, German, Spanish,
Dutch and so on).

NatSpeak is available in a number of versions. The Standard edition
($100) has the same accuracy as the others, but it's just for
bare-bones dictation.

To get the more advanced goodies described in this review -- the
natural-language commands, Bluetooth mikes and recorders -- you need
the Preferred edition ($200). It also lets you set up voice macros
that type out boilerplate text. For example, you can say, "Buzz
off," and it will type: "Thanks for thinking of me! Unfortunately,
I'm afraid I'm unable to accept your kind offer at this time."
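
Conceptually, a voice macro is a lookup table: a short spoken trigger
on one side, the boilerplate to type on the other. A minimal Python
sketch of that idea follows. The table MACROS and the function
expand_macro are hypothetical names for illustration, not the
Preferred edition's actual macro format.

```python
# Hypothetical macro table: spoken trigger -> boilerplate text to type.
MACROS = {
    "buzz off": (
        "Thanks for thinking of me! Unfortunately, I'm afraid I'm "
        "unable to accept your kind offer at this time."
    ),
}


def expand_macro(utterance: str) -> str:
    """Return the boilerplate for a known trigger, else the text as spoken."""
    # Normalize case and trailing punctuation so "Buzz off," still matches.
    key = utterance.strip().lower().rstrip(".,!?")
    return MACROS.get(key, utterance)


print(expand_macro("Buzz off"))
print(expand_macro("See you at noon"))
```

Anything that isn't a known trigger passes through unchanged, which is
what you would want from ordinary dictation.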

There are also medical and legal editions ($1,600 and $1,200,
yikes), as well as a Professional edition ($900) for corporate
administrators who want to manage many NatSpeak installations from a
central server. The Pro version also recognizes natural-language
commands for Microsoft Outlook, like "Send e-mail to Mom" or
"Schedule a meeting with Barack Obama and John McCain."

Apart from Vista, NatSpeak really has no competition. Philips has
dropped out of the American market. I.B.M.'s own ViaVoice hasn't
been updated since 2003, and its sole distributor is, get this,
Nuance.

Maybe that's why Nuance makes only small, confident changes from one
version of NatSpeak to the next. Without any rivals, why add bells
and whistles that risk mucking up the program's virtues?

As a result, existing NaturallySpeaking owners can usually afford to
skip a generation between upgrades. Version 10 is a healthy leap
ahead of version 8, but version 9 owners shouldn't feel compelled to
upgrade.

And now, if you'll excuse me, I have some real work to do: "Search
maps for dilithium crystals near New York City. ..."

E-mail: pogue at nytimes.com
