Listening Chips
www.circuitcellar.com
CIRCUIT CELLAR
®
Issue 133 August 2001
Years back in May
of 1993, I wrote an
article called Talking
Chips (Circuit Cellar 34)
describing the then emerging digital
voice recorder ICs. Besides offering a
high-tech replacement for the bulky,
balky mechanical voice recorders of
yore, the innovation spawned entirely
novel applications, such as greeting
cards that speak your own personally
recorded message.
As you might guess, this month I'm
covering a chip that can listen.
There's definitely the potential for
inspiring a lot of exciting and unique
applications, some of them more
obvious than others.
I think voice recognition
technology has gotten a bad
reputation because it's
stereotyped as a magic
bullet supposedly designed
to put that
inspired hack of the
typewriter age, the
crusty but lovable
QWERTY keyboard,
out of its misery. Through the
brute-force application of MIPS and
megahertz, progress has been made, but
chips and software can't yet achieve
the accuracy and speed required for
transcribing natural gab.
On reflection, replacing keyboards
may be one of those situations where
if it can be done, it will be done, and
then you'll see if it should have been
done. As someone who types a lot, I
have a few observations.
First, when writing an article, typ-
ing is the least of my worries. The
real work is studying datasheets, fool-
ing with boards, trying experiments,
and so forth. The hardest of all is giv-
ing creative birth to the words I want
to say, not just typing them.
Even imagining a perfect voice
recognition system for my PC, I'm
not convinced. Try this experiment.
Think of a sentence or phrase and
then type it while saying it aloud. As
someone who can type at a decent
rate, I can key in the words at nearly
a normal speaking cadence. Only by
slurring the words together in a blur
does speaking demonstrate more top-
end throughput. The human brain
demonstrates its formidable skill by
being able to parse such frenetic blab-
ber, but it drives automated recogni-
tion systems nuts.
Besides, have you ever given a long
speech or talked vociferously at a
party for hours on end? It's tiring. I
SILICON UPDATE
Tom Cantrell
Delving into voice recognition and chips that listen, Tom takes a look at
the current state of development. With pioneer Sensory leading the way, he
discovers there's potential for designing unique applications.
Photo 1: When it comes to voice recognition, the Voice Extreme Toolkit
represents a new high in ease of use and, at only $129, a new low in price.
Listing 1: This code demonstrates the VE C in action, running a demonstration of speaker-independent
recognition. The process boils down to pattern generation (PatGenW) and then recognition (Recog) with a
level of confidence (GetRecogLevel1).
//-----------------------------------------------------------------
// OPERATION:
// After an initial BEEP the program loops forever, waiting for
// button presses. Press A for a full prompt, B for a short prompt.
// Respond to the prompt by saying one of the 6 possible words.
// The program tells you what word you said or announces an error.
//
// NOTES:
// The program is linked with both a SPEECH and a WEIGHTS data file.
// This program calls the PatGenW and Recog functions and checks
// their returns. It illustrates the use of the confidence level
// stored in the WEIGHTS file and special processing for NOTA
// (None Of The Above) recognition.
//-----------------------------------------------------------------
#include <ve.veh>
#include <speech\sidemo.veh>
#include <weights\si6.veh>
#define TalkTable VPsidemo3
main()
{
    uint8 prompt;
    uint8 code;
    uint8 result;

    BEEP;
    while( 1 )
    {
        GreenOn; YellowOn;
        // Get a key press, then decide which prompt to use
        prompt = MSG_BEEP;
        do
        {
            if ( ButtonAPressed )
            {
                prompt = MSG_LONG;
                YellowOff;
            }
            else if ( ButtonBPressed )
            {
                prompt = MSG_SHORT;
                GreenOff;
            }
        } while ( prompt == MSG_BEEP );
        // Say the prompt, wait for a response and try to recognize it
        Talk( prompt, &TalkTable );
        RedOn;
        result = PatGenW(0, &WTSI6);
        RedOff;
        if ( result )
        {
            BEEP; BEEP;
            DebugH4( result ); // Announce error code, if any
        }
        else
        {
            result = Recog( &WTSI6 ); // Try to recognize response
            if ( GetRecogLevel1() < CONFSI6 )
presume it wouldn't be long before
folks would get up in arms over the
other CTS, carpal tonsil syndrome.
Overlooked in the dubious quest to
kill QWERTY is the fact that there
are less glamorous (but eminently
practical) voice recognition applica-
tions that do become feasible with
incremental advances in technology.
Besides such likely candidates as car
phones, automated phone systems,
and toys, I can imagine a lot of handy
(make that "no hands") products.
For example, when using a scope or
logic analyzer, I invariably end up
needing to punch a switch or twist a
dial even as both hands are frozen
probing the rat's nest. It would be
great if I could just say, for example,
"External Trigger Channel 2" instead
of the more flowery phrases I find
myself using in that situation.
IN THE REALM OF THE SENSORY
Although it hasn't reached
household name status, in the relatively
new field of voice recognition,
Sensory can be considered one of the
pioneers. They've been around for
years, slowly but surely percolating
their technology into emerging appli-
cations one by one.
I've kept in touch with Sensory and
monitored their progress, but held
back on writing an article. The fact is,
with ASIC- and ROM-based custom
silicon underpinning a focus-accounts
marketing strategy, what they had to
offer was only suitable for a few big
outfits like Sony, VTech, and Uniden.
But now, after successfully establish-
ing their place, Sensory is moving to
expand the market with low-cost
standard chips suitable for a broad
range of applications from customers
big and small.
Enter the Voice Extreme Toolkit
(see Photo 1) which, at only $129, is
not only ideal for prototyping and
demos, but is also suitable for moder-
ate volume applications.
The kit is wrapped around a special
version of Sensory's RSC-364
speech-recognition chip called the VE IC. The
ROM on the chip is factory-programmed
with a C-like language interpreter and
memory manager designed to work with
a commodity external flash memory
chip. Note that a ROM-less version,
RSC-300, is available (see Table 1).
The external flash memory is used
to store an application's particular
vocabulary, specifically the templates
and weights that lie at the heart of
Sensory's recognition technology.
There are two sources for the vocabu-
lary, and the choice is determined by
the specifics of the application.
For speaker-independent applica-
tions, Sensory can draw from a library
of common words in the major
languages or provide a service to generate a
custom (i.e., atypical language)
vocabulary. By contrast, speaker-dependent
apps rely on training (i.e., writing
flash memory) by the end user in the
field.
An interesting tweak of speaker-
dependent recognition is known as
speaker verification. The latter is kind
of the inverse of the former. Instead of
recognizing a word from a predefined
vocabulary spoken by a known per-
son, verification recognizes which
speaker from a predefined group is
saying a known word.
A specific application might use a
combination of recognition modes.
For instance, a security system could
recognize a particular user's voice
(speaker verification) and then, know-
ing his identity, determine his specific
password (speaker-dependent) before
accepting generic commands (speaker-
independent).
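
The three-stage security flow described above can be sketched in plain C. This is only an illustration of the sequencing, not Sensory's API: the Session structure, the password_of_user table, and check_session are all hypothetical names, and the recognizer results are assumed to arrive as simple indexes.

```c
/* All names below are hypothetical illustrations, not Sensory API calls. */
#define NO_MATCH  (-1)
#define NUM_USERS 2

typedef enum {
    DENIED_UNKNOWN_SPEAKER,
    DENIED_WRONG_PASSWORD,
    REJECTED_UNKNOWN_COMMAND,
    COMMAND_ACCEPTED
} Outcome;

/* Results the three recognition modes are assumed to report for one session. */
typedef struct {
    int speaker_id;   /* speaker verification: index of matching user, or NO_MATCH */
    int password_id;  /* speaker-dependent: index of trained template heard, or NO_MATCH */
    int command_id;   /* speaker-independent: index of generic command, or NO_MATCH */
} Session;

/* Each user's trained password template (made-up values). */
static const int password_of_user[NUM_USERS] = { 4, 9 };

Outcome check_session(const Session *s)
{
    /* Stage 1: who is speaking? */
    if (s->speaker_id < 0 || s->speaker_id >= NUM_USERS)
        return DENIED_UNKNOWN_SPEAKER;

    /* Stage 2: did that user say his own password? */
    if (s->password_id != password_of_user[s->speaker_id])
        return DENIED_WRONG_PASSWORD;

    /* Stage 3: only now is a generic command accepted. */
    if (s->command_id == NO_MATCH)
        return REJECTED_UNKNOWN_COMMAND;

    return COMMAND_ACCEPTED;
}
```

The point of the ordering is that each stage narrows what the next one has to consider: once the speaker is known, only that one user's password template needs to match.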
Other Sensory variations on the
recognition theme include word spot-
ting and continuous listening. Word
spotting finds trigger words in
continuous speech, so "Please open the
door" could be recognized as "open
door." To reduce false triggering
complications, use words with more
syllables or include more than one word,
like a brief phrase.
Because there is a slight delay
between recognition of the first
word in a multi-word trigger and
listening for the following word,
I recommend that you try
establishing a scheme that uses
trigger words that are naturally
separated by other speech or
otherwise won't easily run together.
Note that word spotting only
works with speaker-dependent
recognition.
Continuous listening is similar,
except that it waits for a specific
isolated phrase (i.e., only "open
door" would be recognized), with
pauses delineating each word.
Although not as powerful as word
spotting, continuous listening does
have the advantage of working with
both speaker-dependent and
speaker-independent recognition modes.
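
The difference between the two modes can be shown with a small C sketch. It rests on a big assumption: it compares words that have already been turned into text, whereas the chip matches acoustic patterns. The function names spot_words and isolated_phrase are made up for illustration and are not part of any Sensory library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Word spotting (illustrative): the trigger words must occur in order
   somewhere in the utterance; other words may surround or separate them. */
bool spot_words(const char *const heard[], size_t n_heard,
                const char *const trigger[], size_t n_trigger)
{
    size_t t = 0;  /* index of the next trigger word still to be found */
    for (size_t i = 0; i < n_heard && t < n_trigger; i++) {
        if (strcmp(heard[i], trigger[t]) == 0)
            t++;
    }
    return t == n_trigger;
}

/* Continuous listening (illustrative): the utterance must be exactly
   the isolated phrase, word for word, with nothing extra. */
bool isolated_phrase(const char *const heard[], size_t n_heard,
                     const char *const phrase[], size_t n_phrase)
{
    if (n_heard != n_phrase)
        return false;
    for (size_t i = 0; i < n_heard; i++) {
        if (strcmp(heard[i], phrase[i]) != 0)
            return false;
    }
    return true;
}
```

Given the utterance "please open the door" and the trigger "open door", spot_words reports a match while isolated_phrase does not; only the bare phrase "open door" satisfies both.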
LIP READER
How does the chip work its magic?
The secret advantage for the VE IC
isn't so much what it does, but the
fact that it needs only a middleweight
micro to do it (see Figure 1). O