I'm back in Boston now looking at some notes (on a napkin, no less) and a few business cards... In no particular order...
On Monday morning in the AVIOS track, Yoon Kim, CEO of Novauris, gave a presentation on their technology. Novauris appears to have recruited core members from the old Dragon Systems team (both US & UK). The interesting thing I heard was their approach to recognizing long free-form speech inputs (more than a few seconds) by combining HMM acoustic and phonetic models with text search technology. They also pass information both forward and backward between stages, for example using acoustic decoder error models to select multiple data sets to feed to later stages.
On Monday afternoon, Mark Clements, Professor at Georgia Tech and co-founder of Nexidia, gave an interesting talk on language identification. Not surprisingly, the front end of a language identification system is similar to that of a speaker-independent speech recognition system, using HMM processing to identify acoustic and phonetic elements in the unknown audio sample. If you were to separately test an unknown speech sample against models trained for English and for French, you could rapidly determine whether the phonetic elements in the unknown speech were those of an English speaker or a French speaker. Of course, if there were 20 or 100 potential languages, it would be computationally expensive to run each speech sample against 20 or 100 different recognizers.
Here's the trick I hadn't thought about. If you look at the errors when running French, German and Chinese samples against a recognizer trained on English models, you get distinctive error signatures. These signatures, from the English-trained recognizer, are a very strong indicator of whether the sample is French, German or Chinese. Using this approach, the Nexidia team has built a language identifier which uses only 5 to 7 language-specific models to identify samples in 20 or more languages with very high accuracy.
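To make the error-signature idea concrete for myself, here is a toy sketch in Python. It is emphatically not Nexidia's implementation, which the talk did not describe at this level. I am assuming that a "signature" can be approximated by a normalized histogram of the phone labels an English-trained recognizer emits, and that an unknown sample is assigned to the language whose average signature it most resembles (by cosine similarity). The phone inventory, the training sequences and every function name below are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical, tiny phone inventory for the English-trained recognizer.
INVENTORY = ["a", "r", "th", "ng", "sh", "x"]

def signature(phones):
    """Normalized histogram of the phone labels the recognizer emitted."""
    counts = Counter(phones)
    total = sum(counts.values()) or 1
    return [counts[p] / total for p in INVENTORY]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def train(labelled_samples):
    """Average the signatures of all samples that share a language label."""
    sums, counts = {}, {}
    for lang, phones in labelled_samples:
        acc = sums.setdefault(lang, [0.0] * len(INVENTORY))
        for i, value in enumerate(signature(phones)):
            acc[i] += value
        counts[lang] = counts.get(lang, 0) + 1
    return {lang: [v / counts[lang] for v in acc] for lang, acc in sums.items()}

def identify(phones, centroids):
    """Pick the language whose average signature is closest to this sample."""
    sig = signature(phones)
    return max(centroids, key=lambda lang: cosine(sig, centroids[lang]))

# Made-up recognizer outputs: what the English models "hear" for each language.
centroids = train([
    ("french",  ["r", "r", "a", "x", "a", "r"]),
    ("german",  ["x", "x", "sh", "a", "x", "r"]),
    ("chinese", ["ng", "sh", "ng", "a", "sh", "ng"]),
])

guess = identify(["ng", "ng", "sh", "a"], centroids)
print(guess)
```

The point of the sketch is only that a single recognizer can serve many target languages: adding a language means adding one more stored signature, not one more recognizer. A real system would presumably use much richer statistics than a flat histogram.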
Finally, from my Tuesday evening discussion with Skip Cave of Intervoice comes a recommendation to check out Language Computer Corporation's open-domain question answering software, which extracts answers to free-form questions from heterogeneous text databases. As described by Skip (whose opinions I respect!), some domain tweaking was required up front (to deal with specific corporate knowledge areas, like HR or sales management), but since then the LCC software has been producing real, useful answers to arbitrary questions from technical and non-technical people.