QA Fest 2016. Roman Gorin. Introduction to Speech Recognition Systems Through a Tester's Eyes

Kyiv 2016

The first testing festival in Ukraine

Introduction to Speech Recognition Software Testing

Roman Gorin


About me

• Senior Technical Leader, Testing @ Delphi LLC (http://udelphi.com)
• 12+ years in speech recognition testing
• 6+ years as QA Team Lead
• Main product: Nuance Dragon Medical (http://www.nuance.com/for-healthcare/dragon-medical)
• Telegram: https://telegram.me/DJ_ZX
• Facebook: rgorin.zx


What it is


Where used

• Nuance Dragon family
  • Dragon Pro
  • Dragon Medical
  • Dragon for Mac
  • Dragon Anywhere
  • etc.
• Windows Speech Recognition
• Google Voice Search


Where used

• Personal assistants
  • Siri
  • Cortana
  • Google Now
  • Facebook M, etc.
• Car systems


Where used

• Smart home assistants
  • Amazon Echo
  • Google Home
  • Zenbo
  • Homer, etc.
• Automated call center software, and more


Where used: ViV AI (unreleased)


Basic Principles

• Capture audio
• Separate speech from other types of sound (especially noise)
• Compare the speech audio against known text-to-audio patterns
• Apply the language-specific model
• Perform actions (type text, execute a command) based on the collected data
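
The "separate speech from noise" step can be illustrated with a toy energy-based detector. This is only a minimal sketch, not how any particular engine does it: the function names, the 160-sample frame size, and the 0.1 threshold are all assumptions chosen for the example, and real systems use far more robust voice activity detection.

```python
import math

def frame_energies(samples, frame_size):
    """Split samples into non-overlapping frames and return each frame's RMS energy."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(math.sqrt(sum(s * s for s in frame) / frame_size))
    return energies

def detect_speech(samples, frame_size=160, threshold=0.1):
    """Return one boolean per frame: True when the RMS energy exceeds the threshold.

    frame_size and threshold are illustrative assumptions, not tuned values.
    """
    return [energy > threshold for energy in frame_energies(samples, frame_size)]

# Synthetic input: two frames of low-level noise floor, then two frames of a loud tone.
quiet = [0.01] * 320
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(320)]
flags = detect_speech(quiet + tone)
print(flags)
```

A tester can use the same idea in reverse: feed recordings with known silent and spoken segments and check that the application only reacts to the spoken ones.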


Generic structure of how SR works

Main speech recognition models (based on Wikipedia)

• Hidden Markov models
• Dynamic time warping (DTW)-based speech recognition
• Neural networks
• Deep feedforward and recurrent neural networks
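
Of the models listed, dynamic time warping is the easiest to show in a few lines. The sketch below is a textbook DTW distance over two sequences of numbers, assuming one scalar feature per frame and absolute difference as the local cost; real recognizers compare multi-dimensional feature vectors, and the function name is made up for this example.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays on the same frame
                cost[i][j - 1],      # b advances, a stays on the same frame
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]

# The same "word" spoken more slowly (one frame repeated) still matches exactly.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
# A different pattern gives a non-zero distance.
print(dtw_distance([1, 2, 3], [2, 3, 4]))
```

This is why DTW suits pronunciation-speed differences: stretching or compressing a sequence in time does not by itself increase the distance.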


Testing areas

• Engine and language modelling (usually on the recognition server side)
• UI
• Hardware
• Deployment
• Adaptation
• Recognition and text editing
• Language-specific, etc.


Testing areas: Hardware

• Mobile HW
• Internal mic (notebooks/tablets)
• Noise-cancelling mic
• Sound card and driver compatibility
• System requirements compliance
• HW dependency
• Driver dependency (WASAPI, DirectSound, ASIO, and Kernel Streaming on Windows; ALSA and PulseAudio on Linux; Core Audio on Mac)


Testing areas: Hardware

• Mics and recorders (samples from nuance.com store)

• Special bundled HW for professional use
  • Nuance PowerMic
  • Philips SpeechMike


Testing areas: Deployment

• Platform
  • Client OS (desktop/mobile)
  • Server OS for the client app
  • Server OS for the cloud/remote app
    • Azure Cloud
    • Amazon Cloud
    • Proprietary cloud hosts for server recognition (e.g. the recognition servers for Siri)
• Support for virtualization platforms: VDI and app virtualization (standalone recognition over remote access)
  • Citrix XenApp and XenDesktop, thin and thick clients
  • VMware Workstation and Horizon
  • Oracle VirtualBox
  • Microsoft Remote Desktop/Terminal Services
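
The deployment dimensions above multiply quickly, so it can help to generate the test matrix rather than write it by hand. The following is a rough sketch only: the values are a small subset picked from the slide, and the exclusion rule (mobile clients are not tested through desktop virtualization) is an assumption made for illustration, not a statement about any product.

```python
from itertools import product

# Illustrative subsets of the dimensions named on the slide.
CLIENTS = ["Desktop", "Mobile"]
HOSTS = ["Local", "Azure Cloud", "Amazon Cloud"]
VIRTUALIZATION = ["None", "Citrix XenApp", "Microsoft Remote Desktop"]

def is_valid(client, host, virtualization):
    """Assumed rule for this example: mobile clients run without virtualization."""
    return not (client == "Mobile" and virtualization != "None")

def build_matrix():
    """Return every valid (client, host, virtualization) combination."""
    return [combo for combo in product(CLIENTS, HOSTS, VIRTUALIZATION)
            if is_valid(*combo)]

matrix = build_matrix()
for row in matrix:
    print(" | ".join(row))
print(len(matrix), "configurations")
```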


Testing areas: Adaptation

• Predefined language patterns
• Statistical models

A statistical language model is a probability distribution over sequences of words. Given such a sequence, say of length m, it assigns a probability P(w_1, …, w_m) to the whole sequence. Having a way to estimate the relative likelihood of different phrases is useful in many natural language processing applications. Language modeling is used in speech recognition, machine translation, part-of-speech tagging, parsing, handwriting recognition, information retrieval and other applications.

• “Part of speech” detection
• Sound-specific patterns
• Person-specific
  • How the person pronounces words and sounds
  • How the person constructs sentences
  • Pronunciation speed
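
To make the statistical language model definition concrete, here is a minimal bigram sketch. It approximates P(w_1, …, w_m) as a product of P(w_i | w_(i-1)) with add-one smoothing; the bigram approximation, the smoothing, the tiny corpus, and all names are assumptions for illustration, and production models are far larger and more sophisticated.

```python
from collections import Counter

def train_bigram_model(sentences):
    """Count contexts and bigrams over whitespace-tokenized, lowercased sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        context_counts.update(tokens[:-1])             # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return context_counts, bigram_counts, len(vocab)

def sequence_probability(sentence, model):
    """Estimate P(w_1, ..., w_m) as a product of add-one smoothed bigram probabilities."""
    context_counts, bigram_counts, vocab_size = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    probability = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        probability *= (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)
    return probability

corpus = ["please recognize speech", "recognize speech now", "a nice beach"]
model = train_bigram_model(corpus)
likely = sequence_probability("recognize speech", model)
unlikely = sequence_probability("wreck a nice beach", model)
print(likely > unlikely)
```

Two phrases that sound alike can thus be ranked by how plausible they are as text, which is the part of adaptation that changes when a user's vocabulary and phrasing are learned.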


Testing areas: Recognition and Command Control

• Initial recognition tests
• Turning the app into “listening mode”
• Basic commands (“what I can do”)
• Extended commands (specific to the app type)
• Non-strict commands (pseudo-AI)
• Search commands
• Commands specific to 3rd-party apps, and compatibility with 3rd-party SW
• Dictating into the app's default text controls (if supported)
• Dictating into supported and unsupported 3rd-party apps
• Transcribing prerecorded audio
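
For dictation and prerecorded-audio tests, the recognized text has to be compared with a reference transcript somehow. Word error rate (WER) is not named in the slides; it is brought in here as one commonly used metric. The sketch below computes it as word-level edit distance divided by the reference length, with simple lowercasing and whitespace tokenization as assumed normalization.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # Levenshtein distance over words, keeping only the previous row.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            current.append(min(
                previous[j] + 1,                          # word missing from hypothesis
                current[j - 1] + 1,                       # extra word in hypothesis
                previous[j - 1] + (ref_word != hyp_word), # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)

print(word_error_rate("the patient has a fever", "the patient has fever"))
```

Running the same prerecorded audio set against each build and tracking this number gives a simple regression signal for recognition accuracy.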


Testing areas: Recognition and Text Editing (sample from PCWorld/Nuance)


Testing areas: Languages and Accents

• Different accents (UK English, US English, Australian English, etc.)
• Issues with speaking
• Language-specific sounds
  • Homophones (French)
  • Umlauts (German)
  • etc.
• Language-specific syntax (use of commas, periods, exclamation marks, etc.)
• Words with similar or close pronunciation (French: voux, voi, vu, etc.)
• Logographic characters (Chinese, Japanese, etc.)


Testing areas: Other stuff

• Audio codecs
• Traffic consumption (for cloud or remote access apps)
• Memory and CPU consumption
• Response time and cancelling recognition
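
Response time is one of the easier items here to automate. The sketch below times repeated calls to a recognizer and summarizes them; `fake_recognize` is a hypothetical stand-in that just sleeps, so in a real test it would be replaced by whatever call drives the engine under test.

```python
import statistics
import time

def measure_response_times(recognize, clips):
    """Call recognize(clip) for each clip and return the elapsed seconds per call."""
    timings = []
    for clip in clips:
        start = time.perf_counter()
        recognize(clip)
        timings.append(time.perf_counter() - start)
    return timings

def summarize(timings):
    """Reduce raw timings to the figures a test report would usually track."""
    return {"median": statistics.median(timings), "max": max(timings)}

def fake_recognize(clip):
    # Hypothetical stand-in for a real engine call; it only simulates latency.
    time.sleep(0.01)
    return "recognized text"

timings = measure_response_times(fake_recognize, ["a.wav", "b.wav", "c.wav"])
report = summarize(timings)
print(report)
```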


Enterprise Recognition (based on Nuance.com info)


Enterprise Recognition (based on Nuance.com info)

• Supports major EHR platforms, including Epic®, Cerner®, eClinicalWorks, athenahealth®, MEDITECH®, and more (© Nuance.com)


Links

• https://msdn.microsoft.com/en-us/library/hh378337(v=office.14).aspx
• http://www.explainthatstuff.com/voicerecognition.html
• http://scienceline.org/2014/08/ever-wondered-how-does-speech-to-text-software-work/
• http://www.nuance.com/for-healthcare/capture-anywhere/360-mobile-solutions/powermicmobile/index.htm
• http://www.nuance.com/for-individuals/by-product/dragon-accessories
• https://en.wikipedia.org/wiki/List_of_speech_recognition_software
• https://en.wikipedia.org/wiki/Dragon_NaturallySpeaking
• https://en.wikipedia.org/wiki/Speech_recognition
• https://en.wikipedia.org/wiki/Language_model
• http://www.pcmag.com/article2/0,2817,2464719,00.asp
• http://www.pcworld.com/article/2055599/control-your-pc-with-these-5-speech-recognition-programs.html
• http://www.oxygen.lcs.mit.edu/Speech.html
• http://copia.com.au/medical-speech-recognition/
