QA Fest 2016. Roman Gorin. Introduction to Speech Recognition Systems Through the Eyes of a Tester

Page 1

Kyiv 2016

The first testing festival in Ukraine

Introduction to Speech Recognition Software Testing

Roman Gorin

Page 2

About me

• Senior Technical Leader, Testing @ Delphi LLC (http://udelphi.com)
• 12+ years in Speech Recognition Testing
• 6+ years as QA Team Lead
• Main Product: Nuance Dragon Medical (http://www.nuance.com/for-healthcare/dragon-medical)
• Telegram: https://telegram.me/DJ_ZX
• Facebook: rgorin.zx

Page 3

What it is

Page 4

Where used

• Nuance Dragon Family
  • Dragon Pro
  • Dragon Medical
  • Dragon for Mac
  • Dragon Anywhere
  • Etc.
• Windows Speech Recognition
• Google Voice Search

Page 5

Where used

• Personal assistants
  • Siri
  • Cortana
  • Google Now
  • Facebook M, etc.
• Car systems

Page 6

Where used

• Smart Home assistants
  • Amazon Echo
  • Google Home
  • Zenbo
  • Homer, etc.
• Automated Call Centers SW
• And more

Page 7

Where used: ViV AI (unreleased)

Page 8

Basic Principles

• Capture audio
• Separate speech from other types of sounds (esp. noise)
• Compare the speech audio with known text<->audio patterns
• Analyze using the language-specific model
• Perform actions (type text, execute command) based on collected data
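
The "separate speech from other sounds" step can be illustrated with a very small energy-based detector. This is only a sketch under stated assumptions, not how Dragon or any other product does it: the function names, the frame size of 160 samples, the threshold of 0.01, and the 16 kHz sample rate are all hypothetical values chosen for the example, and the input is a synthetic tone rather than real speech.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def detect_speech_frames(samples, frame_size=160, threshold=0.01):
    """Return one boolean per full frame: True if its energy exceeds the threshold.

    frame_size and threshold are illustrative assumptions, not tuned values.
    """
    flags = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        flags.append(frame_energy(frame) > threshold)
    return flags


# Synthetic signal: two quiet frames, two loud frames (a 200 Hz tone), two quiet frames.
SAMPLE_RATE = 16000  # assumed sample rate in Hz
quiet = [0.001] * 320
tone = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(320)]
signal = quiet + tone + quiet

print(detect_speech_frames(signal))
```

For a tester, the interesting cases are the ones this naive approach gets wrong: loud background noise that passes the threshold, and quiet speech that does not.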

Page 9

Generic structure of how SR works

Main speech recognition models (based on Wikipedia)

• Hidden Markov models
• Dynamic time warping (DTW)-based speech recognition
• Neural networks
• Deep feedforward and recurrent neural networks
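
Of the models listed, dynamic time warping is the easiest to show in a few lines. The sketch below is the textbook dynamic-programming formulation applied to plain lists of numbers. Real recognizers compare sequences of acoustic feature vectors, so the 1-D inputs and the absolute-difference cost here are simplifying assumptions, and `dtw_distance` is a name made up for this example.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two 1-D sequences of numbers."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a, step in b, or step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


template = [0, 1, 2, 3, 2, 1, 0]
slow = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]  # same shape, "spoken" twice as slowly
other = [3, 3, 3, 0, 0, 0, 3]                      # a different shape

print(dtw_distance(template, slow))
print(dtw_distance(template, other))
```

The slowed-down copy aligns with the template at zero cost, while the unrelated sequence does not. This is why DTW tolerates differences in pronunciation speed.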

Page 10

Testing areas

• Engine and Language Modelling (usually on recognition server side)
• UI
• Hardware
• Deployment
• Adaptation
• Recognition and Text Editing
• Language-specific
• etc.

Page 11

Testing areas: Hardware

• Mobile HW
• Internal mic (notebooks/tablets)
• Noise cancelling mic
• Sound card and drivers compatibility
• System Requirements compliance
• HW Dependency
• Driver Dependency (WASAPI, DirectSound, ASIO, Kernel Streaming for Windows; ALSA, PulseAudio for Linux; Core Audio for Mac)

Page 12

Testing areas: Hardware

• Mics and recorders (samples from nuance.com store)
• Special bundled HW for professional use
  • Nuance PowerMic
  • Philips SpeechMike

Page 13

Testing areas: Deployment

• Platform
  • Client OS (Desktop/Mobile)
  • Server OS for Client app
  • Server OS for Cloud/Remote app
    • Azure Cloud
    • Amazon Cloud
    • Proprietary cloud hosts for server recognition (e.g. recognition servers for Siri, etc.)
• Support for virtualization platforms: VDI and App Virtualization (standalone recognition on remote access)
  • Citrix XenApp and XenDesktop / Thin and Thick clients
  • VMware Workstation and Horizon
  • Oracle VirtualBox
  • Microsoft Remote Desktop/Terminal Services

Page 14

Testing areas: Adaptation

• Predefined language patterns
  • Statistical models. A statistical language model is a probability distribution over sequences of words. Given such a sequence, say of length m, it assigns a probability P(w_1, …, w_m) to the whole sequence. Having a way to estimate the relative likelihood of different phrases is useful in many natural language processing applications. Language modeling is used in speech recognition, machine translation, part-of-speech tagging, parsing, handwriting recognition, information retrieval and other applications.
  • “Part of speech” detection
• Sound-specific patterns
• Person-specific
  • How a person pronounces words and sounds
  • How a person constructs sentences
  • Pronunciation speed
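
To make the statistical language model definition above concrete, here is a toy bigram model that assigns a probability to a whole word sequence. It is a minimal sketch, assuming maximum-likelihood estimates with no smoothing, so any unseen word pair gets probability zero. The corpus, the `<s>` and `</s>` boundary markers, and the function names are invented for illustration. Production recognizers use far larger models.

```python
from collections import Counter


def train_bigram(sentences):
    """Count contexts and word pairs; <s> and </s> mark sentence boundaries."""
    contexts = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams


def sequence_probability(sentence, contexts, bigrams):
    """P(w_1, ..., w_m) as a product of bigram probabilities (no smoothing)."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        if contexts[prev] == 0:
            return 0.0  # context never seen in training
        prob *= bigrams[(prev, word)] / contexts[prev]
    return prob


# Toy training corpus (hypothetical).
corpus = [
    "recognize speech",
    "recognize speech",
    "recognize speech now",
    "wreck a nice beach",
]
contexts, bigrams = train_bigram(corpus)

p_common = sequence_probability("recognize speech", contexts, bigrams)
p_rare = sequence_probability("wreck a nice beach", contexts, bigrams)
p_unseen = sequence_probability("wreck speech", contexts, bigrams)
print(p_common, p_rare, p_unseen)
```

When two phrases sound alike, the recognizer can prefer the one the model scores higher. Adaptation to a specific person or domain shifts these counts, so tests should cover both the default and the adapted model.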

Page 15

Testing areas: Recognition and Command control

• Initial recognition tests
• Turn app into “listening mode”
• Basic commands (“what I can do”)
• Extended commands (app-type specific)
• Non-strict commands (pseudo-AI)
• Search commands
• 3rd party Apps specific commands/3rd party SW compatibility
• Dictating into app default text controls (if supported)
• Dictating into 3rd party supported and unsupported apps
• Transcribing prerecorded audio
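
For dictation and prerecorded-audio tests, the recognized text has to be compared with a reference transcript somehow. The slides do not say which metric is used. One widely used option is word error rate (WER): the word-level edit distance divided by the number of reference words. The sketch below assumes simple lowercase whitespace tokenization, and the function name and sample sentences are made up for the example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[-1][-1] / len(ref)


reference = "the patient has a mild fever"
recognized = "the patient has mild fever"  # hypothetical recognizer output
print(word_error_rate(reference, recognized))
```

Running the same prerecorded audio through each build and tracking this number over time gives a simple regression check for recognition accuracy.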

Page 16

Testing areas: Recognition and Text Editing (sample from PCWorld/Nuance)

Page 17

Testing areas: Languages and Accents

• Different accents (UK English, US English, Australian English, etc.)
• Issues with speaking
• Language-specific sounds
  • Homophones (French)
  • Umlauts (German)
  • etc.
• Language-specific syntax (using commas, periods, exclamation marks, etc.)
• Words with similar or close pronunciation (Fr. voux, voi, vu, etc.)
• Logographic scripts (Chinese, Japanese, etc.)

Page 18

Testing areas: Other stuff

• Audio codecs
• Traffic consumption (for cloud or remote access apps)
• Memory and CPU consumption
• Response time and cancelling recognition

Page 19

Enterprise Recognition (based on Nuance.com info)

Page 20

Enterprise Recognition (based on Nuance.com info)

• Support Major EHR platforms, including Epic®, Cerner®, eClinicalWorks, athenahealth®, MEDITECH®, and more. © Nuance.com

Page 21

Page 22

Links

• https://msdn.microsoft.com/en-us/library/hh378337(v=office.14).aspx
• http://www.explainthatstuff.com/voicerecognition.html
• http://scienceline.org/2014/08/ever-wondered-how-does-speech-to-text-software-work/
• http://www.nuance.com/for-healthcare/capture-anywhere/360-mobile-solutions/powermicmobile/index.htm
• http://www.nuance.com/for-individuals/by-product/dragon-accessories
• https://en.wikipedia.org/wiki/List_of_speech_recognition_software
• https://en.wikipedia.org/wiki/Dragon_NaturallySpeaking
• https://en.wikipedia.org/wiki/Speech_recognition
• https://en.wikipedia.org/wiki/Language_model
• http://www.pcmag.com/article2/0,2817,2464719,00.asp
• http://www.pcworld.com/article/2055599/control-your-pc-with-these-5-speech-recognition-programs.html
• http://www.oxygen.lcs.mit.edu/Speech.html
• http://copia.com.au/medical-speech-recognition/