Kanru Hua (華侃如) June 19, 2016 The Past, Present and Future of Singing Voice Modeling

The past, present and future of singing synthesis

Page 1

The Past, Present and Future of Singing Voice Modeling

Kanru Hua (華侃如), June 19, 2016

Page 2

Motivation

“You are making too many assumptions, this thing won’t work on real speech signal.”

— Jont B. Allen

● What’s wrong with current and past research in this area?

● What’s our next step?

Page 3

What’s in a Speech/Singing Synthesizer

Text / Music Score → Parameter Generator → Vocoder → Speech Audio

● Parameter Generator: generates pitch, duration, spectrum… from the input
● Vocoder: generates the waveform from the parameters
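A toy Python sketch of the two stages follows. It assumes the score is given as (MIDI note number, duration in seconds) pairs. The function names `generate_parameters` and `vocode`, the 5 ms frame hop, and the fixed 1/k harmonic roll-off are all illustrative assumptions. The "vocoder" here only sums harmonics and is not any of the vocoders discussed later.

```python
import math

SAMPLE_RATE = 16000  # Hz (assumed)
FRAME_RATE = 200     # parameter frames per second, i.e. a 5 ms hop (assumed)

def generate_parameters(score):
    """Hypothetical parameter generator: (midi_note, seconds) pairs -> per-frame F0 track."""
    f0_track = []
    for midi_note, seconds in score:
        f0 = 440.0 * 2.0 ** ((midi_note - 69) / 12.0)  # equal-tempered pitch of the note
        f0_track.extend([f0] * round(seconds * FRAME_RATE))
    return f0_track

def vocode(f0_track, n_harmonics=10):
    """Hypothetical toy vocoder: render the F0 track as harmonics with fixed 1/k amplitudes."""
    hop = SAMPLE_RATE // FRAME_RATE
    phase = 0.0
    out = []
    for f0 in f0_track:
        for _ in range(hop):
            phase += 2.0 * math.pi * f0 / SAMPLE_RATE  # accumulate phase so pitch changes stay continuous
            sample = 0.0
            for k in range(1, n_harmonics + 1):
                if k * f0 < SAMPLE_RATE / 2:           # skip harmonics above Nyquist
                    sample += math.sin(k * phase) / k
            out.append(0.3 * sample)
    return out

score = [(69, 0.25), (71, 0.25)]   # A4 then B4, a quarter second each
f0_track = generate_parameters(score)
audio = vocode(f0_track)
print(len(f0_track), len(audio))   # prints: 100 8000
```

Keeping the two stages as separate functions mirrors the split on this slide: either stage can be swapped out without touching the other.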

Page 4

Part 1: History of Speech Analysis/Synthesis

(http://clas.mq.edu.au/speech/synthesis/history_synthesis/)

Page 5

History of Math & Acoustics

● 1600s: laws of force and motion, foundations of calculus (Newton)
● 1700s: wave equation, complex numbers (Bernoulli, Euler, d'Alembert)
● 1800s: Fourier/Laplace transforms, analog circuits & electromagnetism (Gauss, Fourier, Laplace, Riemann, Cauchy, Kirchhoff, Heaviside)
● 1900s: filtering theory, digital systems, sampling theory, ...

(http://www2.ling.su.se/staff/hartmut/kemplne.htm)

Page 6

History of Math & Acoustics

[Figure: Frequency Response]

Page 7

Source-Filter Model

[Figure: Lung → Vocal Folds → Vocal Tract → Lip, with time-domain (t) and frequency-domain (f) plots]

Signal Generator (Source) → Filter 1 → Filter 2

Signal Generator → Filter 0 → Filter 1 → Filter 2
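The top chain (source, Filter 1, Filter 2) can be sketched in a few lines. This sketch makes three assumptions: the source is an idealized impulse train, a single two-pole resonance stands in for the vocal tract, and a first difference serves as the lip radiation filter (a common textbook approximation). Filter 0 is not modelled, and all function names are hypothetical.

```python
import math

def impulse_train(f0, fs, n):
    """Source: one unit impulse per pitch period (an idealized glottal excitation)."""
    period = fs / f0
    out = [0.0] * n
    t = 0.0
    while t < n:
        out[int(t)] = 1.0
        t += period
    return out

def resonator(x, freq, bandwidth, fs):
    """Filter 1: two-pole resonator, a crude stand-in for one vocal tract formant."""
    r = math.exp(-math.pi * bandwidth / fs)       # pole radius set by the bandwidth
    theta = 2.0 * math.pi * freq / fs             # pole angle set by the centre frequency
    a1, a2 = -2.0 * r * math.cos(theta), r * r    # denominator 1 + a1 z^-1 + a2 z^-2
    y = [0.0] * len(x)
    for i in range(len(x)):
        y1 = y[i - 1] if i >= 1 else 0.0
        y2 = y[i - 2] if i >= 2 else 0.0
        y[i] = x[i] - a1 * y1 - a2 * y2
    return y

def lip_radiation(x):
    """Filter 2: first difference y[n] = x[n] - x[n-1]."""
    return [x[i] - (x[i - 1] if i >= 1 else 0.0) for i in range(len(x))]

fs = 16000
source = impulse_train(f0=200.0, fs=fs, n=1600)
speech = lip_radiation(resonator(source, freq=700.0, bandwidth=100.0, fs=fs))
```

Because each stage is linear, the order of the two filters does not change the output. Only the source has to come first.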

Page 8

20th Century, the Dawn of Speech Processing

● Cooley and Tukey (1965): Fast Fourier Transform
● Oppenheim (1969): one of the earliest digital implementations of speech analysis/synthesis

Input → Analysis → Pitch (source) + Cepstrum (vocal tract filter) → Synthesis → Output

[Figure: Spectrum]
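The cepstral analysis in this diagram can be sketched with the standard recipe. First, take the log magnitude spectrum and invert it to get the cepstrum. Then read the pitch period from a peak at high quefrency, and keep only the low quefrencies to get a smooth vocal tract envelope. The synthetic test frame, the 80-400 Hz search range, and the liftering cutoff are assumptions for illustration, not Oppenheim's original parameters. The sketch uses only the standard library, so it has an O(n^2) DFT; a real implementation would use an FFT.

```python
import cmath
import math

def dft(x, inverse=False):
    """Plain O(n^2) DFT; inverse=True returns the inverse DFT (scaled by 1/n)."""
    n = len(x)
    sign = 1 if inverse else -1
    w = [cmath.exp(sign * 2j * math.pi * m / n) for m in range(n)]
    out = [sum(x[t] * w[(k * t) % n] for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out

def real_cepstrum(frame):
    """Homomorphic step: inverse DFT of the log magnitude spectrum."""
    log_mag = [math.log(abs(s) + 1e-12) for s in dft(frame)]
    return [c.real for c in dft(log_mag, inverse=True)]

def cepstral_pitch(cep, fs, fmin=80.0, fmax=400.0):
    """Source: the strongest cepstral peak in the pitch-period range gives F0."""
    lo, hi = int(fs / fmax), int(fs / fmin)
    q = max(range(lo, hi + 1), key=lambda i: cep[i])
    return fs / q

def cepstral_envelope(cep, n_keep):
    """Vocal tract filter: keep low quefrencies (both ends), return a smooth log-magnitude spectrum."""
    n = len(cep)
    liftered = [c if (i < n_keep or i > n - n_keep) else 0.0 for i, c in enumerate(cep)]
    return [v.real for v in dft(liftered)]

# Synthetic voiced frame: 200 Hz harmonics shaped by a resonance near 500 Hz, Hann-windowed.
fs, n, f0 = 8000, 512, 200.0
frame = []
for t in range(n):
    s = sum(math.cos(2 * math.pi * k * f0 * t / fs) / (1 + ((k * f0 - 500) / 300) ** 2)
            for k in range(1, 20))
    frame.append(s * (0.5 - 0.5 * math.cos(2 * math.pi * t / (n - 1))))

cep = real_cepstrum(frame)
print(cepstral_pitch(cep, fs))            # estimated F0 in Hz
envelope = cepstral_envelope(cep, n_keep=20)
```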

Page 9

Family Tree of Speech A/S Algorithms

[Diagram: family tree of the algorithms and software below]

Algorithms:
● Source-Filter Model
● Phase Vocoder (Flanagan et al., 1966)
● Linear Prediction (Atal & Schroeder, 1967)
● Homomorphic Filtering (Oppenheim, 1969)
● LSP/LSF (Itakura, 1975)
● CELP (Atal & Schroeder, 1983)
● MGC/MLSA (Imai et al., 1983)
● PSOLA (?, 1985)
● Sinusoidal Model (McAulay & Quatieri, 1986)
● SMS (Serra, 1989)
● Harmonic+Noise (Stylianou, 1993)
● STRAIGHT (Kawahara, 1998)
● Sine+Noise+Transient (Levine & Smith, 1998)
● NBVPM (Bonada, 2004)
● TANDEM-STRAIGHT (Kawahara & Morise, 2007)
● WBVPM (Bonada, 2008)
● Quasi-Harmonic Model (Pantazis et al., 2008)
● WORLD1 (Morise, 2009)
● WORLD2 (Morise, 2013)
● Vocaine (Agiomyrgiannakis, 2015)

Software and applications:
● Autotune, Melodyne, Sinsy, NiaoNiao, tn_fnds, Vocaloid, Vocaloid 2+, Rocaloid 3, RUCE (Rocaloid 4), CeVIO, Chiptune

Page 10

Part 2: What’s Wrong

Page 11

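Quasi-static Assumption

Assumption: the signal's parameters (pitch, spectrum, ...) stay constant within each short analysis frame.

Algorithms affected:
● Homomorphic Filtering
● PSOLA
● Linear Prediction & CELP & MLSA
● Sinusoidal Model
● Harmonic+Noise Model
● SMS & NBVPM
● WORLD & STRAIGHT (slightly)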
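Vibrato shows why the quasi-static assumption hurts singing in particular. The sketch below measures how far F0 moves inside a single analysis frame, using assumed values: A4 at 440 Hz, a depth of ±1 semitone, a 6 Hz rate, and 40 ms frames. Harmonic k moves k times as far, while a frame of length T resolves frequencies only to roughly 1/T. The helper name `max_f0_swing` is hypothetical.

```python
import math

def max_f0_swing(center_hz, depth_semitones, rate_hz, frame_s, starts=2000, points=50):
    """Largest F0 change (Hz) inside any one frame of length frame_s under sinusoidal vibrato."""
    def f0(t):
        return center_hz * 2.0 ** (depth_semitones / 12.0 * math.sin(2.0 * math.pi * rate_hz * t))

    worst = 0.0
    for i in range(starts):                        # slide the frame start across one vibrato cycle
        start = i / (starts * rate_hz)
        track = [f0(start + frame_s * j / points) for j in range(points + 1)]
        worst = max(worst, max(track) - min(track))
    return worst

frame_s = 0.040                                    # 40 ms analysis frame (assumed)
swing = max_f0_swing(440.0, depth_semitones=1.0, rate_hz=6.0, frame_s=frame_s)
resolution = 1.0 / frame_s                         # rough frequency resolution of one frame, Hz
for k in (1, 5, 10):
    print(f"harmonic {k}: moves {k * swing:.1f} Hz within one frame (resolution about {resolution:.0f} Hz)")
```

With these numbers the fundamental alone moves by about 35 Hz inside one frame, which is more than the frame's ~25 Hz resolution. A model that holds pitch constant per frame therefore smears the upper harmonics.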

Page 12

Mis-represented Aperiodic Component

Popular belief:
1. Speech = periodic signal + aperiodic signal (breathing noise)
2. Aperiodic signal is filtered white noise

[Figure labels: Aperiodic; Periodic (Friction)]
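For reference, the "popular belief" model is easy to write down: a periodic part plus stationary filtered white noise. The sketch contrasts it with a variant in which the noise is gated by the glottal cycle. This variant is one commonly discussed alternative (compare Pantazis and Stylianou, 2008) and is not presented as the author's model. The one-pole colouring filter, the open quotient of 0.6, and the 0.1 floor are hypothetical choices.

```python
import math
import random

def filtered_noise(n, seed=0):
    """Popular-belief aperiodic part: white noise through a one-pole low-pass (colouring) filter."""
    rng = random.Random(seed)
    y, prev = [], 0.0
    for _ in range(n):
        prev = 0.7 * prev + rng.gauss(0.0, 1.0)
        y.append(prev)
    return y

def pitch_synchronous_envelope(n, f0, fs, open_quotient=0.6):
    """Hypothetical gate: full noise during the open phase of each glottal cycle, 0.1 otherwise."""
    period = fs / f0
    return [1.0 if (t % period) < open_quotient * period else 0.1 for t in range(n)]

fs, f0, n = 16000, 200.0, 1600
periodic = [sum(math.sin(2 * math.pi * k * f0 * t / fs) / k for k in range(1, 8)) for t in range(n)]
noise = filtered_noise(n)

# "Speech = periodic + filtered white noise": the noise level is the same throughout each cycle.
stationary = [p + 0.1 * e for p, e in zip(periodic, noise)]
# Alternative: the same noise, but its amplitude follows the glottal cycle.
env = pitch_synchronous_envelope(n, f0, fs)
modulated = [p + 0.1 * g * e for p, g, e in zip(periodic, env, noise)]
```

Both signals share the same long-term noise spectrum. They differ only in where the noise sits within each period, and a purely spectral noise model cannot capture that difference.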

Page 13

Mis-represented Aperiodic Component

[Figure: time-domain plot (t)]

Algorithms affected:
● (Quasi-)Harmonic+Noise Model
● SMS & Sines+Noise+Transients Model
● WORLD & (TANDEM-)STRAIGHT
● Algorithms that do not model the aperiodic component
  ○ Phase vocoder, CELP, MLSA, ...

Page 14

Over-simplified Source-Filter Model

Oscillator → Tract Filter → Lip Filter

Oscillator → Source Filter → Tract Filter

Assumption: the source filter is independent of pitch.

Equivalent assumption: “When my pitch is higher by 12 semitones, my vocal folds still oscillate at the same speed.”

Affected algorithms: all of those listed on page 11
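Simple arithmetic shows what the equivalent assumption implies. If the source filter (the glottal pulse shape) is fixed, the open phase keeps the same length in seconds whatever the pitch. The sketch below assumes, for illustration, an open quotient of 0.6 at 110 Hz. The helper name is hypothetical.

```python
def open_quotient_fixed_pulse(base_f0, base_oq, semitones):
    """Open quotient implied if the glottal pulse keeps the duration it had at base_f0."""
    open_phase_s = base_oq / base_f0               # open phase in seconds, frozen by a pitch-independent source filter
    new_f0 = base_f0 * 2.0 ** (semitones / 12.0)   # raise pitch by the given number of semitones
    return open_phase_s * new_f0                   # fraction of the new period spent open

for st in (0, 6, 12):
    print(st, round(open_quotient_fixed_pulse(110.0, 0.6, st), 3))
# 0 0.6
# 6 0.849
# 12 1.2
```

An open quotient above 1 would mean the folds are still open when the next cycle begins, which is physically impossible. In real voices, the pulse timing changes with pitch.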

Page 15

Part 3: The Future, How to Fix It & the Low Level Speech Model

Page 16

“Neoclassical” Approaches to Speech Modeling

[Figure: inverse filtering of the Input into Source, Tract and Lip components, shown in time (t) and frequency (f)]

● OVE Synthesizer (Fant, 1953)
● Linear Prediction (Atal & Schroeder, 1967)
● LF Model (Liljencrants, Fant and Lin, 1985)
● ARX (Wen et al., 1995)
● ARX-LF (Vincent et al., 2005)
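Linear prediction, the oldest analysis method in this list, models the tract as an all-pole filter and inverse-filters the signal to expose the source. The sketch below uses the autocorrelation method with the Levinson-Durbin recursion, assuming a synthetic signal: a 200 Hz impulse train through one two-pole resonance. ARX and ARX-LF estimate the glottal source jointly and are considerably more involved. This is only the classical baseline, and the function names are hypothetical.

```python
import math

def autocorrelation(x, max_lag):
    """Autocorrelation method: samples outside the segment are treated as zero."""
    return [sum(x[i] * x[i - lag] for i in range(lag, len(x))) for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns a = [1, a1, ..., ap] for A(z) = 1 + sum a_k z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                              # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                          # prediction error shrinks at each order
    return a, err

def inverse_filter(x, a):
    """Apply A(z): e[n] = sum_k a_k x[n-k], an estimate of the excitation."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0) for n in range(len(x))]

# Synthetic "speech": 200 Hz impulse train through one resonance (700 Hz, 500 Hz bandwidth).
fs, n = 16000, 1600
rad, theta = math.exp(-math.pi * 500 / fs), 2 * math.pi * 700 / fs
a_true = [1.0, -2 * rad * math.cos(theta), rad * rad]
excitation = [1.0 if i % 80 == 0 else 0.0 for i in range(n)]
speech = []
for i in range(n):
    y = excitation[i]
    if i >= 1:
        y -= a_true[1] * speech[i - 1]
    if i >= 2:
        y -= a_true[2] * speech[i - 2]
    speech.append(y)

a_est, _ = levinson_durbin(autocorrelation(speech, 2), 2)
residual = inverse_filter(speech, a_est)   # approximately the impulse-train source
```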

Page 17

“Neoclassical” Approaches to Speech Modeling

● Degottex (2013): a similar idea, but in the frequency domain
● Hua (2016, in progress): more robust under poor recording conditions; less sensitive to processed input

Page 18

The Low Level Speech Model (new version)

Level 0 (Signal Level)

● Input Signal → Pitch, Harmonic Model, Noise Model → Output Signal
● Noise Model: Spectrum; Channel 1 Energy, Channel 2 Energy, Channel 3 Energy, ..., each channel energy with its own Harmonic Model

Level 1 (Acoustic Level)

● Glottal/Source Information (LF Model)
● Vocal Tract Filter
● Lip Filter

An acoustically meaningful speech model
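One way to hold the two levels in code is sketched below. Every field name is hypothetical and this is not the API of Hua's LLSM implementation. Only the harmonic part of Level 0 is rendered; the noise model and the conversion between Level 1 and Level 0 are omitted.

```python
import math
from dataclasses import dataclass, field
from typing import List

@dataclass
class Level0Frame:
    """Signal level (hypothetical layout): pitch, harmonic model and noise model for one frame."""
    f0: float                                   # Hz; 0 for unvoiced frames
    harmonic_amps: List[float]                  # amplitude of harmonic k+1
    harmonic_phases: List[float]                # phase of harmonic k+1, radians
    noise_spectrum: List[float] = field(default_factory=list)  # noise spectral envelope
    channel_energy: List[float] = field(default_factory=list)  # per-channel noise energy

@dataclass
class Level1Frame:
    """Acoustic level (hypothetical layout): source, vocal tract and lip descriptions."""
    lf_params: List[float]                      # glottal source (LF model) parameters
    vocal_tract: List[float]                    # e.g. vocal tract filter coefficients
    lip_filter: List[float]                     # lip radiation filter coefficients

def render_harmonics(frame: Level0Frame, fs: int, n: int) -> List[float]:
    """Harmonic model synthesis for one frame: a sum of sinusoids at multiples of f0."""
    out = [0.0] * n
    if frame.f0 <= 0:
        return out
    for k, (amp, ph) in enumerate(zip(frame.harmonic_amps, frame.harmonic_phases), start=1):
        if k * frame.f0 >= fs / 2:
            break                               # stop at Nyquist
        w = 2 * math.pi * k * frame.f0 / fs
        for t in range(n):
            out[t] += amp * math.cos(w * t + ph)
    return out

frame = Level0Frame(f0=200.0, harmonic_amps=[1.0, 0.5, 0.25], harmonic_phases=[0.0, 0.0, 0.0])
y = render_harmonics(frame, fs=16000, n=160)    # y[0] == 1.75 because all phases are zero
```

Separating the two dataclasses keeps the signal-level data, which the synthesizer renders directly, apart from the acoustic-level description, which is what a user would edit.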

Page 19

Inverse Analysis of Speech

[Figure: Original waveform and estimated Glottal Flow (Source Signal)]

Page 20

Pitch Shifting powered by LLSM

[Figure: Original, 50% Pitch, 200% Pitch]
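A generic envelope-preserving pitch shift on a harmonic model works like this: move every harmonic to a multiple of the new F0, and read its amplitude off the original spectral envelope. In the sketch below, that envelope is approximated by linear interpolation between the original harmonic peaks. This is a common baseline and an assumption, not necessarily what LLSM does. In particular, it keeps the source unchanged, which is exactly the simplification criticised on page 14.

```python
import bisect

def interp(x, xs, ys):
    """Piecewise-linear interpolation; holds the end values outside the known range (an assumption)."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def shift_harmonics(f0, amps, ratio, fs):
    """Scale F0 by `ratio` and resample harmonic amplitudes from the original envelope up to Nyquist."""
    freqs = [f0 * (k + 1) for k in range(len(amps))]
    new_f0 = f0 * ratio
    new_amps = []
    k = 1
    while k * new_f0 < fs / 2:
        new_amps.append(interp(k * new_f0, freqs, amps))
        k += 1
    return new_f0, new_amps

amps = [1.0, 0.8, 0.6, 0.4, 0.2]          # harmonic amplitudes of a 200 Hz frame (made-up values)
for ratio in (0.5, 2.0):                   # the 50% and 200% settings from the slide
    new_f0, new_amps = shift_harmonics(200.0, amps, ratio, fs=16000)
    print(new_f0, len(new_amps), [round(a, 2) for a in new_amps[:3]])
# 100.0 79 [1.0, 1.0, 0.9]
# 400.0 19 [0.8, 0.4, 0.2]
```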

Page 21

Pitch Shifting powered by LLSM

[Figure: Original, 50% Pitch, 200% Pitch]

Instants of vocal fold closure were revealed

Page 22

References

● A. V. Oppenheim, “Speech Analysis-Synthesis System Based on Homomorphic Filtering,” Journal of the Acoustical Society of America, vol. 45, no. 2, 1969.

● G. Degottex et al., “Mixed source model and its adapted vocal tract filter estimate for voice transformation and synthesis,” Speech Communication, vol. 55, no. 2, pp. 278-294, 2013.

● H. K. Dunn, “The calculation of vowel resonances, and an electrical vocal tract,” Journal of the Acoustical Society of America, vol. 22, pp. 740-753, 1950.

● Y. Pantazis and Y. Stylianou, “Improving the modeling of the noise part in the harmonic plus noise model of speech,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2008.