So, what is “Microdynamics”? And some free file-based audio metrics.

James D. (jj) Johnston
Bell Labs Audio Researcher (and other stuff)


“Microdynamics”

• Well, it’s not dynamic range
• It’s not RMS for sure
• Is it variation in RMS?
  – How do you decide what time interval to use?
  – How do you relate that to hearing?
• Is it variation in loudness?
  – This relates to hearing
  – It provides a time interval to consider

A bit of Review first

• Loudness is SENSATION LEVEL; it’s how loud you feel something is.
  – It can be reasonably well modelled for most uses
  – For time domain issues, it’s a bit more tricky
• RMS is an analytic measurement of what will become power when played in the real world.
  – It is trivial to calculate
  – What does it mean in this context? Not much.

From the last talk:

• A simple loudness model
  – It could be adapted to have proper time stride in regard to frequency
  – It would be a lot more complicated and about 32 times as slow
  – It would be more accurate
• For now, let’s use the one we have, and give it a try.

The difference between “dynamic range” and variation in loudness.

• It is possible to make many signals with exactly the same dynamic range
  – One can smoothly increase from the smallest to largest value
  – One can maximize the inter-block differences, but keep exactly the same histogram and mean loudness
  – A hypothetical example
• We will propose the following 10-block system, with a uniform distribution, and loudness of blocks being 1, 2, 3, … 8, 9, 10.
• There can be many orders that create that histogram:
  – One order is 1, 2, 3, 4, 5, …
  – One could have as well 10 9 8 7 …
  – In fact there are 3628800 (that is, 10!) such orders one could observe, although many of them would be enormously unlikely in a real audio signal

For the sequence 1, 2, 3 …

[Figure, two panels: “Loudness from 1 to 10 in order” and “Difference, block to block”]

What do we see there?

• Our sequence of loudnesses, 1 to 10, in order
• Difference at each step is 1 in magnitude.
• RMS difference is 1.
• Not a big difference

And now for a different sequence: 1 10 2 9 3 8 4 7 5 6

And here?

• There’s rather a lot more difference.
• In fact, the RMS difference here is about 5.63 (the square root of 285/9).

• The point? These two sequences have exactly the same mean and the same histogram. From the histogram and mean, you cannot determine any “micro” kinds of characteristics.
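The comparison above is easy to reproduce. Here is a minimal Python sketch, assuming "RMS difference" means the root mean square of the nine differences between consecutive blocks; the function and variable names are made up for illustration. Under that assumption the ramp gives exactly 1 and the zigzag gives about 5.6.

```python
from collections import Counter
from math import sqrt

def rms_block_difference(loudness):
    """RMS of the differences between consecutive block loudness values."""
    diffs = [b - a for a, b in zip(loudness, loudness[1:])]
    return sqrt(sum(d * d for d in diffs) / len(diffs))

ramp = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
zigzag = [1, 10, 2, 9, 3, 8, 4, 7, 5, 6]

# Same histogram and same mean for both orders...
same_histogram = Counter(ramp) == Counter(zigzag)
same_mean = sum(ramp) / len(ramp) == sum(zigzag) / len(zigzag)

# ...but very different block-to-block behaviour.
ramp_rms = rms_block_difference(ramp)
zigzag_rms = rms_block_difference(zigzag)
print(same_histogram, same_mean, ramp_rms, zigzag_rms)
```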

How to maximize the rms difference?

• Well, there are 3628800 sequences

• I leave it to the reader to figure out which one (or ones, the time reverse will have the same RMS value) has the highest RMS.

• Good luck!
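If you would rather let a computer do the searching, here is a brute-force sketch in Python. It is only one possible approach (try every order, keep the best), the helper names are hypothetical, and it is run here on a 4-block toy case so as not to spoil the 10-block answer. Since the number of differences is fixed, maximizing the sum of squared differences is the same as maximizing the RMS.

```python
from itertools import permutations
from math import sqrt

def squared_difference_sum(order):
    """Sum of squared differences between consecutive values."""
    return sum((b - a) ** 2 for a, b in zip(order, order[1:]))

def most_jagged_orders(values):
    """Exhaustive search, O(n!): return (best RMS difference, all orders achieving it)."""
    best_sum, best_orders = -1, []
    for order in permutations(values):
        total = squared_difference_sum(order)
        if total > best_sum:
            best_sum, best_orders = total, [order]
        elif total == best_sum:
            best_orders.append(order)
    rms = sqrt(best_sum / (len(best_orders[0]) - 1))
    return rms, best_orders

# Toy case with loudness values 1..4.
best_rms, best_orders = most_jagged_orders(range(1, 5))
print(best_rms, best_orders)
```

For the 10-block system, pass `range(1, 11)`; that walks all 3628800 orders, so expect it to take a while in plain Python.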

My point?

• You need to look at the time series of something or other in order to get any sense of more than overall dynamic range

• Hopefully, it’s obvious (given last year’s talk) why loudness is more useful than RMS values.

Ok, that was all hypothetical “block to block” differences.

• So, how long should the block be?
  – Well, this is hearing, so how long would make sense for the auditory system?
  – That is “interesting”.

• At low frequencies, 17 milliseconds makes sense (that’s about 735 samples at redbook rates), but don’t forget our window. That would suggest 1024 samples is a useful number.

• Again, recalling the ERB structure of the ear, which is about ¼ octave at high frequencies, at 16kHz we’re talking about .33 milliseconds.

• Now, that’s not so helpful, is it?

• Could one make a loudness model that accommodates all of that?
  – Yes. But we’re not going to do that today. That would be a good subject for a 3 hour tutorial talk!
  – Here, let’s try the 1024 sample window, which has the necessary frequency resolution to have a chance of working at low frequencies.
  – Most of the energy (and loudness) in most signals is in low frequencies.
  – BUT percussion, which is very dynamic, has a broad spectrum.
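The block-length arithmetic above can be checked in a few lines of Python. This is a rough sketch under two assumptions that are not from the slides: rounding up to the next power of two stands in for "don’t forget our window", and the reciprocal of a quarter-octave bandwidth is taken as a crude time scale at 16 kHz. It lands near 750 samples for 17 ms and near a third of a millisecond at 16 kHz, the same neighbourhood as the figures quoted.

```python
SAMPLE_RATE = 44100  # Red Book CD sample rate, in Hz

def next_power_of_two(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

# Low frequencies: about 17 ms of signal.
low_freq_samples = 0.017 * SAMPLE_RATE
# Assumption: round up to a power of two to leave room for the window.
block_length = next_power_of_two(round(low_freq_samples))

# High frequencies: a band a quarter octave wide, centred on 16 kHz.
centre_hz = 16000.0
bandwidth_hz = centre_hz * (2 ** (1 / 8) - 2 ** (-1 / 8))
# Assumption: time scale is roughly 1 / bandwidth.
time_scale_ms = 1000.0 / bandwidth_hz

print(low_freq_samples, block_length, time_scale_ms)
```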

The Loudness Model

• This is the same loudness model used in last year’s talk. No real changes.
  – It works on 1024 sample blocks, shifting by 512 samples per measurement. Yes, overlap is necessary.
  – This model is in the matlab file lplt_t.m, also on the PNW Section Web site.
  – You’re welcome to have fun with it.
  – “How to get Octave” is a question best asked of your nearest linux guru.
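The block structure (1024 samples, shifted by 512) can be sketched as below. This is only the framing step, written in Python for illustration; it is not the loudness model in lplt_t.m, and dropping a trailing partial block is an assumption.

```python
BLOCK_LENGTH = 1024  # samples per analysis block
BLOCK_SHIFT = 512    # hop between blocks, so consecutive blocks overlap by half

def split_into_blocks(samples, block_length=BLOCK_LENGTH, shift=BLOCK_SHIFT):
    """Return overlapping blocks; a trailing partial block is dropped (assumption)."""
    last_start = len(samples) - block_length
    return [samples[start:start + block_length]
            for start in range(0, last_start + 1, shift)]

signal = list(range(4096))  # stand-in for PCM samples
blocks = split_into_blocks(signal)
print(len(blocks), len(blocks[0]))
```

Each block would then be handed to the loudness model, giving one loudness value per 512-sample shift.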

What comes out of that program?

• There are 4 plots, each a measurement of some characteristic of the signal
• There is a string of numbers below that.
• The interpretation of each plot will follow.
  – There is a lot of information in that one little plot.
  – This is where the rubber meets the road, or doesn’t.

A word about these loudness numbers

• They are arbitrary units, in the range of 0 to 400.
  – The listener has a volume control
  – His system has a sensitivity (and one that may vary with frequency)
  – So, we stick with arbitrary units of loudness, let’s call them ALU, for Arbitrary Loudness Units

The plot:

Top Plot: Histogram of Loudness

• This is a histogram, NOT a time-domain plot.
  – The vertical axis goes from 0 to 1 (more about those negative values in a minute)
  – The horizontal axis goes from 0 to 400, units ALU
  – The top of each bar on the vertical axis shows the fraction of blocks in the clip being analyzed with loudness in its bin, the center value being the value on the horizontal axis.
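A histogram of this kind might be built as in the following Python sketch. The bin width of 10 ALU is a made-up value for illustration (the slides do not say what the program uses), and the example loudness values are invented.

```python
def loudness_histogram(loudness, bin_width=10.0, max_alu=400.0):
    """Return (bin centres, fraction of blocks per bin). bin_width is hypothetical."""
    n_bins = int(max_alu / bin_width)
    counts = [0] * n_bins
    for value in loudness:
        # Clamp so a value of exactly max_alu lands in the last bin.
        index = min(int(value / bin_width), n_bins - 1)
        counts[index] += 1
    centres = [(i + 0.5) * bin_width for i in range(n_bins)]
    fractions = [c / len(loudness) for c in counts]
    return centres, fractions

centres, fractions = loudness_histogram([5.0, 15.0, 15.0, 395.0])
print(centres[:2], fractions[:2])
```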

Those negative values?

• (there’s no such thing as negative loudness, zero means you can’t hear it)

• They are marking three points on the loudness scale, from left to right:
  – The value where 5% of the blocks measure smaller than that value.
  – The mean value
  – The value where 95% of the blocks measure smaller than that value.
• The mean value is also shown numerically in the text at the bottom of the plot.

• The ratio of 5% value to 95% value makes a decent estimate of dynamic range. To convert that to dB, raise that ratio to the 3.5 power and then convert to dB: (10 log10 (ratio^3.5)).
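As a worked example of that conversion, here is a small Python sketch. The percentile lookup is a simple sorted-index approximation, the function names are hypothetical, and the ratio is taken as 95% over 5% so the result is positive (the 5%-over-95% ratio gives the same number with a minus sign). The input values are invented.

```python
from math import log10

def percentile_value(sorted_values, fraction):
    """Value below which roughly `fraction` of the sorted values fall."""
    index = min(int(fraction * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]

def dynamic_range_db(loudness):
    """Estimate dynamic range in dB from block loudness values (all assumed > 0)."""
    ordered = sorted(loudness)
    low = percentile_value(ordered, 0.05)
    high = percentile_value(ordered, 0.95)
    ratio = high / low
    # Loudness ratio raised to the 3.5 power, then converted to dB.
    return 10.0 * log10(ratio ** 3.5)

example = [float(v) for v in range(1, 101)]  # invented loudness values, 1..100 ALU
print(dynamic_range_db(example))
```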

2nd from the top

• This one is simpler, it is the plot of loudness as a function of time.
  – Loudness is the vertical axis, again 0-400 ALU
  – Horizontal axis is the block number, where each block shift is 512 samples.
  – This shows how much loudness varies, in some sense, in a file.

The 3rd Plot

• This is block to block normalized loudness difference, in ALU, of course.

• It shows attacks, decays, etc.

• This seems, to my ear, to maybe relate to “microdynamics”

• The mean absolute value is shown in the bottom label. These numbers seem small, but that’s because they are normalized. It is not a loudness difference, it is a relative loudness difference.
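One way such a relative difference could be computed is sketched below in Python. The slides do not say how the program normalizes, so dividing by the larger of the two neighbouring blocks is purely an assumption here; it was picked because it keeps every value between -1 and 1. The loudness values are invented.

```python
def normalized_block_differences(loudness):
    """Relative change between consecutive blocks, bounded to [-1, 1].

    Assumption: each difference is divided by the larger of the two blocks.
    """
    diffs = []
    for prev, cur in zip(loudness, loudness[1:]):
        scale = max(prev, cur)
        diffs.append((cur - prev) / scale if scale > 0 else 0.0)
    return diffs

def mean_absolute(values):
    """Mean of the absolute values."""
    return sum(abs(v) for v in values) / len(values)

diffs = normalized_block_differences([100.0, 100.0, 200.0, 150.0])
print(diffs, mean_absolute(diffs))
```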

Should that be RMS?

• I don’t know.

• If you want a hardcore psychoacoustics research project, give it a go!

• It does seem to scale with my personal sense of “dynamic signals”. It varies from about .04 to about .12 for most signals, and .04 sounds squished to toothpaste, while .12 sounds almost excessively dynamic.

• Your mileage may vary. It should, perhaps, when talking about preference.

The Last Plot

• That’s a histogram of the actual PCM levels in the signal over the whole clip.

• This has very little to do with the psychoacoustic realm, but it does point out clipping in a jiffy. I’ll show you some plots that are and are not clipped momentarily.
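A level histogram is just a count of how often each PCM value occurs. The Python sketch below builds one and adds a crude clipping hint: the fraction of samples sitting at the lowest and highest levels actually used. That hint is a heuristic invented for this example, not something from the program, and the test signal is a made-up ramp.

```python
from collections import Counter

def level_histogram(pcm_samples):
    """Count of occurrences of each integer PCM level."""
    return Counter(pcm_samples)

def edge_fraction(pcm_samples):
    """Fraction of samples at the lowest or highest level used (heuristic clipping hint)."""
    hist = level_histogram(pcm_samples)
    lowest, highest = min(hist), max(hist)
    if lowest == highest:
        return 1.0
    return (hist[lowest] + hist[highest]) / len(pcm_samples)

raw = list(range(-100, 101))                    # invented unclipped ramp
clipped = [max(-50, min(50, s)) for s in raw]   # same ramp, clipped at +/-50
print(edge_fraction(raw), edge_fraction(clipped))
```

Because it looks at the extremes actually used rather than at full scale, the hint also reacts to clipping that happened below maximum level.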

The text at the bottom:

• It shows in order:
  – File name of the file analyzed
  – Mean Loudness
  – Mean block to block change (absolute value of change)
  – RMS level of the file
  – Peak level of the file
  – Peak to RMS level in units of amplitude (not dB); convert to dB by 20 log10(Peak to RMS)
  – Number of blocks analyzed
  – Length of analysis block (twice the shift length)
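The RMS, peak, and peak-to-RMS entries can be illustrated with a short Python sketch. The function names are made up, and the two test signals (a square wave and one period of a sine) are there only because their peak-to-RMS values are well known: 0 dB and about 3 dB.

```python
from math import log10, pi, sin, sqrt

def rms_level(samples):
    """Root mean square of the sample values."""
    return sqrt(sum(s * s for s in samples) / len(samples))

def peak_level(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)

def peak_to_rms_db(samples):
    """Peak-to-RMS amplitude ratio, converted to dB with 20 log10."""
    return 20.0 * log10(peak_level(samples) / rms_level(samples))

square_wave = [1.0, -1.0] * 512
sine_wave = [sin(2 * pi * k / 1024) for k in range(1024)]
print(peak_to_rms_db(square_wave), peak_to_rms_db(sine_wave))
```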

Ok, on to some plots. This is the same one as used in the example above:

What do we see there?

• The first thing is that the loudness histogram is fairly wide, indicating dynamic range. This is confirmed by the difference in the 5% and 95% values.

• The loudness histogram also has a “bulge” at lower loudness, and extends quite a way toward higher loudness, showing that the range is mostly “upwards”, i.e. peaks, not periods of quiet sound.

• The loudness vs. time plot shows the same thing in another way, which helps to make clear the reasons for the shape of the loudness distribution in the first plot.

The difference plot

• This shows that the differences are mostly large upward, followed by slower downward changes. This is typical of a physical process, and also of most audio signals.

• The upward peaks are close to 1, meaning that they are rapid increases in loudness.

• The string at the bottom gives a block to block change of .08. Without some observation of other signals, this is not yet clearly meaningful. (remember .08 is not necessarily a small average absolute value, we have no scaling here)

The Level Histogram

• This shows that there is, effectively, no clipping, and that the center half of the PCM levels are heavily used.

• Zero (or close) is expected to be the most common level, and it is.

• Outside of the very center, the roll off in frequency of PCM bin is more or less a straight line, making it close to log-normal.

• This is “as expected” from the basic mathematics.

And, a different signal

Yes, a horse of another color!

The Loudness Plot

• It’s very, very loud.
• The peak in the distribution is quite narrow, meaning level does not change a lot.
• There is a secondary near-peak at lower loudness.
• This is shown to be “the quiet part” when looking at the time-domain plot, not some kind of unusual local (in time) dynamics.

• It has a much lower dynamic range. If we ignore the “quiet part” it has very little dynamic range at all.

• Its mean loudness is 235. Compare that to the previous clip, which is at 87. Yes, it’s loud. It’s supposed to be, of course.

The difference plot

• Here there are bursts of local dynamics, interspersed with sections of very little variation in the local dynamics.

• The mean block to block change is .06. This is not that much smaller than the .08 above, which suggests that small changes in this number may be important.

The level histogram

• This is a poster boy for clipped.

• Do I need to explain? If I do, shout out, and I’ll explain.

• Those “train tracks” at the sides are classic evidence of clipping.

• You may be surprised to find that kind of clipping not always at max and min. Don’t ask me!

And a smooth vocal track

What else to say?

• It’s very, very smooth. No peaks, no dips
• Every attack is the percussion machine behind the group
• Not loud at all.
• Very nice use of the CD’s full range with no clipping.
• Interblock mean absolute difference of .05.
  – Without the cymbal it would be even smaller.

And then there’s this:

Yeah….

• It’s loud.
• It has very little internal dynamics
• It’s loud.
• Did I mention it’s loud?
• Note it doesn’t show extensive clipping like some of the other tracks, but there is a most unusual peak near negative maximum.

• Oh, and it’s loud. And squished flat.

And, One more for the road

What here?

• It’s relatively quiet, but has some strong peaks.
• Its mean block to block difference is high, at .09
• There is something interesting in the level histogram. To explain:
  – There are many levels in 1 vertical pixel.
  – What’s plotted is the most common and least common in each bar.
  – Notice the many, many BOTTOMS to the level histogram? That’s kind of important.

That’s actually quite important

• What the level histogram shows in this case is that there are many “missing codes”, that is to say that many PCM codewords are unused.

• This means that the file was mistreated somewhere.
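Missing codes are simple to look for directly. Here is a minimal Python sketch, assuming integer PCM samples; the function name is hypothetical, and the "mistreated" example (a gain of 2 applied after quantization) is just one invented way such gaps can appear.

```python
def missing_codes(pcm_samples):
    """PCM codewords between the lowest and highest used values that never occur."""
    used = set(pcm_samples)
    return [code for code in range(min(used), max(used) + 1) if code not in used]

healthy = list(range(-8, 9))                 # every code from -8 to 8 is used
mistreated = [2 * s for s in range(-4, 5)]   # invented example: only even codes survive
print(missing_codes(healthy), missing_codes(mistreated))
```

A short or very quiet clip can leave a few codes unused quite innocently, so it is the regular pattern of many gaps that matters.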