I recently listened to a broadcast of one of my station’s music interviews during a long drive. The dialogue sounded clear and robust, but once the music came in it sounded like someone had pulled a fader down by 6dB. I knew there must have been a misconception about level somewhere downstream in our workflow.
A lot of our work as audio engineers revolves around monitoring the levels of our material, and the most common method of doing so involves meters of various types. Levels and metering can actually be very complicated. After all, “level” can be interpreted as dBFS, dBu, dBv, dBV, dBSPL, etc. And metering… well, what kind of meter? A peak-reading meter or a VU meter? The matter only grows more complicated as you peel back the layers of the onion.
Consider the following waveform. It’s a typical music interview recorded to 2-track:
You’ve probably guessed that the chunks with higher amplitude represent music, and the other parts with a lower average amplitude represent speech. Gold star – you’re correct. Your editor might see this waveform and draw an astute conclusion such as, “Holy crap! The music is really loud and the dialogue is really soft! We’ll need to balance out the sections so that the speech isn’t buried and the music doesn’t blow out their speakers.”
So your editor loads the file up in ProTools, and sets about drawing in endless squiggly lines of automation until the channel meter remains practically pegged in place. But here’s the problem. ProTools meters often look like this:
Unsegmented moving bars of green, yellow and red indicating peak level with a yellow peak-hold on top. While certain configurations do provide a numbered scale on the side, that information is useless unless the user knows what peak levels are, and how they relate to how humans hear.
Peak metering provides a nearly instantaneous indicator of the maximum level created by your waveform. Referring to the waveform above, we can see that the music portions generally peak around -3dBFS, while the dialogue peaks around -18dBFS. Let’s take a closer look:
We’re now zoomed in on one of the songs. What previously looked like a massive block of -3dB audio now looks a lot less threatening. In fact, we can observe that the majority of the signal resides around -12dB or less, while peaks occasionally hit -3 or -1. Let’s look at one of those transients:
This transient occurs during a cymbal crash and kick drum strike. It’s only 37 milliseconds long. For comparison, here’s a zoomed view of a dialogue section. Notice how consistent the amplitude is, which is simply a product of the characteristics of human speech.
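Transients like this are exactly what a peak meter reports. As a minimal sketch (Python with NumPy; the sample values are invented for illustration), a signal that sits at -12 dBFS but contains even a few samples at -3 dBFS will still meter at -3 dBFS:

```python
import numpy as np

def peak_dbfs(samples):
    """Instantaneous peak level in dBFS, assuming float samples normalized to +/-1.0."""
    return 20 * np.log10(np.max(np.abs(samples)))

# One second of a hypothetical signal: a steady -12 dBFS body...
signal = np.full(44100, 10 ** (-12 / 20))
# ...plus a brief transient of a few samples at -3 dBFS.
signal[:16] = 10 ** (-3 / 20)

print(round(peak_dbfs(signal), 1))  # -3.0: the brief transient alone sets the reading
```

A peak-hold meter behaves the same way: the 37-millisecond cymbal hit, not the body of the song, determines the number you see.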
Human hearing is non-linear with regard to both frequency content and amplitude. Our perception of loudness corresponds much more closely to RMS level than to peak level. So what’s my point in writing all of this? Here it is:
In program material, un-mastered music will and should peak higher than dialogue portions. During dialogue, you have only the human voice to worry about. In music, you often have kick drums, snares, percussion samples, and a host of other sources whose attack and decay characteristics create large but brief transients. In order to create equal perceived loudness, most music must peak higher than speech. Let’s prove it by examining a song by Cymbals Eat Guitars.
The stats for a selection of the music portion show us a peak amplitude of 0dBFS (right channel). If I’m looking at a peak-reading meter in ProTools, I will see instantaneous peaks in the red. Note that average RMS power for this selection is -18.52dB. That’s a difference of 18.52dB between our peak level and our average RMS level.
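That peak-to-RMS gap isn’t unique to dense mixes. As a hedged toy example (Python with NumPy, a pure 440Hz tone standing in for real program material), even a plain sine wave reads 3dB lower on an RMS measurement than on a peak one; percussive material stretches that gap much further, as the 18.52dB figure above shows:

```python
import numpy as np

def peak_db(samples):
    """Peak level in dB relative to full scale."""
    return 20 * np.log10(np.max(np.abs(samples)))

def rms_db(samples):
    """Average RMS level in dB relative to full scale."""
    return 20 * np.log10(np.sqrt(np.mean(samples ** 2)))

# One second of a 440 Hz sine wave peaking at -3 dBFS (44.1 kHz sample rate).
t = np.arange(44100) / 44100
sine = 10 ** (-3 / 20) * np.sin(2 * np.pi * 440 * t)

print(round(peak_db(sine), 1))  # -3.0: what a peak meter shows
print(round(rms_db(sine), 1))   # -6.0: closer to what the ear tracks
```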
The stats for a selection of the dialogue show us a peak amplitude of -18.88dBFS. In order to make a fair comparison, a short section with continuous speech was selected. (Silence should not be included in the measurement). On the same meters mentioned above, I will see meters in the green, most likely below the halfway point. The average RMS power for this selection is -31.25dB, and peak-to-RMS average is 12.37dB.
Without understanding perceived loudness, and relying only upon ProTools peak metering, I’d reduce the level of the music by 18.88dB. That would equalize the peak levels of the two sections. But in doing so, I’d also bring the average RMS level of the music down by 18.88dB to -37.4dB, which is 6.15 decibels lower than the interview portion. By matching the peak levels of our music to our dialogue, we’ve left the speech with roughly double the average amplitude of the music, exactly the imbalance I heard on my drive.
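The arithmetic above can be checked in a few lines (Python; the figures are the measurements quoted earlier). Matching the peaks leaves the speech 6.15dB hotter on average, while matching the average levels instead requires the music to peak 6.15dB above the speech, which is exactly the behavior I argued for above:

```python
# All values in dB, taken from the measurements quoted above.
music_peak, music_rms = 0.0, -18.52
speech_peak, speech_rms = -18.88, -31.25

# Option 1: match PEAK levels by pulling the music down.
cut = music_peak - speech_peak                 # 18.88 dB of attenuation
music_rms_after = music_rms - cut              # -37.40 dB average level
print(round(speech_rms - music_rms_after, 2))  # 6.15: speech now louder on average

# Option 2: match AVERAGE (RMS) levels instead.
cut = music_rms - speech_rms                     # 12.73 dB of attenuation
music_peak_after = music_peak - cut              # -12.73 dBFS
print(round(music_peak_after - speech_peak, 2))  # 6.15: music must peak higher
```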
Long story short: know what the ProTools meters are telling you, understand how peak meters relate to human hearing, and use your ears. Or go get a VU meter!