Why you want to record as loud as possible


People often wonder, "Why should I record a signal as loudly as possible when it risks clipping above 0 dBFS?" Here's your answer. It all boils down to RESOLUTION, and that's why 24-bit recording (and soon, IMHO, 32-bit recording and even higher) is all the rage.

A friend who is a professional storyteller is trying to engineer his own CD. He started out with GarageBand. It's an excellent tool for general-purpose dorking around on a Mac, particularly if you're just getting started. But if you really want excellent voice or other audio recording, not just MIDI stuff, you have to step into the realm of built-for-purpose audio tools. Pro Tools and others are great, but at $250 and up they are not in everybody's budget. What follows is what I told him.

To start with, I think Audacity might be right up your alley. It's free, and I think you can run standard plugins with it. The only thing you really need to do is monitor for clipping. Talk at your maximum volume into the mic, then raise the gain on the mic until the clipping indicator lights up. Clear the red clipping light in Audacity, and drop the gain just a little bit. Keep at it until your maximum speaking volume no longer trips the clipping light on the track monitor.
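If you'd like to see what a clipping check boils down to, here's a minimal sketch. It assumes your recording is available as a list of signed integer PCM samples, and the function names (`count_clipped`, `peak_dbfs`) are my own for illustration. This is not how Audacity implements its clipping light, just the general idea.

```python
import math

def count_clipped(samples, bit_depth=16):
    """Count samples pinned at the top or bottom of the signed integer range."""
    max_val = 2 ** (bit_depth - 1) - 1   # 32767 for 16-bit
    min_val = -(2 ** (bit_depth - 1))    # -32768 for 16-bit
    return sum(1 for s in samples if s >= max_val or s <= min_val)

def peak_dbfs(samples, bit_depth=16):
    """Return the loudest sample's level in dB relative to full scale."""
    full_scale = 2 ** (bit_depth - 1)
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # pure digital silence
    return 20 * math.log10(peak / full_scale)
```

A take whose loudest passage gives `count_clipped(...) == 0` and a `peak_dbfs(...)` a little below zero is roughly what the gain-setting routine above is aiming for.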

You're a bit of a geek, so I'll explain the science behind why you want to do this. On an analog tape, you want to use as much of the tape as possible with your voice signal. Any bit of the tape that isn't used by the waveform is background hiss.

Similarly, you only have 16 or maybe 24 bits of amplitude resolution in the digital realm. The sampling rate is the resolution of the waveform's horizontal axis (with the total length of your piece as the total length of that X axis), while the bits are the resolution of the vertical Y axis. You want to use as many bits as possible to store amplitude data. When your waveform is too "quiet" (not as tall as it could be), you're sacrificing thousands or millions of possible amplitude levels to storing silence above the top and below the bottom of the waveform. And if you apply a compressor or hard limiter afterward to bring the level up, you're just "stretching" the waveform vertically. Your resolution doesn't actually change, just the volume.
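Here's a rough sketch of that "stretching" point. It assumes a simple rounding quantizer and a plain sine wave standing in for your voice (the `quantize` helper is hypothetical, not from any audio library). It records the same signal at full scale and at 1/20th of full scale, boosts the quiet one back up by 20x, and counts how many distinct amplitude values each version uses.

```python
import math

def quantize(signal, bit_depth):
    """Round floats in [-1.0, 1.0] to signed integer sample values."""
    scale = 2 ** (bit_depth - 1) - 1
    return [round(x * scale) for x in signal]

N = 200_000  # enough points for a full-scale sine to touch most 16-bit levels
signal = [math.sin(2 * math.pi * n / N) for n in range(N)]

loud = quantize(signal, 16)                     # recorded near full scale
quiet = quantize([x / 20 for x in signal], 16)  # recorded at 1/20th of full scale
boosted = [s * 20 for s in quiet]               # "stretched" back up afterward

loud_levels = len(set(loud))
quiet_levels = len(set(quiet))
boosted_levels = len(set(boosted))

print(loud_levels, quiet_levels, boosted_levels)
```

The boosted track is as loud as the full-scale one, but it still only uses as many distinct values as the quiet recording did. The gaps between them just got 20 times wider.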

So that means that you lose volume resolution, which with spoken word mainly affects the attack and decay of your voice. It's part of why CDs sound so "cold" to many people (including me): they lose the high end due to sampling error (44.1 kHz sampling rate), and they lose volume distinction (attack and decay are where the human ear can notice it) because there are only 65,536 distinct levels of volume in the waveform. 24-bit gives you over 16 million distinct levels. So if you record in 24-bit a waveform like the sample you showed me earlier, one that uses perhaps 1/20th of the dynamic range available, you still have at least as much amplitude resolution as you would have had recording at full level in 16-bit in the first place.
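The arithmetic behind those numbers is quick to check. This sketch assumes "1/20th of the dynamic range" means the waveform's height is 1/20th of full scale on a linear amplitude axis, which is my reading of the picture, not a measured figure.

```python
import math

levels_16 = 2 ** 16   # distinct amplitude values in 16-bit audio
levels_24 = 2 ** 24   # distinct amplitude values in 24-bit audio

fraction_used = 1 / 20                    # assumed: waveform height is 1/20th of full scale
levels_used = levels_24 * fraction_used   # 24-bit levels the quiet take actually spans
effective_bits = math.log2(levels_used)   # roughly how many bits that is worth

print(levels_16, levels_24, round(levels_used), round(effective_bits, 2))
```

Under that assumption the quiet 24-bit take still spans several hundred thousand levels, comfortably more than the 65,536 that 16-bit offers at full scale.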

Which most people will never notice.

But why put yourself in that situation when it's so easy to do it right in the first place?

Ultimately, "how it sounds" is most important. And for many types of music where the attack and decay of sounds are not a huge deal, the loss of amplitude resolution from recording the track "quietly" or at a low bit depth isn't that big a deal. 16 bits is OK for loud rock music, for instance.

But in my opinion, the human voice is incredibly hard to reproduce accurately. Humans are attuned to the sounds of voices, just as we are to people's faces. We already fight the battle of telling a story through a loudspeaker. A round, vibrating paper plate in a cabinet does not accurately simulate air passing over vocal folds and resonating in chest and nasal cavities. Humans can clearly hear the difference between someone speaking near us and a recording of someone speaking near us. Higher sample rates and bit depths take us only so far; a vibrating plate is what captures the sound of the voice anyway. The loss of attack and decay resolution from recording a track "too quietly" just compounds this issue.