Acoustics and Listening Room Treatments - Part I

There have been many unexpected results from the COVID lockdowns, but one that would qualify as a silver lining, at least from my perspective, is that millions and millions of people who ordinarily would have met face to face were forced to meet using Zoom (or similar) technologies, and learned first-hand about the effects of small room acoustics. Noisy, echoey, hard-to-understand speech about describes it. This is because when someone speaks in a small room, the sound doesn’t just go to the microphone…it also travels toward the walls, floor, ceiling, tabletop, etc., bounces off, and heads onward to the next hard surface, with each reflection arriving a little later because of the extra distance traveled. This continues until the sound energy has dissipated enough to fade away. Like a stone thrown into a pond, but in three dimensions. It’s audio carnage.

Strangely, if a person were to take the place of the microphone in this same room and listen to the same person speak, they would not perceive it at all the same way. This is because our brain has the ability to sort out all these sound waves bouncing off surfaces (called early reflections) and competing with the original sound wave. In fact, our brain is HIGHLY attuned to these delays, so much so that timing differences like these are a big part of how we determine the direction and distance from which a sound is coming. This is a function of our ear structure as well as of having one ear on each side of our head. The little computer microphone doesn’t have these advantages*, so listeners at the other end just get a mash-up of all the sound bouncing around the room with different timings and loudness levels.

So why do some Zoom meetings sound worse than others? Well, there could be several contributing factors, but the primary culprit is frequently room size. The smaller the room, the closer the reflective surfaces for sound to careen off of. Sound falls off predictably with distance (with some atmospheric variables): following the inverse square law, its intensity drops to roughly a quarter, and its sound pressure to roughly half (about 6 dB), for every doubling of distance. And because the speed of sound is essentially constant (again, with some atmospheric variables), distance traveled maps directly onto time elapsed, so the decay is just as predictable over time. So a larger room, while subject to the same early reflections as the small room, will be much less affected by them, since they will have lost much of their energy by the time they travel to a distant wall and bounce back.
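To put rough numbers on the inverse square law, here is a minimal Python sketch. It assumes a point source in open air, perfectly reflective walls, and no air absorption, and the distances (a talker 0.6 m from the mic, a reflection path of 2 m in a small room versus 8 m in a larger one) are hypothetical values picked only for illustration.

```python
import math

def intensity_ratio(r_ref: float, r: float) -> float:
    """Fraction of the intensity at distance r_ref that remains at distance r (inverse square law)."""
    return (r_ref / r) ** 2

def level_change_db(r_ref: float, r: float) -> float:
    """Level change in dB going from distance r_ref to distance r.

    10*log10 of the intensity ratio, which equals 20*log10 of the pressure
    ratio: about -6 dB for every doubling of distance.
    """
    return 10.0 * math.log10(intensity_ratio(r_ref, r))

if __name__ == "__main__":
    # One doubling of distance: a quarter of the intensity, about -6 dB.
    print(intensity_ratio(1.0, 2.0), round(level_change_db(1.0, 2.0), 2))

    # Hypothetical geometry: direct path 0.6 m; the reflection travels 2 m
    # (small room) or 8 m (larger room) before reaching the mic.
    direct_m = 0.6
    for label, reflected_m in (("small room", 2.0), ("large room", 8.0)):
        change = level_change_db(direct_m, reflected_m)
        print(f"{label}: reflection is {change:.1f} dB relative to the direct sound")
```

In this idealized case the reflection in the larger room lands about 12 dB below the one in the small room (roughly -22.5 dB versus -10.5 dB relative to the direct sound); real wall and air absorption would lower both further.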

Another consideration is the delayed arrival of the reflected sound. Early reflections arriving at our ears less than 30 milliseconds (ms) or so after the original are not perceived as a discrete echo, but rather as a tonal shift or even a “thickening” of the original. Back in my recording engineer days I used this intentionally as a technique on lead vocals or solo instruments to make them “pop” in a mix. My go-to move was to create two copies of the main track in Pro Tools (a digital audio workstation), delay one by 8 ms and the other by 15 ms, pan them hard left/right, and apply a slight pitch shift, up on one and down on the other. But that’s another story… back to early reflections. Delays greater than 30-35 ms are heard as a second iteration of the original, or an echo. Most small rooms have at least six surfaces arranged as three parallel pairs (opposing walls, plus floor and ceiling), all close enough together that the spoken word bounces back and forth between these parallel planes, and since the distances are short, the reflections retain enough energy to make the trip many times before they dissipate into irrelevance. And that happens for each pair of parallel planes. No wonder it’s a mess.
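The delay arithmetic behind these rules of thumb is easy to sketch. The Python below assumes a speed of sound of 343 m/s (dry air at about 20 °C) and treats the rough 30 ms boundary from the text as a hard cutoff, which real hearing does not have. The function names, the hypothetical 4 m × 3 m × 2.5 m room, and the 48 kHz sample rate are illustrative choices only.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 °C
ECHO_THRESHOLD_MS = 30.0    # rough boundary from the text; real perception varies

def delay_ms(extra_path_m: float) -> float:
    """Delay of a reflection relative to the direct sound, given its extra path length."""
    return extra_path_m / SPEED_OF_SOUND_M_S * 1000.0

def perceived_as(extra_path_m: float) -> str:
    """Rough classification using the ~30 ms rule of thumb (a simplification)."""
    if delay_ms(extra_path_m) < ECHO_THRESHOLD_MS:
        return "coloration/thickening"
    return "discrete echo"

def flutter_period_ms(spacing_m: float) -> float:
    """Time for one round trip between two parallel surfaces spacing_m apart."""
    return 2.0 * spacing_m / SPEED_OF_SOUND_M_S * 1000.0

def ms_to_samples(ms: float, sample_rate_hz: int = 48_000) -> int:
    """Convert a delay in milliseconds to a whole number of samples."""
    return round(ms * sample_rate_hz / 1000.0)

if __name__ == "__main__":
    # Extra path length that puts a reflection right at the ~30 ms boundary (about 10.3 m).
    print(f"{ECHO_THRESHOLD_MS / 1000.0 * SPEED_OF_SOUND_M_S:.1f} m")

    # A reflection travelling 3 m farther than the direct sound arrives about 8.7 ms late.
    print(f"{delay_ms(3.0):.1f} ms -> {perceived_as(3.0)}")

    # Round-trip time between each pair of parallel surfaces in a hypothetical 4 x 3 x 2.5 m room.
    for spacing in (4.0, 3.0, 2.5):
        print(f"{spacing} m apart: repeats every {flutter_period_ms(spacing):.1f} ms")

    # The 8 ms and 15 ms doubling delays expressed in samples at 48 kHz.
    print(ms_to_samples(8.0), ms_to_samples(15.0))
```

A reflection needs roughly 10 m of extra travel before it crosses into echo territory, which is hard to get in a small room; instead, the short round trips between close parallel surfaces keep repeating every 15 to 25 ms or so in a room of this size.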

To further complicate matters, there are first-, second-, third-, and to some degree fourth-order reflections to contend with. The order refers to the number of surfaces the sound encounters on its way from source to microphone or ear. So far we’ve mostly been talking about the first-order variety, a single bounce where the angle of incidence equals the angle of reflection (now, was that high school trig or middle school science?), but paths that bounce off two, three, or more surfaces come into play as well. Imagine the room were covered with mirrors. How many ways could you shine a laser pointer onto the computer microphone? Far more than four: a rectangular room offers six single-bounce paths alone, and the count climbs quickly with each extra bounce. Sound waves, though, dissipate faster than laser light.
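The mirror analogy is the basis of a standard modeling technique called the image-source method: each reflection is replaced by a mirrored copy of the source behind the wall, and the straight-line distance from that copy to the microphone is the length of the reflected path. The Python sketch below applies it to a rectangular room, assuming perfectly reflective walls, a point source, and inverse-square loss only (no absorption); the room size and the talker and mic positions are hypothetical. For each reflection order it counts the paths and reports the earliest arrival and the strongest level relative to the direct sound.

```python
import itertools
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 °C

def image_coord(n: int, length: float, src: float) -> float:
    """Position of the n-th mirror image of a source along one axis of a room spanning [0, length].

    |n| is the number of bounces off the two walls perpendicular to this axis.
    """
    if n % 2 == 0:
        return n * length + src
    return (n + 1) * length - src

def reflection_paths(room, source, mic, max_order):
    """Yield (order, path_length_m) for the direct path and every reflection up to max_order.

    room, source and mic are (x, y, z) tuples in metres; walls act as ideal mirrors.
    """
    indices = range(-max_order, max_order + 1)
    for n in itertools.product(indices, repeat=3):
        order = sum(abs(k) for k in n)
        if order > max_order:
            continue
        image = [image_coord(k, length, s) for k, length, s in zip(n, room, source)]
        yield order, math.dist(image, mic)

def summarize(room, source, mic, max_order=4):
    """Return (order, path_count, earliest_delay_ms, loudest_level_db) for each order.

    Delay and level are relative to the direct sound; level uses inverse-square loss only.
    """
    direct = math.dist(source, mic)
    by_order = {}
    for order, length in reflection_paths(room, source, mic, max_order):
        by_order.setdefault(order, []).append(length)
    rows = []
    for order in sorted(by_order):
        shortest = min(by_order[order])
        delay_ms = (shortest - direct) / SPEED_OF_SOUND_M_S * 1000.0
        level_db = 20.0 * math.log10(direct / shortest)
        rows.append((order, len(by_order[order]), delay_ms, level_db))
    return rows

if __name__ == "__main__":
    # Hypothetical 4 m x 3 m x 2.5 m room with a talker and a laptop mic about 1.5 m apart.
    room = (4.0, 3.0, 2.5)
    talker = (1.0, 1.5, 1.2)
    mic = (2.5, 1.5, 1.0)
    for order, count, delay, level in summarize(room, talker, mic):
        print(f"order {order}: {count:3d} paths, earliest +{delay:.1f} ms, loudest {level:.1f} dB")
```

In a rectangular room the path counts come out to 6, 18, 38, and 66 for orders one through four, wherever the talker and mic sit. Real wall absorption would make each higher order quieter than this idealized sketch reports.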

So that’s the conundrum. In Part II we’ll take a look at some solutions.

*On an interesting side note, there have been many advocates in the recording community of what is generally referred to as ‘binaural recording’. This frequently takes the form of a model human head, constructed from a non-resonant material, with two microphones placed where the ears would be located.