Understanding surround sound and binaural sound
By Ancillotti
Film soundtracks are usually available in stereo or 5.1 surround, although there are other possibilities. Much of the sound material I have been using is binaural, which sounds frighteningly realistic through headphones but is much less impressive when played on speakers. So what are binaural and surround sound, and how can free software tools help you take full advantage of them? This will be an ongoing learning experience, but I want to start with a brief description of the most common technologies and of how well they are supported by the file formats at our disposal: Vorbis, FLAC and WAV.
This column may seem a little out of place, but it serves as a starting point (and a technical introduction) for several subjects I intend to cover in the future. To do surround sound processing with free software tools, it is important to first understand what surround sound is and why it matters to us. Along the way, we will pass through the intriguing topic of binaural sound and see how it differs from stereo.
The road to surround sound
The first sound recordings were in mono: a single waveform was recorded, representing frequency and volume but carrying no direction. That was enough for many applications, but the result was a little weak for music recordings.
Human hearing is three-dimensional. We distinguish the direction and, to some extent, the distance of a sound source. There is a lot of information in the sounds that reach our ears, and the brain processes that information in a very sophisticated way.
I think most people realize that stereo works by making the sound arrive at each ear at a different volume: if it is louder in the left ear (or comes out of the left speaker), the sound seems to come from that direction, and vice versa. This is called "panning" (from "panorama"), or just "pan". In this case, we pan toward the left speaker.
Stereo is very popular and works very well. It is easy to produce stereo sound on a mixing desk or in an application such as Audacity: just change the relative amplitudes of the left and right waveforms for each element you are recording. That way, you can take multiple mono recordings (of individual instruments, for example) and distribute them across the left-right spectrum, although to trained ears the result may sound dull.
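To make the idea of "changing the relative amplitudes" concrete, here is a minimal Python sketch of panning a mono signal. The function name `pan_mono_to_stereo` and the choice of a constant-power pan law are my own assumptions for illustration; mixers such as Audacity may use a different law.

```python
import math

def pan_mono_to_stereo(samples, pan):
    """Place a mono signal in the stereo field (hypothetical helper).

    pan runs from -1.0 (fully left) through 0.0 (center) to +1.0 (fully right).
    A constant-power law is assumed here; it is one common choice, not the only one.
    """
    # Map pan from [-1, 1] onto an angle in [0, pi/2].
    angle = (pan + 1.0) * math.pi / 4.0
    left_gain = math.cos(angle)
    right_gain = math.sin(angle)
    # Each output frame is a (left, right) pair built from the same mono sample.
    return [(s * left_gain, s * right_gain) for s in samples]

# Pan a short mono snippet halfway to the left.
frames = pan_mono_to_stereo([0.0, 0.5, 1.0, 0.5], pan=-0.5)
```

With this law the left and right gains always satisfy left² + right² = 1, so the overall loudness stays roughly constant as the source moves across the field.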
Another approach is to make the recording itself in stereo with two microphones, simulating a listener's ears. Some people even mount the microphones on a dummy head to reproduce the effect that the head has on the sound (yes, it does change the sound). This type of recording is called binaural (two ears), and the ideal way to listen to it is with headphones.
Here are some examples of binaural recordings from Wikimedia Commons that are worth a listen:
A seventeenth-century water mill at the Weald and Downland Open Air Museum
Two Skyball balls bouncing in an enclosed space
Sounds of a pool table
Sounds of paper and whispers
If you listen to these sounds through headphones with your eyes closed, I bet you will be impressed by the amount of detail you can make out. You can localize the sounds much better; stereo panning cannot do that.
In fact, there are many subtle processes at work here. Some relate to how the sound travels around or through your head to reach the ear, which can produce frequency-filtering effects or echoes. We also pick up echoes and reverberations from the walls of the space in which the recording was made.
But the most significant effect (besides volume, which panning models) is the phase shift. Sound is very slow, at least compared with light. There is an appreciable delay between the sound arriving at the nearer ear and arriving at the farther one. This delay shifts the two waveforms in time relative to one another. The brain is very sensitive to this information (a marvel of neurobiology and evolution that we tend to take for granted), and we interpret these shifts as spatial information.
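To get a feel for the size of this delay, here is a rough Python estimate. It uses the simplest possible model (a distant source, with the extra path to the far ear equal to the ear spacing times the sine of the angle) and an assumed ear spacing of 0.18 m; both the model and the figure are my own simplifications, and the function names are hypothetical. Real heads add diffraction effects that this ignores.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees Celsius
EAR_SPACING = 0.18      # metres; an assumed round figure, not a measurement

def interaural_delay(azimuth_deg, ear_spacing=EAR_SPACING):
    """Delay in seconds between the near and far ear for a distant source.

    azimuth_deg is 0 for a source straight ahead and 90 for one directly
    to the side. Simplified straight-line model: delay = d * sin(angle) / c.
    """
    return ear_spacing * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND

def delay_in_samples(azimuth_deg, sample_rate=44100):
    """The same delay expressed in samples at the given sample rate."""
    return interaural_delay(azimuth_deg) * sample_rate

for angle in (0, 30, 60, 90):
    print(angle, interaural_delay(angle), delay_in_samples(angle))
```

Under these assumptions the largest delay, for a source directly to one side, is on the order of half a millisecond, which is only a couple of dozen samples at CD sample rates. That is the scale of timing detail the brain works with, and the scale at which room reflections can disturb it.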
Unfortunately, binaural sound only really works with headphones. With speakers, most of the subtlety is lost, and it is hard to notice any difference from panned stereo. This happens because the sound does not reach your ears unaltered. Instead, it bounces off the walls, the furniture and so on, scrambling the delicate phase information. Rather than hearing the complex sound image of the recording, we hear the sound image of our own room.
So what now?
What is this "5.1"?
One way to solve this is to add more speakers. Various configurations have been tried over the years, from three to ten speakers, but the most popular by far uses six speakers and is called "5.1 Surround".
In this setup we still have left and right speakers positioned in front (usually one on each side of the screen, in the case of video), but there are others: a center speaker directly in front (right behind the screen), and left and right surround speakers behind you. That accounts for the "5" in "5.1 Surround".
The ".1" is a channel for low-frequency effects (LFE), which goes to a subwoofer. The subwoofer is usually placed in front of you, though ideally it would sit directly below your seat. This is the speaker that makes the room shake when a loud sound is produced, and it is very popular in action movies.
Variants
Obviously there are simpler versions: quadraphonic sound, or 4.0 surround, for example, drops the center and LFE channels and was popular for a while in the 1970s. There are also more complex surround systems, which basically add speakers, such as 7.1 (front left, center, front right, side left, side right, rear left, rear right, and LFE) and 9.1.
From binaural to surround
Since the brain manages to build a 3D audio experience from both surround sound and binaural sound (with headphones), converting from one type to the other probably takes some computational work, right? Indeed it does. And it is complicated.
Researching this sort of thing was a surprise to me, because I had no idea the subject was so broad. Some useful keywords (with Wikipedia articles) make good starting points for research: 3D Audio Effect, Head-Related Transfer Function, Binaural Recording, Psychoacoustics, Sound Localization, and obviously Surround Sound.
Some work in this direction was done for the Blender Foundation's video game "Yo Frankie!", carried out by Barcelona Media. It resulted in a technical paper (PDF, 1.4 MB, in English) and a slideshow (PDF, 6 MB, also in English) about using the CLAM audio processing library together with Ardour and Blender to simulate three-dimensional sound effects. One day I will try this technique and document it here, but not today.
File format support in Ogg Vorbis, WAV and FLAC
Basically, there are four file formats I work with regularly when processing sound: MP3, WAV, Ogg Vorbis and FLAC. MP3 has several problems, including patent restrictions that make it a poor choice for anyone producing their own work, but a good deal of music (even free music) is distributed in this format.
Given MP3's popularity, it is best to deal with it right away: MP3 does not support 5.1 surround sound, or any kind of multichannel sound other than stereo. There may be variants that contradict this, but they do not seem to be part of the MP3 standard. Of course, you can encode binaural recordings in any stereo format, and I do find some binaural recordings in MP3.
The best free format for lossy compressed audio is Ogg Vorbis. Vorbis does support multichannel sound (many channels, in fact: one source says "unlimited", another says 256, but either way the capacity is more than enough).
Note that there is a difference between an Ogg file containing more than one Vorbis stream and a single Vorbis stream with multiple interleaved channels! It is best to think of a file with several Vorbis streams as a set of separate, alternative audio tracks (and this is exactly how VLC handles them). In any case, a single Ogg Vorbis stream can store multiple channels for simultaneous playback on different speakers. The most common case is stereo, where the first channel goes to the left speaker and the second goes to the right. Things get more complex and less standardized with 5.1, but the principle is the same.
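The idea of several channels interleaved in one stream can be sketched in a few lines of Python. Since the Python standard library has no Vorbis encoder, this illustration writes uncompressed WAV with the standard `wave` module instead; the interleaving principle (one sample per channel, frame after frame) is the same. The helper names `interleave` and `write_wav` are hypothetical.

```python
import io
import struct
import wave

def interleave(channels):
    """Turn per-channel sample lists into one flat list of frames."""
    # zip(*channels) yields one tuple per frame: (ch0[i], ch1[i], ...).
    return [sample for frame in zip(*channels) for sample in frame]

def write_wav(fileobj, channels, sample_rate=44100):
    """Write 16-bit PCM with as many channels as were passed in."""
    data = interleave(channels)
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(len(channels))
        w.setsampwidth(2)  # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)
        # Little-endian signed 16-bit integers, already interleaved.
        w.writeframes(struct.pack("<%dh" % len(data), *data))

left = [1, 2, 3]
right = [10, 20, 30]
print(interleave([left, right]))  # [1, 10, 2, 20, 3, 30]

buffer = io.BytesIO()  # an in-memory file, so nothing is written to disk
write_wav(buffer, [left, right])
```

Passing six lists instead of two would produce a six-channel file in exactly the same way; what the code cannot tell you is which list should feed which speaker, and that is where the formats start to disagree.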
However, 5.1 surround sound tends to be wanted in high-fidelity settings, which does not sit well with the concern for file size that motivates lossy compression formats.
So most of the time, when I mix high-fidelity surround tracks, I work with one of the two lossless formats available: WAV, which is uncompressed, and FLAC, which is compressed but lossless. FLAC (short for "Free Lossless Audio Codec") may not be very familiar to the general public, but it has become a popular format for sharing lossless audio on the Internet. FLAC files are usually much larger than MP3 and Ogg Vorbis files, but much smaller than WAV files, which are enormous.
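To see why uncompressed files are "enormous", the size of raw PCM audio can be worked out directly from the sample rate, bit depth, channel count and duration. The short Python sketch below does that arithmetic; the function name is hypothetical, the small WAV header is ignored, and no FLAC figure is computed because its compression ratio depends on the material.

```python
def pcm_size_bytes(seconds, channels, sample_rate=44100, bits_per_sample=16):
    """Approximate size of uncompressed PCM audio, ignoring the file header."""
    bytes_per_sample = bits_per_sample // 8
    return int(seconds * sample_rate * channels * bytes_per_sample)

# One minute of CD-quality stereo (44.1 kHz, 16-bit, 2 channels).
print(pcm_size_bytes(60, channels=2))  # 10584000

# One minute of 5.1 surround at 48 kHz, 24-bit (6 channels).
print(pcm_size_bytes(60, channels=6, sample_rate=48000, bits_per_sample=24))  # 51840000
```

In other words, a minute of 5.1 audio at these settings takes roughly five times the space of a minute of CD-quality stereo, before any compression is applied.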
Channel assignments
As I mentioned briefly, there is widespread agreement about the correct channel order for stereo: the left channel first, followed by the right. But things are not so simple with surround sound. Standardized orders have been slow to appear, and there are inconsistencies. FLAC follows the same convention as WAV files, but Vorbis uses a different order. I had to search a bit to figure this out, so I will close today's column with a reference table of the channel assignments for 5.1 surround sound in these formats: