Psychoacoustics and Digital Music Production

by **Surfwhammy** » Wed Jun 25, 2014 3:12 am

This topic is focused on exploring psychoacoustics toward the goal of understanding how it applies in the digital music production universe . . .

Psychoacoustics is the scientific study of sound perception. More specifically, it is the branch of science studying the psychological and physiological responses associated with sound (including speech and music). It can be further categorized as a branch of psychophysics.

[SOURCE: Psychoacoustics (Wikipedia)]

THOUGHTS

This is one of the most important aspects of arranging, producing, and mixing; and it is an important aspect of composing and performing . . .

The key bit of information is that, just as standing in the middle of a vast field of red roses and thinking you actually see red roses is a stellar visual illusion, much of what we hear is an auditory illusion, and some of these auditory illusions are fantastic in every respect . . .

The primary facts with respect to the vast field of red roses are (a) that the human eye has a very limited field of vision for the color red; (b) that the part of the human eye where the optic nerve exits (optic disk) has no visual receptors (rods or cones), hence detects nothing; and (c) that the image projected onto the retina is upside-down . . .

The visual perception apparatus of the human mind (a) makes an inference regarding the color of the vast field of red roses based primarily on a combination of previous experiences, current observations, and mathematical probabilities and statistics; (b) fills in the blanks for the parts of the vast field of red roses which are not detected at all; and (c) inverts everything so that it is right-side up rather than upside-down . . .

Explained another way, there are two key aspects of observing; and these apply to all the senses:

(1) What the human body detects or observes physically; translates into neural impulses; and sends to the human brain for perceptual analyzing and rendering . . .

(2) What the perceptual apparatus of the human brain does when it creates the various illusions that map generally to the way we observe or perceive reality . . .

In particular with sound, much of the perceptual apparatus uses logarithms, where one of the common examples is the rule of thumb that for a sound to be perceived as being twice as loud, its power needs to increase by roughly a factor of 10, which is an increase of 10 units on the logarithmic scale called the "decibel (dB)" . . .
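As a rough sketch of the arithmetic behind that rule, the following Python converts between power ratios and decibels. The function names are made up for illustration, and the "twice as loud per 10 dB" mapping is only the rule of thumb mentioned above, not an exact law of perception.

```python
import math

def power_ratio_to_db(ratio):
    """Convert a power ratio (for example 10x the power) to decibels."""
    return 10.0 * math.log10(ratio)

def db_to_power_ratio(db):
    """Convert a level change in decibels back to a power ratio."""
    return 10.0 ** (db / 10.0)

def approx_loudness_ratio(db_change):
    """Rule of thumb only: perceived loudness roughly doubles per +10 dB."""
    return 2.0 ** (db_change / 10.0)

if __name__ == "__main__":
    for ratio in (2.0, 10.0, 100.0):
        db = power_ratio_to_db(ratio)
        print(ratio, "x power =", round(db, 2), "dB, roughly",
              round(approx_loudness_ratio(db), 2), "x as loud")
```

Ten times the power is 10 dB and, by the rule of thumb, about twice as loud; one hundred times the power is 20 dB and only about four times as loud.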

Another example is that the increments of an octave are divided into 1,200 units called "cents", which is a logarithmic unit . . .

[NOTE: Both of these logarithmic units (decibels and cents) are ratios, which is easiest to understand when you consider cents, where the example is that the frequency difference between A1 (55-Hz, the low-pitch "A" string of an electric bass) and A2 (110-Hz, the low-pitch "A" string of an electric guitar) is 55-Hz, which is divided into 1,200 cents for the octave; but consider what happens for the octave beginning with A3 (220-Hz, the "A" below "Middle C") and continuing to A4 (440-Hz, the "A" above "Middle C"), where there obviously are more frequencies (220-Hz) but the same number of cents (1,200), which among other things is one of the reasons it is more difficult to sing bass than it is to sing soprano with respect to the margin of pitch error, where in the soprano region being "off" by a few Hertz ("Hz" or "cycles per second") is barely detectable (if it is detectable at all), but in the bass region being "off" by a few Hertz can map to missing a note by a semitone or even a whole tone . . . ]
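The point in the note can be checked with a few lines of Python. This is a minimal sketch using the standard definition of cents (1,200 times the base-2 logarithm of the frequency ratio); the function names are hypothetical.

```python
import math

def cents_between(f1_hz, f2_hz):
    """Interval from f1 to f2 in cents (1,200 cents per octave)."""
    return 1200.0 * math.log2(f2_hz / f1_hz)

def hz_error_in_cents(note_hz, error_hz):
    """How many cents sharp a note is when it is error_hz too high."""
    return cents_between(note_hz, note_hz + error_hz)

if __name__ == "__main__":
    print(cents_between(55.0, 110.0))    # A1 to A2
    print(cents_between(220.0, 440.0))   # A3 to A4
    print(hz_error_in_cents(55.0, 3.0))  # 3 Hz sharp at A1
    print(hz_error_in_cents(440.0, 3.0)) # 3 Hz sharp at A4
```

Both octaves come out as 1,200 cents, but being 3 Hz sharp at A1 is roughly 92 cents (nearly a semitone, which is 100 cents), while being 3 Hz sharp at A4 is only roughly 12 cents.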

However, other aspects of sound are binary, linear, exponential, geometric, and so forth, where for reference there is a bit of overlap in the terms "exponential" and "geometric", but so what . . .

Time is another key aspect, and an example of this in the perceptual apparatus is the Haas Effect, where the succinct version is that two identical sounds arriving within a small time frame are merged by the perceptual apparatus of the human mind into a single sound which (a) originates from the location of the first sound and (b) is louder than either of the two individual sounds, where the time frame for the increase in loudness is from 5 milliseconds to perhaps 30 milliseconds of separation between the two individual sounds, but the merging and localizing effect happens in an even shorter time frame . . .

[NOTE: The Haas Effect is relevant to reverberation and rapid echoes, so it is not just an auditory illusion or auditory phenomenon that affects perceived loudness; and my hypothesis is that the Haas Effect developed as a survival mechanism specifically focused on being able (a) to get early warning of rapidly approaching dangers and (b) to determine the most probable path of the rapidly approaching dangers, where the interval most likely was designed to detect specific types of dangers, with the key bit of information being that it is very important to get an early warning about anything capable of doing the exact same, or at least two very similar, things in just a few milliseconds up to as long an interval as 30 milliseconds, where it also is important to know where it did it the first of the two times, since the location of the second time is discarded and only the location of the first sound is considered relevant in the Haas Effect, which based on additional research strongly suggests that this survival skill was developed specifically to make it possible to avoid roaming groups of Morris Dancers . . . ]

The Doppler Effect is another relevant phenomenon; and although it is physical in basis, it affects perception, hence can be used to create auditory illusions, where an example is the sound of a railroad train or other vehicle in a motion picture as it travels toward and then beyond the point of view, even when there is no actual point of view which includes the railroad train or other vehicle, since regardless of the visual information (or lack thereof), hearing the Doppler Effect applied to a sound creates the auditory illusion that the object making the sound is moving . . .
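For reference, the pitch change described above follows the textbook formula for a moving source and a stationary listener. The sketch below assumes a speed of sound of 343 m/s (air at about 20 degrees C), and the function name and parameters are made up for illustration.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C

def doppler_frequency(source_hz, source_speed, approaching=True,
                      c=SPEED_OF_SOUND):
    """Frequency heard by a stationary listener for a moving source.

    source_speed is in m/s along the line between source and listener.
    """
    if not 0.0 <= source_speed < c:
        raise ValueError("source_speed must be between 0 and the speed of sound")
    if approaching:
        return source_hz * c / (c - source_speed)
    return source_hz * c / (c + source_speed)

if __name__ == "__main__":
    horn = 440.0
    print(doppler_frequency(horn, 30.0, approaching=True))   # higher than 440
    print(doppler_frequency(horn, 30.0, approaching=False))  # lower than 440
```

Sweeping a sound from the "approaching" frequency down to the "receding" frequency, usually along with a volume swell and a pan, is the basic recipe for the passing-train illusion.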

Pitch is an important perception, and it is a key aspect of psychoacoustics and digital music production . . .

Another auditory illusion experienced by some but perhaps not all listeners is the "Missing Fundamental" auditory illusion . . .
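The usual description of this illusion is that a tone built only from harmonics (2f, 3f, 4f, and so forth) tends to be heard at the pitch of f, even though no energy at f is present. A minimal sketch for generating such a test tone follows; the function name, the choice of harmonics, and the equal weighting are all assumptions made for illustration.

```python
import math

def missing_fundamental_tone(fundamental_hz, sample_rate=44100, seconds=1.0,
                             harmonics=(2, 3, 4, 5)):
    """Sum of sine waves at the given harmonics, with no sine at the fundamental."""
    count = int(sample_rate * seconds)
    scale = 1.0 / len(harmonics)  # keeps the samples within -1.0 .. 1.0
    tone = []
    for n in range(count):
        t = n / sample_rate
        value = sum(math.sin(2.0 * math.pi * h * fundamental_hz * t)
                    for h in harmonics)
        tone.append(scale * value)
    return tone

if __name__ == "__main__":
    tone = missing_fundamental_tone(110.0)  # harmonics of A2 without 110 Hz itself
    print(len(tone), "samples")
```

The waveform still repeats once per period of the absent fundamental, which is one common explanation for why the lower pitch is heard.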

As a general rule, it is not so difficult to manipulate and to measure physical properties, because physical properties do not have beliefs and opinions; but since perceptions are the playground of beliefs and opinions regarding the best ways to create the illusion of reality, this is a bit of a problem . . .

However, the combination of (a) various measuring devices; (b) a calibrated full-range studio monitor system with a flat equal loudness curve at 85 dB SPL; (c) knowledge of auditory illusions and the rules that govern them; and (d) the ability to work with highly complex interdependent variables "by ear" makes it possible to introduce a bit of factual science into digital music production, which is fabulous . . .

Fabulous! :ugeek:

P. S. Regarding reverberation, two of my favorite bits of information are (a) that Abbey Road Studios used a high-pass filter on the audio sent to its reverberation units, a technique based on the observation that lower frequencies contribute nothing to reverberation and actually make reverberation "muddy" and (b) that Les Paul advised applying reverberation generally to the master output rather than to each individual instrument, vocal track, or performance, where the logic is that it is less confusing; and while I follow this rule generally, I make an exception for snare drum rimshots, lead guitar solos, and lead vocals, but for the latter two more in terms of using elaborate echoes than reverberation, where this basic rhythm section for a Surf Whammys' song is an example . . .

[NOTE: Among other things, you can see the effect of the snare drum rimshot reverberation in the Phase Analysis meter at the lower-left of the screen, where specifically the primarily high-frequency reverberation of the snare drum rimshots maps to a full-width flash of blue across the top. When a frequency is in-phase for the left and right channels it is displayed on the vertical centerline. Out of phase frequencies are shown as dot spreads. Another interpretation is that in-phase monaural sounds are mapped to the vertical centerline while out-of-phase stereo sounds are spreads, where the width of the spread depends on how out-of-phase or different it is. The style of producing and mixing used for "Billie Jean" (Michael Jackson) typically has the bass and drumkit monaural, which for a stereo mix maps to being in the middle or top-center, noting that this is easier to hear when you listen with a studio monitor system rather than with headphones or ear buds, but you can see it on the Phase Analyzer, which is one of the uses for the MOTU CueMix FX Phase Analysis meter . . . ]
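One way to put numbers on the in-phase versus out-of-phase idea in the note above is a simple left/right correlation together with a mid/side split. This is a generic sketch, not a description of how the MOTU CueMix FX meter computes its display, and the function names are hypothetical.

```python
import math

def phase_correlation(left, right):
    """+1 for identical (mono) channels, -1 for inverted, near 0 for unrelated."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den else 0.0

def mid_side(left, right):
    """Mid is what the channels share; side is how they differ."""
    return [((l + r) / 2.0, (l - r) / 2.0) for l, r in zip(left, right)]

if __name__ == "__main__":
    bass = [0.5, -0.25, 0.75, -0.5]
    print(phase_correlation(bass, bass))                # mono bass: fully in phase
    print(phase_correlation(bass, [-x for x in bass]))  # inverted copy: out of phase
    print(mid_side(bass, bass))                         # side component is all zeros
```

A monaural bass and drumkit have a side component of zero, which corresponds to sitting on the vertical centerline, while wide reverberation has a large side component and spreads out.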

"I Want To Dance With You" (The Surf Whammys) ~ Basic Rhythm Section ~ YouTube music video

The filtering strategy used at Abbey Road Studios for the reverberation units when Beatles songs were recorded, produced, mixed, and mastered can be used as a way to partition instruments and voices, along with panning, ducking, and other types of signal processing, where at least with Pop music (a personal favorite) one of the goals is to be able actually to hear everything, a goal which in some respects initially requires the complete and total destruction and annihilation of raw dynamics, followed by the creation of finely crafted perceived dynamics . . .
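To make the filtering idea concrete, here is a minimal sketch of a first-order high-pass filter applied to the signal that feeds a reverberation unit. It is a generic textbook filter, not the actual Abbey Road circuit, and the 600 Hz default cutoff is an assumed placeholder value rather than a figure from this post.

```python
import math

def high_pass(samples, sample_rate, cutoff_hz=600.0):
    """First-order (one-pole) high-pass filter.

    cutoff_hz is an assumed placeholder; tune it by ear for the reverb send.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        # Pass changes in the input, let steady (low-frequency) content decay.
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out

if __name__ == "__main__":
    steady = high_pass([1.0] * 2000, 44100)
    print(abs(steady[-1]) < 0.001)  # a constant (0 Hz) input is removed
```

Only the filtered copy is sent to the reverberation unit; the dry signal keeps its low frequencies, which is how the low end stays clear while the reverberation stays out of its way.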

Lots of FUN!

by **Surfwhammy** » Sat Jun 28, 2014 7:45 pm

All the various types of auditory illusions are fascinating in one way or another, but a few of them are so prominent that they make excellent examples . . .

THE HAAS EFFECT

The Haas Effect certainly is a prominent auditory illusion, and it is used frequently in popular music, as well as in broadcasting where in some respects it is easiest to recognize . . .

As a bit of background, there are federal rules which govern the various broadcasting standards and practices that broadcasters are required to follow; and this is part of licensing and broadcasting regulations, which in the US is handled by the Federal Communications Commission (FCC) for radio stations, television stations, cable broadcasters, and satellite broadcasters . . .

One of the requirements is focused on ensuring that the various signals being broadcast (primarily audio and video) are within reasonable bounds with respect to being compatible with consumer receivers, which includes not doing anything that might damage consumer receiving equipment; and as it applies to audio, this basically maps to ensuring that the signal strength is within a certain range which has a definite and well-defined upper limit with respect to volume levels . . .

Since for the most part radio and television stations earn money by broadcasting advertisements, there is a strong financial incentive to make advertisements as effective for advertisers as possible, and for audio this maps specifically to making advertisements louder than entertainment content; but doing this in a simple way by increasing the volume when advertisements are played causes a problem with respect to the FCC rule regarding keeping the broadcast signal levels within a specified range, where one way to explain this is by using the "Spinal Tap" analogy, metaphor, or simile, which refers to Spinal Tap's lead guitarist ("Nigel Tufnel") and his unique Marshall amplifiers . . .

Most Marshall amplifiers have volume controls that go to "10", but Nigel has specially modified Marshall amplifiers that have controls which go to "11" . . .

It is completely and totally stupid--which is what makes it funny--because the "special modification" actually is just painting "11" on the faceplate after "10" for each of the controls (bass, treble, volume, and so forth) . . .

Spinal Tap: "These go to 11" ~ YouTube video

This is one way in which the Haas Effect is used; and it works nicely because it does not require increasing the volume, hence does not push the signal strength above the upper limit allowed by the FCC . . .

By creating a copy of the audio and then repeating it within 5 milliseconds to 30 milliseconds--which is a simple type of rapid echo or delay--the perceived loudness is increased without actually increasing the physical volume level . . .
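A minimal sketch of the copy-and-delay idea follows, written for the stereo case where the original goes to one channel and the delayed copy goes to the other, so that neither channel's peak level changes. The function name and the 15 millisecond default are assumptions for illustration, and real broadcast processors are considerably more elaborate.

```python
def haas_stereo(mono, sample_rate, delay_ms=15.0):
    """Return (left, right): left is the original, right is a delayed copy.

    delay_ms is an assumed default inside the 5 to 30 millisecond range.
    """
    delay = int(round(sample_rate * delay_ms / 1000.0))
    left = list(mono) + [0.0] * delay   # pad so both channels have equal length
    right = [0.0] * delay + list(mono)  # same audio, arriving slightly later
    return left, right

if __name__ == "__main__":
    left, right = haas_stereo([1.0, 0.5, -0.25], sample_rate=1000, delay_ms=5.0)
    print(left)
    print(right)
```

Because the left channel arrives first, the merged sound is heard as coming from the left, even though the right channel carries the same audio at the same level.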

Other psychoacoustic techniques are used in conjunction with the Haas Effect, but overall when you have a nice listening level for the entertainment portion of a radio or television broadcast but then it gets noticeably louder when the advertisements start, this is part of what happens, although broadcasters also lower the volume level during the entertainment portion specifically so that they can raise the volume level during advertisements and on average satisfy FCC broadcast signal strength requirements . . .

Remembering (a) that the Haas Effect influences the perceived location of audio and (b) that the Haas Effect maps to increased perceived loudness when the time between the two identical bits of audio is from 5 milliseconds to 30 milliseconds, in its more extreme implementations this maps to a distinct sound for voice announcing and singing, which for radio is most easily recognized in advertisements for automobile racing events, wrestling events, and so forth; but there also is a bit of phasing, which gives it a metallic sound . . .

It sounds louder, and it gets your attention, but the broadcast signal strength and other parameters are within bounds . . .

This actually involves a set of techniques, so it usually is not just the Haas Effect; and one of the additional techniques is something called "ducking", which is a signal processing technique where the volume of one thing (usually a voice-over or singing) causes other things like instruments and background music to be lowered in volume or "ducked", but only during the time when the announcer, voice-over artist, or singing actually is doing something; and since "ducking" is done automatically according to various parameters, it can be fine-tuned to be very transparent, where in this context "transparent" maps to not being very obvious . . .

"DUCKING"

"Ducking" is a psychoacoustic technique; and it creates the auditory illusion that the announcing, voice-over, or singing is louder than whatever is "ducked" . . .

It is physically louder, but the auditory illusion is that, when "ducking" is done correctly, the audio being "ducked" is not perceived as being quieter or less loud . . .

Generally, when "ducking" is graceful and smooth, it requires a bit of training to hear it, but once you understand how "ducking" is done, it is not so difficult to identify when it is happening . . .
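As a rough sketch of what a "ducker" does internally, the following Python lowers a music track whenever a voice track is above a threshold and lets it recover afterward. It is a simplified illustration, not how Pro-C or any other plug-in is implemented, and every parameter name and default value is an assumption.

```python
def duck(music, voice, sample_rate, threshold=0.1, reduction_db=-9.0,
         attack_ms=10.0, release_ms=200.0):
    """Lower `music` while `voice` is louder than `threshold`.

    Simplified: the voice level is read sample by sample rather than
    through a proper envelope detector.
    """
    floor = 10.0 ** (reduction_db / 20.0)  # gain applied while fully ducked
    attack = 1.0 / max(1.0, sample_rate * attack_ms / 1000.0)
    release = 1.0 / max(1.0, sample_rate * release_ms / 1000.0)
    gain = 1.0
    out = []
    for m, v in zip(music, voice):
        target = floor if abs(v) > threshold else 1.0
        step = attack if target < gain else release
        gain += (target - gain) * step     # glide toward the target gain
        out.append(m * gain)
    return out

if __name__ == "__main__":
    music = [1.0] * 3600
    voice = [0.0] * 100 + [1.0] * 500 + [0.0] * 3000
    ducked = duck(music, voice, sample_rate=1000)
    print(ducked[99], round(ducked[599], 3), round(ducked[-1], 3))
```

The fast attack and slow release are what make the "ducking" graceful: the music drops quickly when the voice starts and drifts back up slowly enough that the change is hard to notice.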

I use Pro-C (FabFilter Software Instruments) for "ducking", and it is superb. This video explains how to do "ducking":

[NOTE: This YouTube video demonstrates several techniques, but it is important to watch the entire video, since it makes it easier to understand the section on "ducking" that is toward the end of the video. When Pro-C is used in conjunction with Pro-Q (which is FabFilter Software Instruments' advanced equalizer plug-in), there are more techniques that can be used to create auditory illusions . . . ]

Pro-C (FabFilter Software Instruments) ~ Expert Mode ~ YouTube video

THOUGHTS

The Haas Effect and "Ducking" are used frequently in popular music, but they tend to be dramatic; so applying them to Classical, Symphonic, Choral, and other more traditional musical genres typically needs to be done in a more subtle way, if at all . . .

One use for "ducking" is to make an instrument solo or vocalist appear to be louder and more prominent; and in this scenario, everything usually will be "ducked", although it depends on the desired goal . . .

In this context, I think it is accurate to suggest that articulations, dynamics, and some playing styles in music notation are the historical way that certain types of psychoacoustic effects were created at a time when there were no signal processors, which is a useful way to put music notation marks and modern signal processors into perspective, where "marks" refers to articulations, dynamics, playing styles, and so forth . . .

For example, if all the instruments except the solo violin are specified with pianissimo, while the solo violin is specified to play forte, then for all practical purposes this is what simple "ducking" does, except that I think doing it via "ducking" is easier, since it does not require using dynamic marks on every staff, which is one of the reasons that I nearly never use any music notation marks (articulations, dynamics, playing styles, and so forth), because here in the sound isolation studio it is easier to do most of those things with signal processors (a.k.a., "effects plug-ins"), although for playing styles I prefer to use samples where the musician actually is playing the instrument in the particular playing style, unless it is fixed rate tremolo or vibrato, in which case it usually is easier to do this type of tremolo and vibrato with an effects plug-in . . .

There is a virtual festival of ways to control everything, but overall I consider this to be an arranging, producing, and audio engineering activity rather than a composing activity when everything is done in the digital music production universe, because even though the sampled sounds were played on real instruments by skilled musicians, once the NOTION 4 audio is generated and recorded as soundbites in the Digital Audio Workstation (DAW) application, it is no different from recorded tracks of real instruments and singing . . .

If the goal is to produce sheet music that will be played by a real orchestra, then the music notation needs to have the various marks (articulations, dynamics, playing styles, and so forth), which also is the case when the goal for recorded audio is to have it sound as if it were a simple stereo recording of a real orchestra performing in a concert hall . . .

Nevertheless, if the goal is to create audio that is played through a loudspeaker system or headphones, then everything is virtual no matter how it is done, and this makes creating auditory illusions a key aspect, which is all the more significant when the instruments are virtual, which is fabulous . . .

Fabulous!

P. S. I will do a bit of grammar checking and editing later . . . :ugeek:
