
Psychoacoustics

Psychoacoustics is the study of the subjective human perception of sound; in effect, it is the psychology of acoustical perception.

Table of contents
1 Background
2 Limits of perception
3 What do we hear?
4 Masking effects
5 Psychoacoustics in software
6 Psychoacoustics and Music

Background

In many applications of acoustics and audio signal processing it is necessary to know what humans actually hear. Sound, which consists of air pressure waves, can be measured accurately with sophisticated equipment. However, understanding how these waves are received and mapped into perceptions in the brain is not trivial. Sound is a continuous analog signal which (assuming infinitely small air molecules) can theoretically carry an infinite amount of information, there being an infinite number of frequencies, each with both magnitude and phase information.

Recognizing the features important to perception enables scientists and engineers to concentrate on audible features and to ignore less important aspects of the system involved. It is important to note that the question of what humans hear is not only a physiological question about the ear but very much a psychological one as well.

Limits of perception

The human ear can usually hear sounds in the range 20 Hz to 22 kHz. With age, this range shrinks, especially at the upper limit. Frequencies below this range cannot be heard, but sufficiently loud low-frequency sounds can be felt on the skin.

The frequency resolution of the ear is, in the middle of its range, about 2 Hz; that is, changes in pitch larger than 2 Hz can be perceived. However, even smaller pitch differences can be detected through other means. For example, the interference of two tones close in pitch can often be heard as a periodic rise and fall in loudness at the (low-frequency) difference between them. This effect is called beating.
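
As a rough illustration, two tones separated by only 2 Hz can be synthesized and superposed; their sum fluctuates in loudness at the difference frequency. The Python sketch below assumes arbitrary illustrative values for the sample rate and tone frequencies:

    import numpy as np

    fs = 44100                      # sample rate in Hz (illustrative choice)
    f1, f2 = 440.0, 442.0           # two tones only 2 Hz apart
    t = np.arange(0, 2.0, 1.0 / fs)

    # Superposing the tones yields an amplitude envelope that rises and
    # falls |f1 - f2| times per second: the beat frequency.
    signal = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

    beat_frequency = abs(f1 - f2)   # 2 Hz, heard as beating even though a
    print(beat_frequency)           # 2 Hz pitch change alone is barely resolvable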

The intensity range of audible sounds is enormous. The lower limit of audibility is defined as 0 dB, but the upper limit is not as clearly defined; it is more a question of the level at which the ear will be physically harmed (see also hearing disability). That limit also depends on the duration of exposure: the ear may tolerate short bursts at 120 dB without harm, but prolonged exposure to sounds at 80 dB will damage hearing.
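
For reference, these figures are sound pressure levels expressed in decibels relative to the conventional reference pressure of 20 micropascals, roughly the threshold of hearing at 1 kHz. A minimal sketch of the conversion:

    import math

    P_REF = 20e-6   # reference pressure in pascals: the conventional 0 dB SPL point

    def spl_db(pressure_pa: float) -> float:
        """Sound pressure level in dB relative to P_REF."""
        return 20.0 * math.log10(pressure_pa / P_REF)

    print(spl_db(20e-6))   # 0.0  -> the nominal lower limit of audibility
    print(spl_db(2e-3))    # 40.0 -> every tenfold increase in pressure adds 20 dB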

A more rigorous exploration of the lower limit of audibility shows that the minimum intensity at which a sound can be heard is frequency dependent. By measuring this minimum intensity for test tones of various frequencies, a frequency-dependent Absolute Threshold of Hearing (ATH) curve may be derived. Typically, the ear shows a peak of sensitivity (i.e., its lowest ATH) between 1 kHz and 5 kHz, though the threshold changes with age, with older ears showing decreased sensitivity above 2 kHz.
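
One widely cited analytic approximation of the ATH curve is Terhardt's fit, often used as a starting point in perceptual audio coders. The sketch below evaluates it at a few arbitrary frequencies; treat the formula as an approximation rather than measured data:

    import numpy as np

    def absolute_threshold_db(freq_hz):
        """Approximate ATH in dB SPL (Terhardt-style fit)."""
        f = np.asarray(freq_hz, dtype=float) / 1000.0   # convert to kHz
        return (3.64 * f ** -0.8
                - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2)
                + 1e-3 * f ** 4)

    # The curve dips (greatest sensitivity) between roughly 1 and 5 kHz and
    # rises steeply toward both ends of the audible range.
    for f in (100, 1000, 3300, 10000, 16000):
        print(f, round(float(absolute_threshold_db(f)), 1))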

What do we hear?

Human hearing works essentially as a spectral analyzer: the ear resolves the spectral content of the pressure wave without regard to the phase of the signal. In practice, though, some phase information can be perceived. The inter-aural (between the ears) phase difference is a notable exception, providing a significant part of the directional sensation of sound.
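
A simplified far-field model of the inter-aural time difference illustrates why the direction of a source shifts the relative phase at the two ears; the ear spacing used here is an illustrative assumption:

    import math

    SPEED_OF_SOUND = 343.0   # m/s in air at about 20 degrees C
    EAR_SPACING = 0.21       # metres; rough head width, illustrative value

    def interaural_time_difference(azimuth_deg: float) -> float:
        """Simplified far-field estimate of the arrival-time difference between the ears."""
        theta = math.radians(azimuth_deg)
        return EAR_SPACING * math.sin(theta) / SPEED_OF_SOUND

    # A source 45 degrees off-centre arrives roughly 0.4 ms earlier at the
    # nearer ear; for low-frequency tones this shows up as a phase difference.
    print(round(interaural_time_difference(45.0) * 1e3, 2), "ms")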

Masking effects

In some situations an otherwise clearly audible sound can be masked by another sound. For example, conversation at a bus stop can be completely impossible while a loud bus is driving past. This phenomenon is called intensity masking: a loud sound masks a weaker one, so that the weaker sound is inaudible in the presence of the louder sound.

Two further parameters that determine masking are the frequency separation and the temporal separation of the two sounds. A sound close in frequency to a louder sound is more easily masked than one far apart in frequency. This effect is called frequency masking.

Similarly, a weak sound emitted soon after the end of a louder sound is masked by the louder sound. In fact, even a weak sound just before a louder sound can be masked by the louder sound. These two effects are called forward and backward temporal masking, respectively.

A third parameter that determines the nature of the masking is the tonality of the masker. A tonal (sinusoidal) masker exhibits slightly different frequency-dependent masking properties from a non-tonal (noise-like) masker. Computer models that calculate the masking caused by a sound must therefore classify its individual spectral peaks according to their tonality.
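
As a very rough sketch of how a model might combine masker level, frequency separation and tonality, the toy function below applies a larger downward offset for a tonal masker than for a noise-like one and lets the threshold decay with distance in critical bands (Bark). All constants here are illustrative placeholders, not values from any published standard:

    def masking_threshold_db(masker_level_db, masker_bark, probe_bark, tonal):
        """Toy estimate of the masking threshold at a probe frequency (Bark scale)."""
        offset = 16.0 if tonal else 6.0   # tonal maskers mask somewhat less strongly
        dz = probe_bark - masker_bark     # separation in critical bands
        slope = 27.0 if dz < 0 else 12.0  # masking spreads further upward in frequency
        return masker_level_db - offset - slope * abs(dz)

    # A 70 dB tonal masker: threshold at the masker, one Bark below, one Bark above.
    print(masking_threshold_db(70, 10, 10, tonal=True))   # 54.0
    print(masking_threshold_db(70, 10, 9,  tonal=True))   # 27.0
    print(masking_threshold_db(70, 10, 11, tonal=True))   # 42.0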

Psychoacoustics in software

The psychoacoustic model provides for high-quality lossy signal compression by describing which parts of a given digital audio signal can be removed (or aggressively compressed) safely -- that is, without significant losses in the perceived quality of the sound. It explains, for example, how a sharp clap of the hands might seem painfully loud in a quiet library, but hardly noticeable after a car backfires on a busy urban street. It might seem as if this would provide little benefit to the overall compression ratio, but psychoacoustic analysis routinely leads to compressed music files 10 to 12 times smaller than high-quality original masters, with very little discernible loss in quality. Such compression is a feature of nearly all modern audio compression formats, including MP3, Ogg Vorbis, Musicam (used in digital radio -- DAB, or DR -- in Europe and elsewhere, based on Eureka 147), and the compression used in MiniDisc, to mention a few common audio compression standards.
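
The 10- to 12-fold figure can be sanity-checked against typical bitrates; the numbers below are common illustrative values rather than figures from any particular encoder:

    # CD-quality PCM: 44.1 kHz x 16 bits x 2 channels
    pcm_bitrate_kbps = 44.1 * 16 * 2        # about 1411 kbit/s

    # A common lossy target such as 128 kbit/s then gives roughly 11:1,
    # consistent with the 10- to 12-fold reduction mentioned above.
    lossy_bitrate_kbps = 128.0
    print(round(pcm_bitrate_kbps / lossy_bitrate_kbps, 1))   # ~11.0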

Psychoacoustics is based heavily on human anatomy, especially the ear's limitations in perceiving sound as outlined previously. To summarize, these limitations are the restricted audible frequency range, the frequency-dependent absolute threshold of hearing, and the frequency and temporal masking effects described above.

Given that the ear will not be at peak perceptive capacity when dealing with these limitations, a compression algorithm can assign those sounds outside the range of human hearing a lower priority; by carefully shifting bits away from the unimportant components and toward the important ones, the algorithm ensures that the sounds the listener hears most clearly are of the highest quality.
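
A toy sketch of such bit shifting, assuming a hypothetical per-band representation in which each additional bit lowers the quantization noise by roughly 6 dB; bands whose noise is already below their masking threshold receive no bits at all:

    def allocate_bits(signal_db, mask_db, total_bits, db_per_bit=6.0):
        """Greedy toy bit allocation driven by per-band signal-to-mask ratios."""
        n = len(signal_db)
        bits = [0] * n
        # Noise-to-mask ratio of each band at its current bit count.
        nmr = [s - m for s, m in zip(signal_db, mask_db)]
        for _ in range(total_bits):
            worst = max(range(n), key=lambda i: nmr[i])
            if nmr[worst] <= 0:        # all remaining noise is below the mask
                break
            bits[worst] += 1
            nmr[worst] -= db_per_bit
        return bits

    # Four bands: the second is fully masked and is given no bits.
    print(allocate_bits(signal_db=[60, 40, 70, 55],
                        mask_db=[30, 45, 20, 40],
                        total_bits=12))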

Psychoacoustics and Music

Psychoacoustics encompasses many topics and produces findings relevant to music and its composition and performance, and some musicians, such as Benjamin Boretz, consider some or all of the results of psychoacoustics to be meaningful only in a musical context.
