Auditory illusions

June 19, 2011 • 7:01 am

by Matthew Cobb

We all know about visual illusions, and the way that psychologists and neuroscientists use them to develop and test hypotheses about how the visual system works. Other kinds of sensory illusions are a bit trickier, however. ‘Phantom limb’ syndrome, where amputees feel they still have their limb (and, sadly, often feel pain in it), is an example of a touch illusion. And the ‘burning’ sensation you get from eating chili or curry is due to the fact that the pain receptors on your tongue (or your mucous membranes – ouch!) are activated by the capsaicin in the spice. It doesn’t ‘really’ hurt. Or rather, it hurts, but there is no damage being done.

This website contains 10 of the most fascinating auditory illusions. [NB – the sound files are linked to the small grey box *below* each text box describing the illusion.] Some of them are illusions that are like visual illusions. Such as number 7 (‘Falling Bells’): ‘This is a recording of a paradox where bells sound as if they are falling through space. As they fall their pitch seems to be getting lower, but in fact the pitch gets higher. If you loop this sample you will clearly see the pitch jump back down when the sample repeats. This reveals that the start pitch is obviously much lower than the finishing pitch.’

But perhaps the most striking examples are only an illusion in the way perspective is a visual illusion – you can perceive something in 3-D that’s only really in 2-D. Hear (!) you perceive something in space, when it’s just in your head. These are examples of ‘dummy head’ recording. Number 5 – ‘Virtual Barbershop’ features a session at the barber’s and is stupendous when listened to on headphones. The final seconds are great! Number 4 just features a box of matches.

If you want to know more about the science behind this, go to the web-page of Diana Deutsch, Professor of Psychology at San Diego, who developed many of these sound files.

h/t: Fellow fly-man Walton Jones.

41 thoughts on “Auditory illusions”

  1. Okay…that was spooky.

    My own phone happens to be a ’50s-era rotary, and it’s in the exact same relative position as the one in the barbershop recording. I honestly thought it was my phone ringing until Luigi started talking about it, and even then it took a moment to be convinced….


  2. Besides the “virtual head” illusion (technically, a “head-related transfer function” or HRTF recording), there are a number of less striking but, in my opinion, much cooler illusions involving two ears that don’t have direct visual analogs.

    For example, by playing two pure tones, one in each ear, at slightly different frequencies you will hear a beat where none really exists (called a binaural beat). Similarly you can create the illusion of a sound at a particular pitch that doesn’t really exist (called a binaural pitch). So by using the two ears you can create the illusion of sounds that don’t actually exist.

    There are actually a large number of illusions related to creating pitches of various sorts, which is why the auditory community has so far not been able to come up with a universal, agreed-upon definition of “pitch”. Everyone knows a pitch when they hear one, but figuring out a definition of pitch that includes all of these illusions is extremely difficult.

    1. For example, by playing two pure tones, one in each ear, at slightly different frequencies you will hear a beat where none really exists (called a binaural beat). Similarly you can create the illusion of a sound at a particular pitch that doesn’t really exist (called a binaural pitch).

      Those beats are the difference in frequency between the two pitches. For example, if the one is 1,000 Hz and the other is 1,001 Hz, you will hear a once-per-second pulsation.

      When the frequency of the difference gets into the range of hearing, you hear the difference as its own tone. For example, a 1,000 Hz tone coupled with a 1,500 Hz tone will create the illusion of a 500 Hz tone. And, if you know anything about the relation of frequencies to musical intervals, you know that the two “real” tones are a perfect fifth apart, while the “resultant” tone is an octave below the lower of the two real tones.

      If you know what you’re listening for, it’s actually very easy to use resultant tones to tune a pair of pitches. In the example above, if the upper of the two real pitches is instead at 1,501 Hz, the resultant tone will be 501 Hz, which will create a secondary 1 Hz resultant tone with the 1,000 Hz tone — again creating a once-per-second pulsation.
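The beat arithmetic above is easy to check numerically. Here is a minimal Python sketch (not part of the original comment; NumPy assumed) showing that summing 1,000 Hz and 1,001 Hz tones is exactly a carrier at the mean frequency whose amplitude pulses once per second:

```python
import numpy as np

# Sum two pure tones 1 Hz apart, as in the example above.
fs = 44100               # sample rate, Hz
t = np.arange(fs) / fs   # one second of time samples
f1, f2 = 1000.0, 1001.0

mix = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

# Trig identity: sin(a) + sin(b) = 2 cos((a-b)/2) sin((a+b)/2).
# Here that is a 1000.5 Hz carrier under a 0.5 Hz cosine envelope;
# since loudness follows |envelope|, which peaks twice per cycle,
# the ear hears one pulsation per second.
envelope = 2 * np.cos(2 * np.pi * (f2 - f1) / 2 * t)
carrier = np.sin(2 * np.pi * (f1 + f2) / 2 * t)
assert np.allclose(mix, envelope * carrier)
```

The same arithmetic gives the resultant tones: the 1,500 − 1,000 = 500 Hz difference is simply a "beat" fast enough to be heard as a pitch in its own right.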

      With skill, you can not only hold pitches in tune relative to each other, but you can even adjust the phase, giving you control over the volume of the resultant tones. That’s how the twelve-person brass section of the Chicago Symphony can play louder than a hundred-person high school marching band.



      1. That’s how the twelve-person brass section of the Chicago Symphony can play louder than a hundred-person high school marching band.

        Even when the CS brass section is marching?

        1. Ha!

          Pick any top-tier brass section, and you’ll find guys who have a hard enough time just walking out on stage at all. How they manage to compensate when they’re playing is a mystery to all.

          Arnold Jacobs, the legendary tubist of the Chicago Symphony, famously only had one lung and adult-onset asthma. Phil Myers, still the principal hornist of the New York Philharmonic, tips the scales at hundreds of pounds and bounces up and down by a hundred pounds at a time. Adolph “Bud” Herseth, perhaps the greatest orchestral trumpeter of all time, was well into his 80s before he finally retired from the Chicago Symphony.

          I could go on….



      2. With skill, you can not only hold pitches in tune relative to each other, but you can even adjust the phase, giving you control over the volume of the resultant tones. That’s how the twelve-person brass section of the Chicago Symphony can play louder than a hundred-person high school marching band.

        The acoustics of the environments they play in and the distances involved shouldn’t be neglected. When you have a large number of angled reflectors bouncing the sounds towards the audience it greatly increases the sound pressure level in the audience section.

        In fact, I would appreciate some citations on the phase effects. It doesn’t seem to make sense to me, although I could be mistaken in my reasoning.

        Due to the frequencies and distances involved, a phase that lines up for someone at one location in the audience would likely not line up for someone at another location. Further, with all the reflections at difference distances, the phases at a single location would probably be a mixture of those that line up and those that don’t, and different people will be getting different collections of reflections at different distances.

        1. I think what Ben’s getting at is that very good players, who can play precisely in tune, can create the illusion of being louder than an out-of-tune ensemble. When a frequency is exactly matched by two players, it “pops”, or seems louder than when the two players are a few cents off from each other. The two slightly different frequencies interfere with each other and don’t have the same aural impact.

          1. Ben specifically talked about controlling the phase of the sound to increase the volume, not the frequency.

            1. I’m afraid I can’t offer you any citations, only personal experience.

              The principle was first explained to me decades ago in a master class by Charlie Schlueter, the now-recently-retired then-principal trumpeter of the Boston Symphony Orchestra. He then demonstrated it with me and the other students (it was a small class): we’d play a short duet passage together twice. Though each of the students played the same both times, Charlie was able to radically control the way the student sounded.

              I’ve since not only personally experienced that sort of thing with other top-tier orchestral performers (such as Ralph Sauer, the recently-retired principal trombonist of the Los Angeles Philharmonic), but in my own playing.

              When I’m playing with good musicians, I myself can fairly easily do such manipulations; in good ensembles, we work together to achieve the desired effect for that passage of that piece of music. One-on-one with not-so-good musicians, I can still do it, but it requires a lot more effort on my part to find just where the heck the other guy is. Add in very many more not-so-good musicians and the best I can hope for is to bolster the guy next to me.

              Certain key positions of the orchestra can have a disproportionate effect. In a high school all-star summer orchestra I played in, Warren Deck, then the principal tubist of the New York Philharmonic, subbed for the regular kid for a rehearsal. He completely transformed the entire sound of the orchestra. Don’t ask me how — I still haven’t figured it out. If I remember right, Glenn Dicterow, concertmaster of the New York Philharmonic, might have sat in on another of our rehearsals. If he did, his influence certainly wasn’t anywhere near as radical.



              1. I don’t doubt the way you play, the pitch you use, or other such factors can play a role. I would just be surprised if phase was the reason. If there are more than two people, then everyone in the orchestra will be hearing different phases. At least that applies to the phase of the fine structure (the component of the sound that carries what you would call the pitch); you could probably control the phase of the envelope (the changes in loudness of the sound over time), but the latter wouldn’t cause the sort of constructive interference you are discussing. I could be missing something, though.

      3. You “beat” me to it (lol)! Trying to eliminate the “beating” between two close frequencies is how most musicians tune their instruments.

        And on organs, you will sometimes find a stop in the pedal division called “Resultant.” When drawn, it sounds a fifth above the normal 16′ pitch, “resulting” in a perceived lower octave (32′ pitch). This is how many organbuilders achieve 32′ pitch w/o having to install 32′ pipes. (There’s no “beating” a real 32′ stop, however. Especially a reed. Oh, I crack me up!)

        The same effect is often employed by inserting fifths into octaves in the left hand of piano music, or in ensemble music.

        Another interesting tidbit is that in organs, the pipes are arranged so that semitones are not next to each other. There is a “C” side, and a “C#” side, each arranged in wholetones. When the semitones are next to each other, the resonator (body of the pipe) affects the one next to it, dragging the pitch of the speaking pipe ever so slightly toward the semitone that is closest.

        1. D’oh! Forgot to add that the reason we perceive that imaginary lower octave is that our brains, especially in lower registers, recognize the actually sounded fifth as the second and third harmonics of that imagined fundamental.

          1. That is correct. The question is why. And no one is entirely sure. It can’t be a mechanical effect, since we can combine tones between the ears to create a perceived fundamental.
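The resultant-stop arithmetic described in this thread fits in a few lines. A minimal Python sketch (not from the original comments; pitches assume A4 = 440 Hz equal temperament, for pedal low C):

```python
# The organ "Resultant" stop: sound a 16' pipe plus a quint a fifth
# above it, and the difference tone lands on the missing 32' pitch.
f_32ft = 16.35          # 32' C: the pitch the builder wants without pipes
f_16ft = 2 * f_32ft     # 32.7 Hz: 2nd harmonic of the imagined fundamental
f_quint = 3 * f_32ft    # 49.05 Hz: 3rd harmonic, a fifth above the 16' pitch

# The two sounded pipes are a perfect fifth apart (3:2 ratio)...
assert abs(f_quint / f_16ft - 1.5) < 1e-12

# ...and their difference tone is exactly the 32' fundamental,
# which the ear supplies on its own.
difference = f_quint - f_16ft
assert abs(difference - f_32ft) < 1e-9
```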

  3. Teens annoying the fuck out of you…what with their music, sloppy hair, and nauseating optimism for the future? Get your TeenAway ™repellent NOW! Just plug it in anywhere you have a teen infestation and watch them scurry away like cockroaches! Only $19.95, but WAIT – THERE’S MORE! Order now and receive TeenAway Portable ™ – keep the buggers out of your personal space no matter where you go! Get one for dad!

  4. This post reminded me of the 1812 Overture finale with the bells and stuff. Had to go listen to it again…

  5. As they fall their pitch seems to be getting lower, but in fact the pitch gets higher.

    Huh? Perhaps it’s because I was primed to hear it, but I could only hear the sound going up while circling my head.

    No. 9 loses something in only having the fast version available. Makes it pretty hard to compare …

  6. I was disappointed with “Phantom Words” (#1). I just heard “no way” throughout, without any change. In one ear, “no” was emphasized and in the other ear “way” was emphasized.

    “As you listen to it, you’ll start to pick out specific phrases.” Perhaps I’m not auditorily fantasy prone or something? Which is odd because I am a jazz musician.

    1. Same here. And really I could tell that it wasn’t exactly “no way” either. At least, the description says it’s nonsense words and it more or less sounded like that, with a vague resemblance to “no way.”

      In fact, I didn’t find many of them impressive. But the one about the sound that you can hear only if you’re younger than 20, THAT was cool. 🙂

  7. Maybe I can clarify the discussion between Ben Goren and TheBlackCat.

    Conventional wisdom that “phase doesn’t matter” is based on some A/B testing. Let w(t) be a periodic waveform (say, a sawtooth), and let v(t) be a waveform with the same harmonic amplitudes as w(t) but different phases relative to the fundamental. The A/B test can play w(t) in headphones, followed by v(t), and we expect a listener cannot tell the two apart. Contrarians like me look for exceptions to this rule, and exceptions can be found, but I must admit that the conventional wisdom is the rule because it works most often.

    Ben’s A/B scenarios are different. He plays in a room with three spatial dimensions {x,y,z} and a pressure field p(x,y,z,t) excited by multiple sources. I think Ben’s experience is A/B testing multiple sources being coherent versus incoherent.

    Suppose the 1st trumpet plays a note with a period of 1 ms = 1/1000 second (a little below high C on a C trumpet). So in 1 ms, a pulse of positive acoustic pressure leaves their lips, travels down the tubing at the speed of sound, reflects back from the bell as a negative acoustic pressure, then arrives at the lips to help open the lips for the next pulse. I’ve observed this myself with a small microphone at the bell, or slightly inside the bell (but not outside the bell, where dispersion smears the waveform). So now we have a voltage v1(t) for the 1st trumpet, and we can use the zero crossings of v1(t) as a timing reference.

    First I would characterize the “phase” noise of the zero crossings with Allan Variance (AVAR) directly in the time domain (not in the “frequency” domain — did I mention I’m a contrarian?). I bet the pro tuba player played with lower AVAR, giving the other players a chance to play coherently. Next I would introduce a 2nd trumpet player with v2(t), and let ∆(t) be the time difference between the zero crossings of v2(t) relative to the zero crossings of v1(t). To play coherently, we only need ∆(t) to be some fairly steady value (not necessarily zero), maybe wandering slightly. Then I would calculate the AVAR of ∆(t) as a measure of their coherence, and I bet that better brass sectional players play with lower AVAR of ∆(t).
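The zero-crossing and AVAR bookkeeping proposed above might look like the following rough Python sketch. The "trumpet" signal here is simulated (a 1 kHz tone with random phase noise), and the simple two-sample Allan variance formula used is one of several variants:

```python
import numpy as np

# Simulate half a second of a ~1 kHz tone (1 ms period, as in the
# comment) with a little accumulated phase noise standing in for v1(t).
fs = 96000
t = np.arange(int(0.5 * fs)) / fs
rng = np.random.default_rng(0)
phase_noise = np.cumsum(rng.normal(0, 1e-4, t.size))
v1 = np.sin(2 * np.pi * 1000 * t + phase_noise)

# Find upward zero crossings, refined by linear interpolation
# between the samples that straddle each crossing.
idx = np.where((v1[:-1] < 0) & (v1[1:] >= 0))[0]
crossings = t[idx] - v1[idx] * (t[idx + 1] - t[idx]) / (v1[idx + 1] - v1[idx])

# One period per cycle (~1 ms each), then a simple two-sample
# Allan variance of the period sequence as the jitter summary.
periods = np.diff(crossings)
avar = 0.5 * np.mean(np.diff(periods) ** 2)

assert abs(np.mean(periods) - 1e-3) < 1e-5   # cycles are ~1 ms
```

Comparing the AVAR of the time differences Δ(t) between two such players would follow the same pattern, with a second simulated voltage v2(t).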

    Conventional wisdom stands, that in the “frequency” domain, each listener hears “harmonics” with their own personal relative phases, depending on where the listeners are in the room — but that’s not the effect I’m describing. My question is whether listeners can hear when spatially distributed sources are coherent versus incoherent. I bet we can. I saw the Philadelphia Orchestra’s trumpet section play a unison at the end of a symphony maybe 30 years ago that was shocking, as if I could see the sound hovering in front of my eyes, and I felt lifted from my chair. Maybe that was partly high sound pressure level, and partly perception, but I think that was the effect Ben means.

    1. All that may indeed be what Ben had in mind. Perhaps he’ll weigh in. I still wonder if the phenomenon he described isn’t more along the lines of what I suggested. There are a number of simple ways in which the psycho-acoustical phenomenon of “seeming louder” can be achieved:

      As I already mentioned, matching (as close as is humanly possible) frequencies will seem louder than a pair of slightly out-of-tune pitches.

      And, interestingly enough, a dissonant interval (say, a minor second) seems louder than a consonant interval (major third). So the extreme in consonance (a perfect unison) seems louder than two very close, but not identical, frequencies; yet once you’re in the ballpark of a minor second (that is, a deliberate dissonance – not simply out-of-tuneness), the relationship inverts itself.

      Also, a higher frequency seems louder than a lower frequency, despite their sharing the same amplitude.

      In my (admittedly fairly unscientific) experience, these explanations seem to be pretty much exhaustive when I consider why I perceive something to “pop”, or to have an illusory larger amplitude.

      (Assuming, of course, that I can reliably rule out that what I heard was in fact actually louder than whatever I’m comparing it to.)

    2. There is no need to be contrarian. The important role of temporal coherence on the perception of sound, including spatial perception, is well-established. For example, the coherence of the sound is the primary cue for measuring sound source distance (somewhat analogous to how visually a hazy object appears further away than a crisp one).

      But there are other factors that a musician could control that could have a major impact on how the sound is perceived and how well it interacts with other instruments. These include spectral masking, amplitude modulation depth and temporal masking, modulation onset and offset sharpness and other aspects of modulation shape, and frequency smearing amongst others.

    3. I don’t think I’ve got the background necessary to properly address this…and, even if I did, it would be “mere speculation” without controlled measurements.

      But, that writ, what you wrote certainly seems like a plausible explanation.



  8. Do I get any kind of prize for being able to hear the (annoying) 18,000 Hz tone despite being 29?

    Oh, right, I get to be annoyed. Yippee.

    1. I’m 28 and I heard it too. I was thinking “what does he mean, it can’t be that annoying”, and then I hit the “play” button and immediately cringed from the sound. Then again I also have 20/15, 20/13 vision.

    2. I hear it just fine at 35, though the volume was low. I imagine many people won’t hear it due solely to the fact that most speakers likely to be hooked up to a computer lack the necessary range. Low end headphones aren’t much better, either.

      1. Although it is not something that only those under 20 can hear, it is much less likely that those over 20 can hear it. High-frequency hearing is usually the first to go, and 18 kHz is near the upper end of human hearing. It would be expected that someone would lose that at a fairly young age on average, especially people who grew up in the Walkman or iPod generation.
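Generating such a test tone takes a few lines of Python (a sketch, not taken from the site in question). The only hard requirement is a sample rate above twice the tone frequency (Nyquist), which CD-rate audio just satisfies; as noted above, playback hardware is the real bottleneck:

```python
import numpy as np

# An 18 kHz test tone, of the kind discussed in this thread.
fs = 44100               # CD-rate sampling
f = 18000                # tone frequency, Hz
assert fs > 2 * f        # Nyquist criterion: 44100 > 36000

t = np.arange(fs) / fs   # one second of samples
tone = 0.5 * np.sin(2 * np.pi * f * t)

# Sanity check: the spectral peak sits exactly at 18 kHz
# (with a one-second window, rfft bin k corresponds to k Hz).
spectrum = np.abs(np.fft.rfft(tone))
assert int(np.argmax(spectrum)) == f
```

Many laptop speakers and cheap headphones roll off well below 18 kHz, so a faithfully generated tone can still be inaudible through them.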

  9. The Virtual Barbershop illusion must have been created by Starkey Labs (a USA hearing aid manufacturer) to demonstrate (ie, sell) some new kind of hearing aid technology. The word that Luigi “whispered into our ears” is the name of a proprietary algorithm used by Starkey to try and recreate what a normal auditory system can do. All the hearing aid companies have been working on this idea for a handful of years now. It really is amazing what a properly functioning auditory system can do. This virtual auditory experience is a good way to demonstrate that.

    I can just imagine Starkey rolling this illusion out at a big conference somewhere and getting a fantastic “wow” response from the audiologists and hearing aid sales people in the room. The product and company stay in the minds of the professionals and more Starkey products get sold. I used to work as a dispensing audiologist. I was exposed to lots of these sales techniques, but I must have quit before this one was released. It’s a good one. Of course, it’s a little misleading. Their Cetera algorithm is not going to allow a person with hearing impairment to hear quite like that. Not by a long shot.

    1. The barbershop illusion is based on a well-known principle called the head-related transfer function. It is basically a digital filter that modifies the sound in the same way the head would for a sound coming from a certain direction. It has been known for decades, and is used extensively in auditory research to do auditory experiments over headphones. It has even been used in some music albums, but it only works properly when played over headphones, so people listening over speakers didn’t like it.

      HRTFs can either be recorded by sticking microphones in human ears (so-called “individualized” HRTFs, which are matched to the person’s own head and ears) or measured on a generic acoustic mannequin (the most common is called KEMAR).

      So although it may have a “wow” factor for general audiences, it is pretty run-of-the-mill stuff for the sort of people who would be attending the sort of conference you are describing. It is well-executed, but not impressive from a technical standpoint.
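In principle, HRTF rendering is just one convolution per ear. A minimal Python sketch (not from the original comment): the two "HRIRs" below are made-up toys, a bare interaural delay and level difference, standing in for measured impulse responses:

```python
import numpy as np

# A dry mono source: a short 440 Hz tone.
fs = 44100
t = np.arange(fs // 10) / fs
mono = np.sin(2 * np.pi * 440 * t)

# Toy head-related impulse responses for a source off to the right:
# the left ear hears the sound later (interaural time difference)
# and quieter (interaural level difference). Real HRIRs are measured,
# e.g. on a KEMAR mannequin, and have far richer structure.
itd_samples = 30                   # ~0.68 ms delay at 44.1 kHz
hrir_right = np.array([1.0])       # direct path, full level
hrir_left = np.zeros(itd_samples + 1)
hrir_left[itd_samples] = 0.5       # delayed and attenuated

# One convolution per ear gives the binaural (headphone) signal.
left = np.convolve(mono, hrir_left)
right = np.convolve(mono, hrir_right)
```

Written out as a stereo file and played over headphones, even this crude delay-plus-attenuation pair pushes the perceived source to the right; the measured filters used in recordings like the Virtual Barbershop add the spectral cues that pin down elevation and front/back.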

        1. @TheBlackCat – yes, I’m fully aware that the Barbershop Illusion is not technically complex. While listening to it, I was imagining the voice actors in relation to a KEMAR head in an acoustically treated laboratory. It is, however, a unique and memorable experience to feel your brain being “tricked” into perceiving things in your physical space that aren’t really there. That’s what makes this a good marketing tool. It’s memorable to feel kind of like a third-party observer to your own brain’s sensory integration. I think even for those of us who understand how it happens, it’s still cool to experience it.

        It is very generous of you to assume that participants at hearing aid manufacturer marketing/educational conferences would understand that this illusion is, technically speaking, pretty basic. I’m not sure this is always the case though.

        1. As I said, it is well-executed. As for the audience, I don’t know about hearing aid salespeople, but my impression is that audiologists are well aware of the importance of HRTFs. I could be wrong, though.
