A remarkable study: decoding how human faces are identified in the primate brain

June 11, 2017 • 11:00 am

I really dislike posting about a paper that I don’t fully understand, but I guess I’ll have to, as this one seems pretty important. The best I can do is summarize it briefly and give a link so that those of you above my pay grade can look at the messy details. The paper, with only two authors, Le Chang and Doris Tsao of Caltech, was published in Cell (reference and free link at bottom), and involved functional magnetic resonance imaging (fMRI) and electrodes that probed individual neurons in the brains of two rhesus macaques (Macaca mulatta).  (It’s amazing how far technology has advanced!)

The monkeys were presented with images originally derived from 200 human faces, with data taken on a set of 50 landmarks involving both shape and features like eye color and skin tone. Chang and Tsao then constructed composite faces based on various combinations of these landmarks. After identifying a small patch of the macaque brain as being involved in face recognition, they then probed the firing of individual neurons in this region when macaques saw the faces. Using various statistical analyses incorporating the correlation of each neuron’s firing with the measurements used to construct the facial images, they then figured out an algorithm that best translated the firing patterns into the multidimensionally constructed face.

Once they did that, they could “reverse engineer” the firing patterns alone into facial images; that is, they could test their algorithm by using measurements of firing alone and their model to predict the appearance of new faces seen by the macaques. The remarkable thing is that they could do this with amazing accuracy monitoring only 205 neurons!
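For readers who want a feel for what “translating firing patterns into a face” could look like in practice, here is a minimal sketch on synthetic data. It assumes each neuron's firing rate is a noisy linear function of the face measurements, which is an assumption made for illustration and not the authors' actual pipeline. The 50 features and 205 neurons echo the numbers above; the train/test split, the noise level, and names such as `simulate_responses` and `W_true` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features = 50              # face measurements (the post mentions 50 landmarks)
n_neurons = 205              # number of recorded cells mentioned in the post
n_train, n_test = 1800, 200  # hypothetical split, chosen only for this sketch

# Hypothetical ground truth: one linear tuning weight per (feature, neuron) pair.
W_true = rng.normal(size=(n_features, n_neurons))

def simulate_responses(faces):
    """Simulated firing rates: linear in the face features plus Gaussian noise."""
    noise = rng.normal(scale=2.0, size=(faces.shape[0], n_neurons))
    return faces @ W_true + noise

def add_bias(x):
    """Append a column of ones so the decoder can learn an offset."""
    return np.hstack([x, np.ones((x.shape[0], 1))])

# Synthetic "faces" are just random feature vectors.
train_faces = rng.normal(size=(n_train, n_features))
test_faces = rng.normal(size=(n_test, n_features))
train_rates = simulate_responses(train_faces)
test_rates = simulate_responses(test_faces)

# Fit a least-squares decoder mapping firing rates back to face features.
decoder, *_ = np.linalg.lstsq(add_bias(train_rates), train_faces, rcond=None)

# Predict the features of faces the decoder has never seen, from firing alone.
predicted = add_bias(test_rates) @ decoder

# Identification test: is each predicted face closest to the face actually shown?
dists = np.linalg.norm(predicted[:, None, :] - test_faces[None, :, :], axis=2)
accuracy = float((dists.argmin(axis=1) == np.arange(n_test)).mean())

print(f"fraction of held-out faces correctly identified: {accuracy:.2f}")
```

The point of the sketch is only that, if the code really is close to linear, a couple of hundred noisy cells carry enough information to pin down a 50-number description of a face. Turning those numbers back into a picture is a separate rendering step that is not shown here.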

Below is a set of reverse-engineered faces (“predicted face”) derived from neuronal firing alone and then compared to the actual image the macaques saw. You can see the remarkable fidelity. What this means is that, as far as we know, the investigators cracked the code of how a facial image is translated into patterns of neuronal firing in the brain.

What does this mean? Well, it means we’re a lot closer to understanding how the brain translates images into neuronal firing, though of course it tells us little about how that firing is reprocessed by the brain itself into an image, for that’s a matter of “qualia”, or consciousness. But it does unravel the complicated nexus by which a whole group of cells work together to create facial images, which of course are crucial for primates recognizing individuals.

One site reporting on this, ZME Science, notes two practical implications:

The researchers can translate from neuron activity in a brain to a human face; from brain activity to visual information. In addition to cracking the code of a brain in a living animal, this study also discovered how brains recognize faces. Before, it was believed that each face cell codes one specific face. However, now we know that each cell represents one piece of visual information that combines with all of the others to form a face. Perhaps human brains also have their own code that works in a similar way. In addition to crime applications, the research could help machine learning for recognizing faces, such as photo recognition on Facebook.

I’m not quite sure what the “crime applications” are unless they ask a victim to envision a perpetrator and use the neuronal firing (instead of a police artist) to get an image of that perp. But that seems impractical. The measurements of neuronal firing might be translatable into computer code, though this too is above my pay grade. Right now I’m just amazed that this was done, and how accurately the investigators could reverse engineer firing into images using so few neurons. I’ll let others speculate about the practical applications.

h/t: Steve


Chang, L. and D. Y. Tsao. 2017. The Code for Facial Identity in the Primate Brain. Cell 169:1013-1028.e1014.

52 thoughts on “A remarkable study: decoding how human faces are identified in the primate brain”

  1. This could be huge in getting images from eye witnesses to and victims of crimes … but only if and when it will be possible to monitor individual neurons’ activity non-invasively. I don’t think too many witnesses would be ok with having needles poked into their brains in order to interrogate the activities of individual neurons involved in facial recognition.

  2. “What this means is that, as far as we know, the investigators cracked the code of how a facial image is translated into patterns of neuronal firing in the brain.”

    I’m not sure it’s that simple. Faces aren’t just decoded into visual patterns in the human brain; they are connected to a whole range of associations.

    Look at what happens when those associations break down.

    People suffering from Capgras delusion certainly recognise that someone they are familiar with looks like that person, but the emotional associations are cut, so they believe the person is an impostor. And yet they are able to recognise the same person when speaking on the telephone, so the emotional associations remain, only separated from the visual stimuli.

    People with prosopagnosia are also unable to recognise faces while they may have photographic memories for other objects. My own ability to recognise faces is impaired because I rarely look people in the eyes: I flinch when someone looks at me directly.

    Face recognition is also linked to mirror neuron activity. Babies will mimic facial expressions and pulling tongues. And baby girls pay more attention to faces than boys so there are also sex-determined differences.

    So face recognition isn’t just pattern recognition, it is linked to a whole set of emotional associations that monkeys aren’t making.

    1. I wonder what this means for me, as I’m very good at facial recognition. It may take me a few minutes to remember where I saw someone, but I know I’ve seen them, even if they look a bit different. Often, though, it’s the combination of sight, sound and an individual’s mannerisms that makes me recognize them. I can recognize many actors in other shows faster than most.

      Maybe this is why I can’t do math; neurons are busy, or maybe they chat too much and that is why I have migraines.

      1. I also am better than most at recognizing faces and body language. Often I am surprised others can’t recognize the person. I’m also good at recognizing voices.

        For what it’s worth I’m also good at math, or was when I was a student.

        1. I’m always joking about why I can’t do math. Sometimes I say it’s because I grew up in a house on a busy highway when lead was still used in gas. 🙂 That’s all true (the highway and the gas).

          1. I think I’m going to steal that excuse for times like when I purposely walk to the pantry to get something while cooking and when I get there can’t remember at all what I went there to get. My wife might ask, “Can I help you find something?” And I’m like a deer caught in the headlights of an oncoming car. “Uhhhh . . . . ”

      2. On the other hand, I suffer from mild prosopagnosia (self diagnosed), which is intensely embarrassing, and makes me feel very awkward in groups that I meet regularly (such as car or bridge clubs), where it takes me ages to be sure who everyone is. I’d happily let them stick a few wires into my brain if they could use them to cure this, but I suspect that I just don’t have the right neural connections.

        On the plus side, I don’t get migraines, and am quite good at reading body language and emotions. We still have a lot to learn about how the brain works, but this study is amazing.

        1. I have bad aphasia before a migraine, but not the kind that prevents you from speaking (that’s in a different area of the brain); it’s the kind where you can’t find words. I sometimes look at someone and completely blank on their name. Of course, people just think you’re stupid when you do this so I have to explain that I am probably getting a migraine. People talking all at once when I’m in this state are also confusing to me.

    2. That auxiliary processing should be downstream processing, especially since the paper discusses how a key property of the face code is instant coding (as is their own decoding).

      And I would be wary of projecting emotions, or the absence of them, onto animals, and our close relatives especially. We likely shared our emotional behavior with a remote ancestor, since monkeys and we seem to share many or all such behaviors.

      1. I’m not suggesting animals are making these associations – quite the opposite. We might be able to model how apes see faces because that’s a purely visual thing but for humans those emotional connections mean the mapping is also occurring in parts of the brain that aren’t primarily visual. I’m pretty sure my line manager doesn’t really have horns, for instance; that’s just something other parts of my brain are adding when I think about her.

          1. Make that *mostly* agree, since I noted the shared emotional behavior (whatever emotions are).

    3. I think I mostly agree with you, except for that last sentence. Given what we know of evolution generally and what we have and continue to learn about other animals’ cognitive abilities and behaviours I think it probable that monkeys do have complex emotional associations linked to facial recognition not all that different from humans in kind.

      1. Wouldn’t those associations be linked by monkeys to monkey faces? Especially parents, siblings, dominant males and females, etc?

        Getting away from faces, I can imagine we could map the image of a teapot from scanning a monkey brain – but could we extrapolate from this the totality of how a human maps that teapot? When we see a teapot we see an object that has uses, practical and social. We associate the visual stimuli with memories and taste. Sure, we could show how neurons in a monkey fire and from these recognise that they saw a teapot-shaped object, but a teapot is more than a three-dimensional object.

        1. But that’s not the monkey’s fault. If a monkey had experiences of teapots similar to humans, perhaps the relevant emotions and subtexts would play a role much like it does for us.

        2. I think I must have misunderstood that last line a bit. I thought you were speaking in more general terms than just this study. No, I wouldn’t expect the monkeys to respond to human faces in the same ways they respond to faces of their own species. Though I’d bet they can “read” human faces to some degree because of our relatively close common ancestry.

  3. This defies belief, reconstructing a face from firing patterns of neurons? Theoretically possible, of course, but can we actually do it now? Wow! If it turns out to work and to be reproducible (I certainly hope so), this is revolutionary, at least Nobel prize stuff.

    1. I’m thinking maybe you’d need many more neurons if each neuron only represented a single “pixel” of information. Otherwise the resolution would be very poor. Maybe they reconstruct the face based on some facial segment correlated to a neuron. I didn’t read the paper so I’m just guessing.

      1. The neurons don’t represent pixels, they represent some kind of abstract “features”, perhaps as simple as whether there is a diagonal edge in some part of the picture, perhaps as complicated as whether the face is female, or frowning.

        Such features are also the intermediate layers of artificial neural networks. These can be programmed to also “work backwards” and guess a likely image which would have led to those features. And what I believe this paper has done is to use exactly such a capability, but starting from data from monkey brains.

        A cute example of reconstruction:


        1. Thanks for the explanation, it’s still difficult to see how so few bits of information can build a face that actually belongs to someone.

          It seems so Improbable. 😎

  4. Photo recognition on Facebook, and then at schools and subways and airports and government buildings and businesses and households and soon 2+2=5. Noooooo.

    1. And your facial recognition fork says “Mark, you are supposed to eat more vegetables”.

  5. I had not noticed the reverse engineering showing the coding mechanics, so thanks a lot for this! Makes you wonder what our monkey-descended brains use the other 10^15 – 205 neurons for.

    As for crime applications, I assume the code sparsity, linearity and absence of “detectors for identities of specific individuals in the face patch system” makes it ideal for objective and computationally cheap face record and retrieval.

    “though of course it tells us little about how that firing is reprocessed by the brain itself into an image, for that’s a matter of ‘qualia’, or consciousness.”

    I am sorry, but at least for me that read as gobbledygook. The result shows that a monkey brain’s face decoding likely results in a facial “axis model” where various cells are “encoding specific axes”. Why would we expect a brain within the brain that reprocesses that model into an image? That regression does not seem to lead to a biologically motivated model.

    If such an expectation comes about because a memory elicits something we interpret as an image, it happens while we are dreaming too. From what I remember of rodent experiments, those are, like their modeling of current self, what-if projections back and forth. We experience them more when we are awake and different patches of the brain communicate better over long distances.* But is the activity really correlated with those wake states?

    *Maybe I am confusing sleep with narcosis here.

    1. I was going to say something similar. Once the face is decomposed into its salient components, there’s no need to reassemble them into an image that can be viewed by some fictitious “mind’s eye”. The original decomposition is the experience of seeing the face.

      This is why Dan Dennett regards the concept of qualia as inherently problematic, since it seems to imply the existence of an internal homunculus capable of perceiving qualia.

      1. I have to admit that I just have never really understood qualia. I can understand the words of various explanations and descriptions but it has never really made sense to me. Many people speak of qualia as if they are obviously real things. But the concept has always seemed to me to have the same odor of mysticism that magic and the supernatural have, and to be completely unnecessary.

        But that could be a lack of understanding on my part.

  6. Very interesting, but I expect it would be hard for general applications, even if this were reproduced in humans. I would expect that which neurons fire for this or that facial feature would be different for each person, so before this could be applied to, say, a criminal interrogation one would first have to calibrate the facial mapping to that individual.

  7. I scan-read what appeared to me to be the high points of the paper to try and form an impression of what they did. A deeper reading just confused me – so that said…

    My main [ignorant] ‘takeaways’ are questions that are probably growing out of my misconceptions…

    What I think I read:

    Each monkey is presented with a series of archived faces & then the neurone that is stimulated by a particular gross feature [a face patch] is identified. I assume therefore that there isn’t a universal built-in ‘face mapper’ – every monkey is tailored differently during development & world interaction

    I wonder how they do the matching in the lab? Do they bury loads of probes in the brain at random [in the correct region of course] & then gradually identify what neurone does what? This must take days/weeks per monkey!

    When the monkey has been ‘calibrated’ for its unique mapping it’s presented with a face. The output is translated into a series of positional dots on a 2D plane – each dot being a neurone firing [or average output of local neurones?]

    The right hand ‘predicted’ photo-quality image is a bit of theatre. Not justified in my opinion – it is created from a transformation of the input image** based on the output dots. Somewhat circular. I *think* that’s why the hairlines before/after are a close match – not because the 2D dots define the hairline that well.

    ** Or is the right ‘predicted’ image a transformation of a standard photo-quality template? THAT would be impressive.

    1. Well, you went pretty far down into it, but it looks like my thought above is supported: Which neurons do what, exactly, can be different between different individuals.

  8. This is a very cool result, but it is important to note that it is about the initial response of the monkey to a presented face. The reverse engineering is taking measurements of neuron activity while the monkey is looking at a face, and then using the measurements to create a predicted face. The good matches are impressive, with overall accuracy about 75%.

    There is no evidence that this is the endpoint of facial recognition. Presumably the individual, whether monkey or human, would then use this basic face data to decide if it had a memory of a similar face, and if so do a more detailed comparison to see if it was a true match. It might also engage in some facial expression analysis, to decide if the face was threatening for instance.

    There is also no evidence that these neurons would be involved at all if an individual was recalling a remembered face, as in the supposed forensic application. They are more likely only involved in responding to faces being viewed.

    It does seem like it would be useful in developing better face recognition software for image processing, and possibly recognizing other things as well.

  9. Seems too good to be true. Time to withhold judgement until those-in-the-know give some feedback.

  10. Practical applications could include predicting which Alzheimer’s patients will forget what their spouses & children look like. Also, there’s that disorder where people can’t recognize or remember any face at all. They get by through memorizing the individual features of people they are expected to recognize, so they obviously can see those things. This research could help identify a way to literally rewire those brains – could they put wires into those brains to get the brain to turn the individual letters of facial features into whole words or names?

  11. The interesting aspect of this article is that encoding faces requires only 205 neurons. That is to say, the set of all faces forms a very low-dimensional subspace (one that only requires 205 degrees of freedom) of the space of all possible 2D images. This has been known for some time by people working in digital face recognition, which is why modern face recognition approaches (e.g. eigenfaces) employ very parsimonious representations of faces. The underlying intuition is that faces are highly constrained objects in the space of all possible 2D images, since they all contain two eyes, a nose, a mouth, etc. The remarkable thing about the article discussed by Jerry is that we now know that the brain has similarly optimised its internal representation of faces to use only a small amount of information.

  12. As an addendum to my previous post, I should also say that it makes evolutionary sense for the brain to employ a very parsimonious representation of faces, due to the huge caloric demands of neural infrastructure. An animal whose brain employed a very inefficient representation would require much more energy to perform basic tasks of image recognition.

    1. @Hardy H

      The inability of people without training to recognise [& interpret expressions on] upside down faces – does this difficulty occur at the level you describe?

      A study from 1988 concluded: “…that the evidence that inverted faces are processed differently from upright faces is far from compelling, and therefore the effect of inversion provides little or no evidence of a unique process in face recognition. The inversion effect is interpreted in terms of expertise in face processing and the highly homogeneous nature of faces as a stimulus class”

      I take this to mean that our brains aren’t developmentally wired preferentially for face recognition over other common objects – that the ‘wiring’ aspect of what we attend to as babies is very general. Perhaps we attend to symmetries, for example, rather than say specifically eyes, nose, mouth?

      That said there’s reactions to snakes that aren’t learned – so perhaps there’s an inbuilt library of instinctive warnings too. Messy nature – as per this JAC post from 2013…

  13. A million monkeys holding handmirrors up to each other, shouting, “Look at the monkey.” Eventually this could produce a philosophically astute monkey adept at introception.

  14. Some time ago, some people also managed to use cats as low-def cameras. The target was text, in big letters.

    Must go to work right now; will look later.

  15. I remember reading, a long time ago, in a big glossy coffee-table book about the human brain, that there would be ‘grandmother neurons (or groups of neurons)’–that is, each individual face would have its own tiny region in the brain. I was dubious at the time, & I am glad to see that the idea has been debunked.

  16. I had not realised that fMRI had advanced to this point… but that is the main message for me.

    Once you can read out individual neurons in some early visual processing area, then it is unsurprising that you can reconstruct faces. It has been known for ages (from sticking a few electrodes in V1 of monkeys etc) that the first few processing steps are crude feature detectors, for instance responding to vertical edges in some part of the visual field. If you can read out enough of these neurons, and gather enough data by showing them many many (made up) faces, then the reconstruction becomes a standard artificial neural networks procedure. (Reading enough points, for long enough, was hard with electrodes.)

    In essence you have replaced the first layer of some CNN with monkey-brain feature detectors. Training the rest of the CNN works just as if you had used randomly generated silicon feature detectors.

    I’d be very surprised if the data from one monkey was much use at all for another monkey (except in the crudest sense of knowing which bit of grey matter to zoom in on). Ditto for humans — before they can put the police identikit artist out of a job, they will need (at very least) to show the poor witness the 40000-odd faces they trained on many times.

    Anyway that is my take. Not my area at all, so if someone who knows more can tell me where I’m wrong, that would be great!
