Stunning new research: cats recognize their owners’ voices—but don’t much care

November 30, 2013 • 11:00 am

This will hardly come as a surprise to cat owners, but I suppose it needed scientific documentation. A new paper in Animal Cognition by Atsuko Saito and Kazutaka Shinozuka (from the University of Tokyo and the University of South Florida College of Medicine, respectively; reference at bottom and link here) shows that cats appear to distinguish the voices of their owners from the voices of other people. But the kicker is that they don’t show much response, and certainly don’t move their legs when they recognize the voice. This is in contrast to previous studies of d*gs, which show that they not only recognize their owners’ voices but respond much more readily and with more striking behavior.

It’s a short paper, and I’ll send the pdf on request, though I think you can get it free at the link.

In short, the authors exposed 20 domestic cats (19 indoor, one “kept on a university campus by a male owner”) to a sequence of five recorded voices played over a speaker. Each voice simply called the cat’s name, and the responses of the cats to each voice were measured “blind”: an observer was given a single video clip of a cat’s response and scored it without knowing which voice the cat had heard. “Response” was measured six different ways: ear moving, head moving, pupil dilation, vocalization (any sound made by the cat), tail moving, and “displacement” (“more than one step of displacement of both hind paws to any direction”).

The recordings were controlled for volume and speakers were asked to say the name in “the same manner as the owners” (that, of course, is not easily controlled).  To “habituate” the cats to their names, the five recordings were played in sequence: three strangers’ voices, then the owner’s voice, and then another stranger’s voice.

The results are shown in the graphs below. The first plot shows that there was habituation: the responses declined between the first and third voices, all from strangers. Second, when the owner’s voice was played as the fourth sound, the cats perked up, showing more ear and head movement—but no movement of tails or limbs, and no vocalization. The last voice, from a stranger, evoked responses (measured as the percentage of individuals showing one of the indicators of “attention”) similar to those toward the owner, so there was no renewed habituation. (The same pattern has been shown in studies of dogs.)

The results are statistically significant, but not overly impressive, with p values of 0.03-0.05 (roughly, the probability of seeing a difference at least this large if the voices actually made no difference; 0.05 or lower is conventionally regarded as statistically significant):

[Figure 1 from the paper: percentage of cats responding to each of the five voices]

The figure below shows the magnitude of the cats’ responses (summed over all behaviors) as opposed to simply the percentage of cats responding. There is a significant difference between the size of responses to the third voice—after the cat had been “habituated”—and the owner’s voice, which elicited a larger response. The last voice, as above, didn’t evoke a response larger than that toward the owner’s voice. The starred comparison is the only one that was significant: between stranger 3 and the owner:

[Figure 2 from the paper: magnitude of the cats’ responses (summed over all behaviors) to each of the five voices]

Besides the weak statistical power (and high p values), there’s another problem. It’s possible that the cats aren’t responding to the sound of the owner’s voice per se, but to the way the owner pronounces the cat’s name. It’s impossible to know from this study which aspects of the owner’s pronunciation or intonation are the ones picked up by the cats, though that could presumably be tested with electronic manipulations of the recording. Nevertheless, cats do recognize their name when spoken by the owner versus by strangers. I suspect, though, that the sound of a can opener interpolated in the sequence would elicit all six behaviors, including rapid movement toward the sound!

Finally, the authors give an interesting paragraph that readers might want to ponder:

The communication style of cats is very different from that of dogs, as mentioned above. In fact, Serpell (1996) has shown that dogs are perceived by owners as being more affectionate than cats. However, dog owners and cat owners did not differ significantly in their reported attachment levels to their pets (Serpell 1996). This fact may reflect the difference in expectations between cat owners and dog owners. One research questionnaire revealed that the more affection the dog owners have toward dogs, the more frequently they tended to have physical contact with them. However, no such relationship was observed among cat owners (Ota et al. 2005). Thus, the behavioral aspects of cats that cause their owners to become attached to them are still undetermined.

I’m not sure how a lack of correlation between one’s attachment to a cat and the degree of physical contact with that animal (which is of course determined largely by the cat!) has anything to do with the question of why one bonds with a cat. We ailurophiles can of course give our own answers: the purr, the softness of the fur, the grace of movement, and of course the very independence of the animal.

Finally, a bit of a biology lesson. At the beginning the authors summarize which animals can recognize their own young or individual “herdmates” through vocal cues (they don’t mention penguins):

A social ability widely seen in a number of species is differentiation between conspecifics by using individual differences in vocalizations. For example, zebra finches (Taeniopygia guttata castanotis) recognize mates on the basis of their calls (Vignal et al. 2004, 2008); bottlenose dolphins (Tursiops truncatus) use whistles for mother–infant recognition (Sayigh et al. 1999); mother vervet monkeys (Cercopithecus aethiops) can distinguish their own offspring’s screams from those of others (Cheney and Seyfarth 1980); and female African elephants (Loxodonta africana) can distinguish the calls of female family and bond group members from those of female outsiders (McComb et al. 2000). Similarly, some domestic animals are also known to be able to recognize individual humans through voice. For example, horses can match the forms and voices of familiar handlers when the handlers were presented together with a stranger (Proops and McComb 2012). Dogs can match owners’ voices and faces from others (Adachi et al. 2007).

I’m curious whether a mother cat can recognize the mews of her kittens versus those of unrelated kittens.  Perhaps work has been done on this, but I think if it had, the authors would have mentioned it.

I wish the authors could use LOLcats as abstracts of papers, because this one is appropriate:

[LOLcat image: cats do not come when called]

h/t: Michael, Hans, Adrian

_______

Saito, A. and Shinozuka, K. 2013. Vocal recognition of owners by domestic cats (Felis catus). Animal Cognition 16 (4): 685-690.

62 thoughts on “Stunning new research: cats recognize their owners’ voices—but don’t much care”

  1. Whatever the mechanism, there is absolutely no doubt but that Baihu recognized me as different from all other humans. With me, he’s a normal cat who goes in for extended bellyrubs, competes with the computer keyboard for the use of my arm, rides on my shoulders, and prefers my cheek as his favorite pillow.

    He still thinks that all other humans are going to eat him. On the trail, if he’s walking, when another human approaches, he heads post-haste to the nearest bush. At that point, I scoop him up onto my shoulders, where he stays most firmly planted until the “danger” has passed. If he’s on my shoulders but has been positioning himself to get off and walk on his own for a bit, and another human comes into view, he immediately settles back down with a firm grip.

    b&

  2. I wonder how much of this is conditioning as well, which may tie in to the expectation of cat owners.

    Dogs of course are bred to behave certain ways. Those poor dogs that didn’t were often culled in the old days, whereas today they just aren’t bred (I’m being optimistic here). Even with highly trainable dogs, you need to condition them to come when called. My dog has a very good response to her name because I trained her first by calling her name & pulsing her lead and running in the opposite direction, then rewarding with food (this makes it fun for the dog as well). I worked up from that to the point where no pulsing was required & then to where I could call her off lead from greater & greater distances. If she didn’t come when called, she received no treat and perhaps a correction (benign but annoying – a pulse of the lead or, if off lead, being put back on the lead). She now comes on the first call every time because she has been conditioned to feel that coming when called is fun & rewarding.

    However, from experience, if you don’t condition your dog this way, they’ll hear you alright but they won’t bother coming no matter how persistent you are.

    I suspect the cats have not been conditioned at all with things that motivate them to respond to a call (also not bred that way but as I said before, you still need to train your dog).

    Some animals are indeed hopeless (& probably wouldn’t recognize their owner’s voice). These include guinea pigs – people often put the pig somewhere where it gets stuck but it needs to be coaxed out with food as it won’t come to its name.

    1. I trained my big neutered Tom to come to me when I called. I did so by calling his name whenever I saw him outside and making sure that he got plenty of petting and affection if he came to me. I also never picked him up and carried him back inside when he came to me–that would have defeated the training by teaching him unwanted things happen if you come when called.

      Eventually, I was able to get him back in at night by calling his name, walking slowly back to the house, and petting him all the way. Occasionally, he’d make a break for it when we reached the door, but mostly he’d walk right in. Sometimes he’d get the most funny look on his face when I closed the door and shut him in, like “Curses, tricked again!”

      1. It would be interesting to me if someone could prove that we can indeed herd cats. 🙂 I think we can. It’s a matter of finding the right motivation and your anecdote shows that it’s possible.

        1. Herding cats is easy. All you have to do is pick a random point in time and declare that the cats are now in the positions you intended for them to be in.

          b&

      2. I always train my cats to come when called. I do this by calling for them just before I put their food down. “Here puss puss puss, here puss puss puss.” It doesn’t take long for them to get the hint. They come when I call though it can take a while if they’re busy with something or quite a long way away.

        1. I find that the “puss, puss” alternating with little squeaking sounds generally works, even with cats that I have literally never met before in my life.
          I was walking down a street one day with a colleague – I’d been taking him house-hunting in an area of town I’d never visited before, and we were talking about something appropriate, so I made him a bet that I could charm the next cat we saw to come from [wherever] and talk to me. The next cat, a complete stranger, was snoozing in the sun on top of a very comfortable stone gatepost about 100m away on the other side of the road. It took about three minutes, but the cat charmed off the fence post, along the street, across the road (full Green-Cross Code ; Darth Vader would be proud!) under several parked cars and over to come and talk to me. No tuna for puss, unfortunately, but I got a pint out of it.

  3. I note that Figure 1 would be marked with a big X if it was handed in as part of a first year practical report. Why is it presented as a line diagram when the variable (origin of voice) is not a continuous variable? What does the mid-point between any two data points mean? Figure 2 is right – it should be a histogram. I’d have presented the voice origins with the owner at one side or the other – this is all rather odd.

    On a less pedantic note, the reason why d*gs respond to all sorts of social cues from their owners, and generally do what they are told, whereas cats don’t appear to care, is that d*gs are pack animals, where they pay great attention to what the top d*g (the owner, for a pet) is doing. Cats, as we know, walk alone.

    1. Cats are solitary hunters, but my understanding is that feral cats do form social groups with dominance hierarchies regulating such things as who is permitted to sun themselves near whom. To us it may just look like a group of cats scattered at random across a meadow, but the cats themselves are acutely aware of the distances and relationships between them, and arrange themselves accordingly.

      Pet cats often relate to their humans in similar ways, constantly aware of where their humans are and following them from room to room to maintain appropriate proximity even when they’re not looking for treats, bellyrubs, or laps.

      1. That’s what I was going to point out. Left to their own devices, F. silvestris catus will generally form colonies of dozens of members with a very specific and complex social structure and division of labor. They’re one of the more social mammal species you’ll find. I think it’s fair to put them higher on the social scale than wolves: they form larger collectives with more interdependent social networks.

        This is also trivially supported by their common appearance on the human social landscape. It’s not uncommon for “that crazy cat lady” to have a couple dozen cats living with her with only a modicum of chaos. But can you imagine what it would be like to have a couple dozen d*gs living with a single “crazy dog lady” in a similar space? Even small, 10-15 pound dogs? Perhaps not entirely outside the realm of possibility, but it’s practically unimaginable — whereas a clowder of cats like that is barely more remarkable than somebody who fanatically follows a foreign sports team.

        Cheers,

        b&

      2. It’s also important to point out that domesticated dogs are not like their wolf cousins. Ideas like “you need to establish yourself as your dog’s alpha, so don’t let the dog in the door before you” are beginning to be publicly challenged. Indeed, most dogs prefer the company of their human (they’ve been bred that way after all) over other dogs (just try to get a second dog to keep your first dog company – they just want the attention of the human and get jealous of each other).

  4. Observation of my cats would indicate that they recognize clothing as well. My cats see me in uniform often and they will respond positively to my male coworkers when dressed similarly. However, as a coworker has indicated to me, the cats display some alarm or surprise when they hear his less familiar voice.

  5. The way the owner calls the name might have a lot to do with the slightly positive response to the owner. As an attempt to see if the cat responds to just the voice, I suggest that the tests use a more neutral call. Have the humans say something like ‘Marco Polo’, and see if there is a differential response to the voices.

    1. And, another variation: have the owners use a random name, but spoken the same way they would were they to call their own cat.

      Also, mix-and-match the owners with the cats — that is, play back Fluffy’s owner’s recording to Smokey and vice-versa.

      Lots more potential permutations….

      b&

    2. Very true. My cats ignore my normal conversation (too trivial), so I have to speak in a special tone of voice or make special sounds to signal that I’m talking to them and may be saying something that’s worth their listening.

      It isn’t quite babytalk, but it sounds strange. I suspect that’s why so many people speak to their cats in babytalk, a special tone of voice just to get the cat to listen.

      They’re quite good at learning which vocalizations in which tone of voice indicate which information, i.e., whether I want them to go upstairs or outdoors or whether they’re about to get a treat.

  6. Interesting study! Cognitive zoology is absolutely fascinating. A couple of comments though:

    1. About the experimental design, I think it’s rather unfortunate that the order of the owner’s voice and the last stranger’s voice was not randomized. I understand the desire to habituate the cats to their names by playing the first three strangers’ voices in sequence. But without randomizing the last two voices (owner vs. stranger), there’s no way to tell if the effect observed between the 3rd and the 4th voice is only in response to the owner being the speaker, or if it is more a reflection of the sequencing. How do we know that the cat isn’t just “annoyed” in a sense that it’s hearing its name for the fourth time? Perhaps what has been observed is more a measure of the patience level of a typical feline. Randomization could have addressed this criticism.

    2. I’ll have to look at the paper myself I guess, but in Figure 2, the effect difference between Stranger 3 and Owner is on the order of the effect difference between Stranger 2 (S2) and Stranger 3 (S3). Perhaps the p-value for the S3 to Owner comparison is 0.05, but then the p-value for the S2 to S3 comparison must be about the same, meaning there is about equal evidence here of a change in magnitude response between S2 and S3 as between S3 and Owner. This could be used to argue that the observed effect between S3 and Owner is really a reflection of the cat getting “annoyed” at that stage in the experiment, as they seem to have sufficiently “lost interest” just before, as evidenced by the effect difference between S2 and S3. I don’t know if I’m prepared to argue that this is a more likely explanation than the one presented by the authors, but I do think it’s a possibility. Again, if they had randomized the sequence of Owner and S4, then this criticism wouldn’t necessarily apply.

    3. Jerry, I’m a bit confused by your criticism of “the weak statistical power” of the tests. Am I correct in interpreting this to be a criticism of the small sample size? If so, I agree, but at least the research is an interesting pilot result.

    1. Jerry, I’m a bit confused by your criticism of “the weak statistical power” of the tests. Am I correct in interpreting this to be a criticism of the small sample size?

      I think Jerry’s a bit more concerned in general about the arbitrary decision to use p = 0.05 as designating significance. Statistically, using such a lenient threshold for significance means we get an overwhelming number of studies that just barely meet that level of significance…and the odds are therefore skewed such that a great many of those studies really aren’t significant at all.

      Using p = 0.05 would, at best, get you laughed at at CERN; particle physicists use standards that are far, far, far more stringent. The problem faced by the sciences that use p = 0.05 as the cutoff is that we tend to have far more confidence in what we think we know than is warranted.

      In many ways, it offers a disturbing parallel with the religious apologetical “historians” who build sky castles on top of Mediaeval manuscripts of dubious provenance of copies-of-copies of fragments of second- and third-century documents in order to divine the “real truth” behind the origins of Christianity because, if we were to judge those documents as critically as they deserve, “We wouldn’t know anything at all!” The fact is, in that case, they really don’t know anything at all, and they’re just fooling themselves otherwise because they’d rather believe they do than admit ignorance.

      Sciences that use p = 0.05 aren’t nearly as problematic…but it’s a similar syndrome. We really don’t know a lot of the things we think we do, and we’re doing ourselves a disservice by pretending otherwise.

      And, yes. It’s difficult, expensive, and often impractical if not impossible to get higher confidence levels in the sciences that use p = 0.05. And, no, that doesn’t mean we should give up and go home.

      What it does mean is that we need to be explicit in stating the error bars. For example, in this study, the proper way to report it would be that the research suggests reasons to suspect that cats may recognize their owner’s voices, but that, even if they do, they don’t exhibit a strong outward reaction in response to hearing it. And the conclusion should be that this is an interesting hypothesis that points to avenues of future research that may prove fruitful.

      Billing it as “Cats recognize their owners’ voices” doesn’t do anybody any favors, in my opinion, though.

      Cheers,

      b&

      1. Hi Ben,

        You said, “the proper way to report it would be that the research suggests reasons to suspect that cats may recognize their owner’s voices, but that, even if they do, they don’t exhibit a strong outward reaction in response to hearing it. And the conclusion should be that this is an interesting hypothesis that points to avenues of future research that may prove fruitful.”

        On this, I totally agree. Far too many research papers do not do enough justice to the many qualifiers and caveats to their conclusions. As you mentioned, this is not as much of a problem in fields like physics, but it is endemic in the biological and social sciences.

        But now I’m a bit confused by your proposed explanation of why Jerry criticized the statistical power of the tests employed. I agree of course that designating a Type I error rate of 0.05 is often too high to allow confidence in our results (especially when multiple hypotheses are being tested – as appears to be the case in this paper), but this doesn’t really relate directly to a criticism about statistical power. It’s true power and Type I error are related in that you can increase the former by increasing the Type I error rate, but it seems the opposite of that is being suggested (i.e. alpha=0.05 is too high of a cut-off, so should be lowered).

        Usually when I hear people criticize the statistical power of a test, I take it to mean they are objecting to the sample size, as increasing sample size is usually the easiest way to increase statistical power.

        1. Usually when I hear people criticize the statistical power of a test, I take it to mean they are objecting to the sample size, as increasing sample size is usually the easiest way to increase statistical power.

          Superficially, you’re right.

          But you’re also assuming — or, at least, implying — that the effect would remain present with a larger sample size.

          And…statistically, we can expect the exact same proportions of experiments to meet the p = 0.05 level regardless of their actual validity and regardless of sample size.

          The fundamental problem is in overstating confidence. There’s also the associated problem of the difficulty in properly applying the math (such as properly quantifying the null hypothesis and so on).

          By my way of thinking, basically everything you read that includes the words, “statistically significant” should instead almost invariably instead be read as “highlights a promising candidate for future research.” That’s especially the case since most of these reports are of relatively simplistic and small studies, and they so often have superficially obvious flaws (such as the ones that Matt Cobb pointed out in post #3 above). It’s important work that needs to be done, but it’s really only just a first-pass filter that should point the way towards future research.

          Cheers,

          b&

          1. I don’t believe I’m assuming or implying anything about the effect to be detected here. An effect is either present or it isn’t. In the setting of this particular problem, either there is a difference in the sum behaviour of the cat between Stranger 3’s voice and its Owner’s, or there isn’t. We get to decide how large of an effect we want to be able to detect using our statistical tests, but that does not change whether or not there is an actual effect between the two comparison groups.

            There are essentially four ways to increase statistical power: increase the Type I error rate, increase the desired effect size to be detected, increase sample size, or use a different statistical test (this last option is often not really an option, and even when it is there are usually other problems). Increasing a sample size does not have to have any effect on the effect size we want to detect. In fact, in the interest of good experimental design, I think you would usually want to decide on an effect size to be detected first (along with acceptable Type I and Type II error rates), and then derive from that the appropriate sample size.

            Increasing sample size reduces sampling variability. Consequently, it decreases the chance that we erroneously reject the null hypothesis (it decreases the p-value), and it also decreases the chance that we fail to reject the null when we really should; i.e. it decreases the chance of Type II error, and so increases power. This is true regardless of effect size. If you were to simultaneously increase your sample size while decreasing the effect size you would like to be able to detect, then I agree that this will not necessarily translate to better power. But I can’t really think of a reasonable situation when you would want to do that. Regardless, I don’t think that’s being suggested here.

            I agree that it’s a problem to oversell our confidence in a result, but this is not usually a consequence of low statistical power.
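
To make that last point concrete, here is a minimal sketch (in Python, using the standard normal-approximation formula for a two-sided, two-sample comparison) of how the required sample size falls out once the effect size, alpha, and desired power are fixed. The effect sizes below are conventional benchmarks, not figures from the paper.

```python
# Minimal sketch: normal-approximation sample size for a two-sided,
# two-sample z-test. Not the authors' analysis; d, alpha, and power
# are illustrative inputs chosen by whoever runs it.
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate subjects per group needed to detect standardized effect d."""
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = norm.ppf(power)            # quantile corresponding to 1 - beta
    return 2 * ((z_alpha + z_beta) / d) ** 2

for d in (0.2, 0.5, 0.8):               # Cohen's "small", "medium", "large"
    print(f"standardized effect d = {d}: ~{n_per_group(d):.0f} per group")
```

Because n grows as 1/d², halving the effect size you want to detect roughly quadruples the required sample, which is why a 20-cat study can only hope to pick up fairly large effects.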

          2. An effect is either present or it isn’t. In the setting of this particular problem, either there is a difference in the sum behaviour of the cat between Stranger 3’s voice and its Owner’s, or there isn’t. We get to decide how large of an effect we want to be able to detect using our statistical tests, but that does not change whether or not there is an actual effect between the two comparison groups.

            The problem is that you’re presenting two black-and-white options when there are actually a great many options with many shades of gray.

            There may be an effect, and the measurement accurately represents the presence and magnitude of the effect.

            There may be an effect, and the measurement accurately represents the presence but not the magnitude of the effect.

            There may be an effect, but the measurement inaccurately indicates the absence of the effect.

            There may be no effect, but the measurement inaccurately indicates the presence of the effect to one degree or another.

            And there are innumerable ways for the measurement to go worng, from honest failures of methodology (e.g., Clever Hans) to bad luck (such as flipping a truly honest coin “heads” seven out of ten times) to bad analysis (statistics is hard and few non-statisticians appreciate the subtle ways you can go worng) to outright fraud.

            Just throwing more rounds in your trial at the problem will only help you with the second of those options…and, well, even then, if you have an hundred iron-clad studies, each of which reporting 95% confidence in their results, you’d still expect roughly five of those to be flat-out worng, and you’d expect lesser degrees of error with many more.

            And when the effect in question is as subtle as whether or not a cat swivels his ears in response to an audio cue?

            Again, it’s an indication not that we know something, but that we have a good indication that there may well be something interesting to be found if we look deeper.

            Cheers,

            b&

          3. Hi Ben,

            I think I see where the confusion is coming from. The notion of “statistical power” has a very specific definition: it is 1 – beta, where beta is the probability that we fail to reject the null hypothesis given that the null hypothesis is false (this is called the Type II error). This is different than the notion of “statistical confidence,” equivalently called the “confidence level,” which again has a very specific definition: it is 1 – alpha, where alpha is the probability that we reject the null hypothesis given that the null hypothesis is true (this is called the Type I error). These definitions relate critically to the scenarios you outlined above, and are critical to my initial concern with criticizing the “statistical power” of the tests.

            1. If there is an effect and the measurement accurately detects its presence and measures its magnitude, then great. We have not committed either a Type I or Type II error.

            2. If there is an effect and the measurement accurately detects its presence but not its magnitude, then the situation is more complicated than what we are discussing here. The authors of this research are not using statistical tools designed to measure the magnitude of an effect size, they are simply testing a null hypothesis of no effect against an alternative of some (any) effect. It’s true that the data itself produces an estimate of effect size, but this is only relevant on a relative scale to the other effect sizes given by the data for the purposes of the statistical comparison being made in Figure 2 of the paper. Regardless, there is no guarantee that a larger sample size will help with this issue if we were trying to estimate an actual effect size. Larger sample sizes can reduce the uncertainty in our estimates (due to sampling variability), but they are powerless to help us correct bias that is induced by either the measurement procedure, the sampling technique, the experimental design, or the intrinsic statistical tools used to estimate the effect.

            3. If there is an effect and the measurement fails to detect it, then this is exactly the case of committing a Type II error. Equivalently, this is due to a lack of statistical power. As I explained above, there are basically four ways to deal with this issue. And increasing sample size is most certainly one of these.

            4. If there is no effect, but the measurement indicates that there is one (of some magnitude), then this is exactly the case of committing a Type I error. If we lower our acceptable threshold for Type I error, then we lower the chance of this happening, and this is exactly what people are talking about when they say the p-values should be lower. Lower p-values protect against this type of mistake, and this type of mistake *only*. This is a critical point, and one that is often overlooked or confused with other things. It’s also a great illustration of why p-values alone are not enough in *any* study, and why statisticians work so hard to try to get researchers to consider more than just p-values. What’s more, increasing sample size always reduces a p-value (this follows directly from the math), which in turn always *increases* statistical power (again, this follows from the math). Your point that if we had a “hundred iron-clad studies, each reporting 95% confidence in their results, you’d still expect roughly five of those to be flat-out wrong,” is totally correct, but it only applies to this fourth scenario. You are talking about the probability of committing a Type I error. But when we talk about the statistical power of a test, we are – by definition – talking about the probability of committing a Type II error.

            As a necessary caveat, all of the above discussion is assuming a traditional frequentist framework (as opposed to a Bayesian one) for conducting tests of hypotheses, but this is the usual approach in the biological sciences. I’m happy to explain the math I cited above in greater detail if you would like, so please just let me know.
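
To see those two conditional probabilities rather than just read their definitions, here is a rough Monte Carlo sketch in Python of a two-sample t-test; the sample size and effect size are arbitrary choices for illustration, not values from the study.

```python
# Rough sketch: estimate alpha and power by simulation.
# alpha = Pr(reject H0 | H0 true)  -> rejection rate when the true effect is 0
# power = Pr(reject H0 | H0 false) -> rejection rate when a real effect exists
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n, effect, alpha, trials = 20, 0.8, 0.05, 10_000   # illustrative values only

def rejection_rate(true_effect):
    rejections = 0
    for _ in range(trials):
        a = rng.normal(0.0, 1.0, n)            # group with no effect
        b = rng.normal(true_effect, 1.0, n)    # group shifted by true_effect
        if ttest_ind(a, b).pvalue < alpha:
            rejections += 1
    return rejections / trials

print("Pr(reject H0 | H0 true)  ~", rejection_rate(0.0))     # about 0.05
print("Pr(reject H0 | H0 false) ~", rejection_rate(effect))  # the power, 1 - beta
```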

          4. Ah — I get where you’re coming from. I’m sure it’s not at all surprising to you that a non-statistician such as myself might not readily recognize a distinction between the phrases, “statistical power,” and, “statistical confidence,” without explicit definitions offered up for both.

            With that clarification of the terms, yes, I think we’re expressing the same things. And I think we’re on the same page as far as the needs for better communication about the (non-statistical) confidence one should have in study results and their (non-statistical) explanatory power, and how to improve upon both.

            Thanks,

            b&

          5. “[S]tatistically, we can expect the exact same proportions of experiments to meet the p = 0.05 level regardless of their actual validity and regardless of sample size.”

            I’m not sure exactly what you mean by “validity,” but let’s assume that a study is valid if and only if it has no systematic error (ie, no bias). Among all such studies whose null hypothesis is true, 5% will have p-values less than .05, regardless of sample size. In contrast, among all unbiased studies whose null hypothesis is false with true effect size x, the larger the sample size, the greater the probability that the p-value will be less than .05.

            Among studies biased against the null hypothesis, for a fixed sample size it goes without saying that the greater the bias, the greater the probability that the p-value will be less than the declared significance level. On the other hand, for a fixed bias against the null, the larger the sample size, the greater the probability that the p-value will be less than the declared significance level.

            That took more words to express than I had planned. Hopefully it makes sense.
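
The same point in numbers, using the normal approximation for a two-sample comparison (alpha and the assumed true effect size are illustrative choices only): the rejection rate under a true null stays pinned at alpha no matter how large the sample gets, while the rejection probability for a fixed real effect climbs toward 1 as n grows.

```python
# Illustrative only: how often p < .05 under a true null vs. a fixed real effect.
from scipy.stats import norm

alpha, d = 0.05, 0.5                    # d = assumed true standardized effect
z_crit = norm.ppf(1 - alpha / 2)

for n in (10, 20, 50, 100, 200):
    power = norm.cdf(d * (n / 2) ** 0.5 - z_crit)   # two-sample approximation
    print(f"n = {n:3d}:  Pr(p < .05 | null true) = {alpha:.2f},  "
          f"Pr(p < .05 | effect = {d}) ~ {power:.2f}")
```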

          6. I think we’re expressing the same thing.

            Assume a thousand studies, all performed with the utmost professionalism, and with whatever degree of thoroughness you like. And, each of them has p-values of 0.05.

            About fifty of those studies will have incorrectly reported an effect that does not exist.

            Now, consider the wider pool: a similar proportion of studies that failed to meet the p-value standard of 0.05 (and thus were not published) did, indeed, investigate a significant correlation but incorrectly reported the null hypothesis instead.

            And since nobody publishes anything about studies that don’t reach p <= 0.05, that really skews things, for the pool we’re examining is not the total set of all studies performed, but only those that met the p <= 0.05 standard.

            Finally, throw in all the human foibles ranging from honest errors to incompetence to malice aforethought — with, sadly, some strong pressures in that last direction thanks to the way funding and career growth works — and the current crisis in confidence really isn't quite so surprising after all.

            I don't mean to suggest that we're totally fucked. I would, however, suggest that a fair amount of reform is called for, and one small portion of that would be the tightening of p-value standards, at least in terms of attributing "significance." More importantly, we need to reward publication of null findings as much as those of positive findings, have some sort of mechanism by which every study that gets reported gets independently replicated, and so on.

            Yes, yes — it would be damned expensive to do all that. And the sociopolitical and economic climates are such that this may well be a pipe dream. But we're only fooling ourselves by pretending that the current situation is "good enough."

            Cheers,

            b&

          7. “Assume a thousand studies, all performed with the utmost professionalism, and with whatever degree of thoroughness you like. And, each of them has p-values of 0.05.

            “About fifty of those studies will have incorrectly reported an effect that does not exist.”

            That’s not true. You’re confusing inverse probabilities. The p-value is the probability, if the null hypothesis is true, of getting a result at least as extreme as the observed result. That is, it is a conditional probability given that the null is true. But what you are stating is the inverse: the conditional probability that the null is true, given that the p-value is .05; ie, a false positive rate. That value depends (through Bayes’ Theorem) on the proportion of a field’s papers whose null hypothesis is true and the power of the field’s papers to reject the null.

            To illustrate how the false positive rate depends on the proportion of true null hypotheses a field studies, consider the two extremes. At one end, take parapsychology, which studies only true null hypotheses. Since the null is true in every single study, every statistically significant finding in the field is false. At the other extreme, consider a hypothetical lab that does nothing but confirm well-established scientific facts. Such a lab would have a false positive rate of zero; every statistically significant finding would be true.

          8. The p-value is the probability, if the null hypothesis is true, of getting a result at least as extreme as the observed result. That is, it is a conditional probability given that the null is true. But what you are stating is the inverse: the conditional probability that the null is true, given that the p-value is .05; ie, a false positive rate.

            If there’s a 5% chance of the null hypothesis being true but that you still come up with extreme observations, then those extreme observations are, indeed, false positives, and, in those 5% of cases with a p-value of 0.05 where the null hypothesis is true but you still saw the observed correlation, you’ve “discovered” something that’s not really there.

            And you can’t a priori decide that a given field / study / experiment is going to only ever investigate the null hypothesis or will only ever investigate real phenomena. That’s the exact dividing line between faith / religion / pseudoscience and skepticism / empiricism / science. It matters not whether you’re observing the rate of gravitational acceleration in a vacuum chamber or the rate at which monkeys fly out of your butt; the same standards and skepticism must apply to both sets of observations and analyses.

            Cheers,

            b&

          9. This comment is meant to address the conversation between Jay and Ben.

            Jay is right here, but the misunderstanding is a common one, and there is at least one reasonable situation where Ben is also correct, but maybe for the wrong reasons. This misunderstanding is a particularly annoying and unfortunate reality of statistics under the traditional (frequentist) framework; i.e. the quantities we work with (statistical confidences and powers) do not necessarily align with our intuitive interpretations of these quantities (i.e. how likely a hypothesis is to be true or false).

            I don’t really know how to best explain this without resorting to a little bit of math. p-values are reflections of statistical significance, of a Type I error rate (alpha). The p-value is the smallest significance level (the smallest alpha) for which the observed data indicate that the null hypothesis should be rejected. This value is a function of the data, so is calculated after the study is done. Now alpha is, by definition, the probability that we reject the null hypothesis (H0) given that the null hypothesis is true; more succinctly, Pr(reject H0 | H0 is true), where Pr( A | B ) denotes the probability of the event A given the event B.

            This quantity is fundamentally different than Pr(H0 is true | reject H0), which is the quantity Ben is referring to with the example of a thousand studies reporting p-values of 0.05. In that example, we start with the event that we have 1000 studies that reject H0. Now we ask how many of those reports are correct; i.e. out of those 1000 studies where H0 was rejected, in how many was H0 actually true? This is Pr(H0 is true | reject H0).

            To calculate this probability, we can just use conditional probability (or equivalently, Bayes’ theorem). If you do this, you get the following expression:

            Pr(H0 is true | reject H0) = [alpha * Pr(H0 true)] / [alpha * Pr(H0 true) + (1-beta)*Pr(H0 false)],

            where alpha and beta are the Type I and Type II error rates respectively, as defined in my previous comment. Now a small p-value implies we can choose a small alpha value for our test of significance (arguably, choosing an alpha value ad hoc is not good methodology, but that’s a minor point for the issue at hand). So we see that the p-value does affect the quantity of interest on the lefthand side. But this quantity is a function of other things as well, namely the probability that the null hypothesis is true or false, *unconditionally*. This is what Jay was illustrating with his examples of testing hypotheses in parapsychology (where Pr(H0 true) = 1) versus testing hypotheses in a “gold-standard” lab (where Pr(H0 true) = 0). Indeed, in his first example, if Pr(H0 true) = 1, then our equation above shows that

            Pr(H0 is true | reject H0) = 1.

            So in this setting, of 1000 studies that reject the null, all 1000 are reporting false conclusions. Similarly, in Jay’s second example, if Pr(H0 true) = 0, then we find that

            Pr(H0 is true | reject H0) = 0.

            In this setting, of 1000 studies that reject the null, all 1000 are reporting true conclusions.

            Now with real science (i.e. not pseudoscience like parapsychology), we are never in these extreme cases. In general, 0 < Pr(H0 true) < 1, and the example Ben is talking about (1000 studies with p-values of 0.05, how many are incorrect?) depends on this quantity. In that example, our alpha and p-value are the same thing, 0.05. We need to agree on a power 1-beta, which is typically 0.8 or higher. Let's say 0.8 for now. We still need to decide a reasonable estimate for Pr(H0 true), as this is not something we can just measure. If we assume Pr(H0 true) = 1/2, so we are ambivalent about the truth of the hypothesis a priori, then we compute

            Pr(H0 is true | reject H0) = 1/17,

            which is "close" to the significance level of 1/20 = 0.05. So in this setting, what Ben is claiming is approximately true, but for the reason that Jay is giving.

            Of course, this analysis is highly dependent on our significance level (alpha or the p-value), the statistical power (1-beta), and on the quantity Pr(H0 true) – this quantity is called the "prior" in the statistical literature.

            I hope this helps clarify things a bit. But I'm happy to try to talk more if I'm not making sense.
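
The arithmetic above is easy to check with a few lines of Python; the prior Pr(H0 true) is, as the comment stresses, an assumption rather than anything the study measures.

```python
# Check of the worked example: Pr(H0 true | H0 rejected) via Bayes' theorem.
def prob_null_given_rejection(alpha, power, prior_null):
    return (alpha * prior_null) / (alpha * prior_null + power * (1.0 - prior_null))

# The numbers used above: alpha = 0.05, power = 0.8, Pr(H0 true) = 1/2.
print(prob_null_given_rejection(0.05, 0.80, 0.5))    # 0.0588... = 1/17

# The answer is very sensitive to the assumed prior:
for prior in (0.1, 0.5, 0.9):
    print(prior, round(prob_null_given_rejection(0.05, 0.80, prior), 3))
```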

          10. Ben wrote:

            “If there’s a 5% chance of the null hypothesis being true…”

            But where are you getting that 5% chance from? Neither a p-value of .05 nor a significance (alpha) level of .05 implies that there is a 5% chance that the null hypothesis is true. Reread my post and Ed’s last one.

  7. Kink comes when I call him. He also greets me at the door when I come home. But we all know that Kink is extraordinary!

    1. Kink is, indeed, extraordinary — but he’s not the only cat who’s happy to berate his slave for abandoning the master when the slave returns…even if said slave had only left to procure more noms….

      b&

  8. Here’s a thought for an experiment I’d love to see performed, and one that I think would have a good chance at showing significantly more significance than this one.

    You’ll need a bare, windowless room with a cold, hard floor, a bunch of shirts (described below), and as many heating pads as shirts — with the pads set to a comfortable-but-not-too-warm temperature. Each shirt should be placed on its own heating pad, and the shirt / pads should be spaced as widely as possible in the room.

    For the shirts, you’ll need one each of the following. Ideally, all the shirts should be the same brand, size, color, etc.

    * Freshly laundered with the owner’s usual detergent.
    * Freshly laundered with any other detergent.
    * One unwashed since the owner engaged in vigorous exercise.
    * One unwashed since somebody of the same gender as the owner engaged in vigorous exercise.
    * One unwashed since somebody of the opposite gender as the owner engaged in vigorous exercise.
    * One unwashed since having a strange cat used it for bedding.
    * One unwashed since having a strange d*g used it for bedding.

    The cat should be left alone and remotely observed for at least several hours. Time spent investigating the environment as well as time spent on or near each of the warm shirts should be measured.

    And, of course, there’s plenty of opportunity for expansion on the shirts — for cats from multi-cat households, include a shirt from each of the other cats, for example; include shirts from all the people in the household; include shirts from people of similar and different ethnicities and dietary habits; and so on. But then you’ll run into problems with too many shirts for the cats to investigate, and then you need to expand the study even further to do multiple sessions with each cat, more cats, more sophisticated math, and on and on.

    Anybody who wants to do a study like this, please do so — but only on condition that you report back to Jerry when you get it published!

    Cheers,

    b&

    1. Another variation: several people sitting around a circular table, each with a laptop computer. The cat is in the center of the table, but is contained in a box. The people make no eye contact or have any form of communication with the cat. After a period in which the cat is acclimated as much as possible, the cat is released from the box. Now the question is: whose keyboard will the cat lie down on?

      1. Ha — good one!

        …but a great deal would depend on the cat. Baihu, for example, would not leave the box. If the box were removed, he would do his best to not be seen — and, believe me, he’s damned good at that.

        In contrast, Smokey, a delightful kitten a few months old whose acquaintance I made at Thanksgiving dinner, would either lie on the closest keyboard or the one of the person who was least able to resist his charms. And if you think people would be capable of sitting stoically in front of a computer while ignoring him, you know nothing of human or feline nature….

        b&

    2. The cat should be left alone and remotely observed for at least several hours.

      You’d get several hours of video of the cat sniffing at your camera.
      Old joke in the same vein : a psychologist wants to see what a chimp will do when left alone in a room with no stimulus. flat grey walls ; diffuse lighting ; no windows ; hidden door ; no cats, shirts or heat pads. So, the shrink locks the chimp into the room, waits 10 minutes to let the chimp stop bouncing off the walls and get on with whatever it is that chimps do when alone in a room. And the shrink goes up to the outside of the door and removes the key from the keyhole, bends down to see what he can see.
      What he can see is a chimp’s eyeball.

  9. Animals are not people. They don’t understand language. All animals can do is memorize sounds. Pets memorize the sounds of their masters’ voices, or rather the sequences of sounds, but don’t hear words.
    My dog recognized my dad’s car from afar as different from other cars merely because of the sounds it made due to his driving style.
    Cats are less sharp than dogs and probably recognize human sounds less well.
    It’s just memory of sounds and memory of association with the sounds.
    They do not think like people, as they were not made in God’s image as an intelligent being.
    Are people getting paid to research the obvious?

    1. Erm…I’m afraid you’re sorely mistaken. Multiple experiments have shown that many different species do, indeed, have at least as much language ability as human toddlers. There are dogs who will consistently fetch, for example, the blue ball out of a sample of toys that includes multiple items of multiple colors — and this includes novel instances where the dog has never before seen any of the particular toys in the set. The same cognitive language abilities have been demonstrated in multiple species of birds, especially including African Grey Parrots, and I think dolphins as well. Gorillas and chimpanzees most certainly, and presumably orangutans as well.

      I don’t know if similar research has been done with cats. In my experience, the average cat is more intelligent than the average dog, though there are certainly outliers in all directions on all sides. The main difference is that dogs are more interested in sycophantically pleasing their owners and thus more eager to show off. A cat is much more likely to get bored with the sort of routine necessary for canine-style training. That’s especially evident with the different play styles; dogs are easily satisfied repetititititively chasing after the same stick repeatedly thrown to the same spot, and want little more than to endlessly play tug-o-war with the same favorite toy. A cat will enthusiastically play with a new toy until it figures out the trick, at which point its interest quickly wanes. You can retire that toy for a while before reintroducing it, but the level of interest will never be the same as it was initially. A kitten who’s never seen a laser before will practically have an heart attack from the excitement. The next day it’s a fun game. But with a mature cat who grew up with lasers, you’re lucky to hold its interest for five minutes after it hasn’t seen the laser for a year.

      But there are certainly some damned smart dogs out there, I’ll definitely grant.

      b&

      1. Very good points, Ben, but I am not sure how one can know that the average cat is smarter than the average dog. As for the outliers, I am going to posit that the smartest dogs are smarter than the smartest cats, at least in terms of how we humans measure such things. My evidence would include your above descriptions about some exceptional dogs. I say this with a heavy heart, since I adore cats but now live with dogs.

        1. Ben Goren.
          Studies or not animals DO NOT have like language abilities with toddlers.
          How so???
          All animals are as dumb as each other. Nothing going on in there save minor desires and minor memory abilities to get those desires.

          Language to an animal is no more than sounds in sequences they would hear in a forest.
          They ascribe no meaning to these sounds but only remembrance of meaning.
          Babies very quickly give meaning to language because it touches upon their own thoughts. Babies’ thoughts are already greatly complex relative to animals.
          Babies are just people with memory problems but in fact think as well as adults. As adults think like our parent called God.
          I’m surprised to hear the comment animals have equal or close language abilities as toddlers.
          Nope.

          1. I’m sorry, but, from your opening line:

            Studies or not animals DO NOT have like language abilities with toddlers.

            your entire post is not only an unsupported argument from (unspecified) authority, but an explicit rejection of empirical observation. You might as well be insisting that, no matter what repeated observations might otherwise suggest, larger objects really do fall faster than smaller ones, or that the Sun orbits the Earth, or that species don’t evolve.

            If you’d care to abandon your immature faith-based position and enlighten yourself, I suggest you start with the Wikipedia articles on Alex the African gray parrot:

            http://en.wikipedia.org/wiki/Alex_(parrot)

            and Koko the gorilla:

            http://en.wikipedia.org/wiki/Koko_(gorilla)

            Those’re only the two most famous examples of non-human animals using human language. You can find as many more examples as you might care to with no trouble whatsoever.

            Cheers,

            b&

          2. Beat me to it. 🙂 Also, check out the recent study that gives insight into the emotional lives of dogs, which has led many to consider whether dogs have personhood. I’m not really surprised at this and I’m sure other animals would have more complex inner lives than we give them credit for; mammal brains are mammal brains.

            Moreover, I’m currently reading The Self Illusion and it details the development of the human brain. Our brains continue to develop post birth and right up into adolescence. The prefrontal cortex, where that higher reasoning occurs (controlling the impulses of the limbic system), is not fully developed by adolescence (which explains teen moodiness, bad decisions, etc.). Further, the reason we don’t remember infancy is because there is no constructed self until about the age of 2 or 3, when humans begin to become self-conscious and can form “self memories”.

            I think, Robert Byers, it would do you well to look at the evidence of things instead of believing in something that is based on bias (like animals are separate from humans and that there is a “ghost in the machine” inside our fleshy bodies that is somehow separate and special).

  10. Fave bit on cats & dogs ever:

    Dog posits: “Human gives me food & water, pets me, plays with me, talks to me, makes me happy, takes care of me… therefore human must be god!”

    Cat posits: “Human gives me food & water, pets me, plays with me, talks to me, makes me happy, takes care of me… therefore I must be god!”

  11. Speaking of cat research, we nao know why catheists have an edge (thanks, Jerry!):

    “Best news we’ve heard all year: science sez cute kitty photos can help you learn stuff. […]

    Want to know why this works? Of course you do. Because SCIENCE! Japanese researchers published a paper in PLoSOne last year showing that study “participants performed tasks requiring focused attention more carefully after viewing cute images.”

    Experts agree! Looking at lolcats is good for your brain, if not ur grammerz.”

    [ http://thefinchandpea.com/2013/11/30/science-caturday-lolcats-make-u-smarter/ ; go there for links to the references]

  12. Two cat stories. When I was fairly young, one of my cats got stuck maybe 20 ft. up a tree. I held out my hands and told him to jump. He did, and I caught him. He did not have his claws extended when I caught him.

    Years later, while in graduate school, I visited a couple who treated their Siamese as a child. It hopped up on the dinner table, and I reflexively knocked it off. (My cats knew not to do that.) I left fairly quickly.

    Several years later, I visited the couple again, in another locality. Their front door opened into a very long room with the dining table at the far end. When I entered, the Siamese was on the dining table. It took one look at me and dove off the table.

    1. I never did quite understand keeping cats off of furniture. The stove, yes — Baihu won’t go near that, after the (not all that many) fear-driven times I shouted, “NO!” at him when he got close.

      But, aside from that, he’s welcome wherever he likes. It’s his home, too, after all….

      b&

  13. Do you know that horses can recognize and follow along with music? Opera music. I’m a big fan of the Metropolitan Opera Saturday matinee movie theater telecasts in HD (if you don’t know about these and love opera, go to “metopera.org” and click on “HD”).

    Last year they telecast a magnificent production of Aida. In Act 2, the Grand March scene features the triumphant return of the Egyptian army to the royal palace after a great victory. Among the soldiers are three horses, decorated in fancy plumes, etc. After the music, they all march off stage.

    Renée Fleming (to me, the goddess of opera singers, I would worship at her feet) served as the between-acts host, interviewing singers and others involved in the production. After her Act 2 intermission interview, she heads down a walkway, grabbing (what later turns out to be) goodies for the horses. Arriving at the area where the horses are kept backstage, she soon has them eating the goodies out of her hand (explaining she was raised around horses). She interviews the trainer, who explains the horses are chosen because they don’t spook easily at loud noises. She also notes the horses can follow along with the music and recognize when it is just about time for them to go on stage (they start to get a bit antsy). My interpretation is: the horses are trying to say, “Get ready humans, it’s almost time for us to go on stage.” Obviously, these horses are real opera pros.

  14. I once did an experiment with my late cat. I had recorded my voice calling him. I played the recording in a far corner in the room and stood in the other corner.
    *giggle* How to confuse a cat! He would start walking towards the recording, stopped, looking at me, looking at the recording, starting to walk towards me, stopped… Finally he sat and started to groom himself: displacement activity… Poor confused kitteh, I was laughing so hard. I gave him some extra cuddles and treats after that.

  15. Very suspect research. Perhaps these researchers are projecting their own lack of affection onto the cats.
    We have three cats living at our cottage. When they hear someone coming along the walkway to the door, TC and Abbey will both come to the door; though they don’t know who is approaching, they expect there will be stroking. OTOH Summer-the-little-stripey-cat (who looks quite like Hili) does not come to the door unless she hears my voice – and then she comes running.

    Just because cats do not exhibit the slavish behaviour of dogs does not mean that they don’t care. Anyone who knows cats can tell that they are extremely affectionate animals, just not with people they have no reason to be affectionate with.

  16. My cats are a mixed bag, so to speak. Of my 3 outdoor kitties, 2 come running when I call them and the other one might show up…..sometime….when he is good and ready. If other people appear on the scene, they all disappear till they are sure the danger is over.

    None of my indoor cats will necessarily come when I call them. They seem quite unconcerned about what I want them for. Of course, as soon as I sit down, a cat or two appears like magic for lap sitting and ear scratching which is immediately supplied. They definitely run the show around here!

  17. the more affection the dog owners have toward dogs, the more frequently they tended to have physical contact with them. However, no such relationship was observed among cat owners

    As interesting as that may be, I’m not convinced of its validity. From my own observations, and personal experience, I would be very careful in my definition of “physical contact”. By inclusion of the terms “affection” and “contact” in the same passage one might be led to conclude that the authors are speaking about contact where affection and attention are exchanged between owner and animal.
    Of course, contact could be bumping a leg, pawing, a pat on the back, etc. Also, is contact measured by the initial occurrence or by length of time? What about a pet that sleeps with its owner? For example, if I were to measure the amount of time my cat spends in my lap, it may very well exceed the time a dog owner spends petting her pooch. Without a clear operational definition, it’s wide open for interpretation. Self-report questionnaires are problematic for this reason: an unclear understanding of what is being asked and reported.
    Maybe the researchers took some of these factors into account, but my kitty Mira cares little, as she will undoubtedly be in my lap as soon as I get home from work, and stay there for the remainder of the evening.
    I doubt that Great Danes are doing much of that with their human companions.

    An interesting thought when you consider the amount of effort put into breeding dogs to the size of cats.

  18. Cats are probably the most studied animal model of auditory neurophysiology, and have told us an enormous amount about neural response properties to complex stimuli like speech. Any reader curious how the mammalian nervous system analyzes the spectral and temporal components of speech sounds should check out the seminal work of Sachs M, Abbas P, and Young E. (Yes, the work involves anesthetized cats; honor their sacrifice by appreciating the science.)

  19. My cat does come when I call her, though there is a delay due to physical factors such as stretching, stopping to have a snack at the food bowl, etc. The time from calling is usually about five minutes but she speeds up considerably if there are cat treats involved.
