12 days of evolution. #9: Can evolution create new information?

December 29, 2015 • 1:15 pm

One of the more sophisticated claims of creationists, especially used by advocates of intelligent design—I don’t think this term merits capitalization, for we don’t capitalize “creationism”, which is exactly what ID is—is that evolution “can’t create new information”, therefore, insofar as the process produces organisms doing novel things, God must have done it.

This ninth short video of the series produced by NPR and “It’s Okay to be Smart” attacks that claim. And once again I’m disappointed. I didn’t start putting up these videos to criticize them—they were simply a way to bring new pro-evolution material to public attention, but I haven’t seen one that isn’t either flawed or garbled. And I don’t much like this one, either, though it sort of makes the case:

The example they use for creating new information, gene duplication followed by divergence through mutation and natural selection in the duplicated genes, is the common response, and a good one, but it would have been better if they’d given at least one example, such as the creation of new forms of globins that do different things, all descended by gene duplication from an ancestral globin gene. As with other videos in this series, the narrator simply refutes the creationist question by assertion (“Evolution has no problem with adding new pages to the book of life”) rather than by giving good, comprehensible examples.

I would have omitted the capture of retroviruses, as most of these have become inactivated once incorporated in our genome (and creationists would call that “the loss of information”), and concentrated not just on gene duplication, but on how new genes can arise, doing new things, from combination of bits of genes taken from the rest of the genome. My colleague Manyuan Long and his colleagues in my department, for instance, have traced the origin of many genes having brand new functions (ergo new “information”) from their origins elsewhere in the Drosophila genome. I wrote about the Long lab work in December of 2010.

For more examples, see this article in New Scientist, and this one in Scientific American. Both of these could have furnished tangible examples for the video.

38 thoughts on “12 days of evolution. #9: Can evolution create new information?”

  1. Creationists are never able to tell us how they quantify the amount of information in a genome, so it is a meaningless claim. It seems to be a trick – if they claimed that evolution cannot produce new genes, instead of new information, then their claim would be seen for the trickery that it is, which works by muddying the waters.

    1. Great point Kevin.
      My Grandfather used to describe how sunlight enters the vegetables and fruit, then we eat the stored energy /sunlight. We then run on this energy. Our thoughts are literally sunlight.
      He also described a sunny day as an ocean of sunlight. Our eyes take in a minute fraction of the light, and it seems almost impossible to grasp the all-powerful effects of the sun and our dependence on it.
      Anyway, I have digressed enough.

      1. You are not the only one thinking about information – energy. Physicists, particularly ex-pat string theorists, are tackling or maybe rethinking a lot of their ideas about the universe in regard to information.

        Here is an example of the growing interest of trying to explain our universe in terms of information:


        1. Amazing.
          My Grandfather later told me that many of his stories /theories were devised to allow one to grasp that we are part of the universe. Wholly, and completely part of the universe, have been since the big bang and will continue to be after our deaths.
          This leads to interesting conversations re people’s beliefs re the separate soul they believe they possess.
          Thanks for the link.

  2. The videos in this series are simply too short to be of any use. These concepts need to be both exemplified and explained if they’re unfamiliar. I suppose these little videos could serve as a reminder about an idea you’ve learned previously.

    1. I’d agree on the shortness issue, but that’s the challenge facing issues fought out on the turf of video clips (lots of people today will not stop to read long texts at all).

      From a #TIP methods perspective, a point to raise in any short video is that those who insist evolution can’t generate new “information” don’t clearly say what “new” information would need to look like to satisfy them. Arch-culprit here would be Bill Dembski, who to this day will still trot out analogies like Mount Rushmore, instead of diving in to show by genetic example what such things would mean. It’s not as though there aren’t enough DNA sequences known to do some applied observing here.

      By stressing the inadequacy of the antievolutionist position in the course of the presentation (which the video didn’t), the examples presented would show that the science not only pays attention to the data, it’s the group that is producing the work, not carping from the design sidelines.

  3. Why, instead of explaining at a genetic level, isn’t the question answered at the organism level?

    Isn’t the production of the numerous species by evolution an obvious and clear illustration of information being created? Why confuse people with gene duplication and retroviruses?

    1. Good idea, one could do all examples.

      The reason to get under the hood of genetics is science and religion I think. We know how to express the information content better, and creationists want to discuss genes and sometimes refuse evolution of species. (Which is a rejection one shouldn’t give them of course.)

    2. Creationists misrepresent and attack evolution on all fronts. A common trope against evolution at the genetic level is information cannot increase in genomes b/c it violates the 2nd law of thermodynamics.
      Yeah, I know it’s stupid, but they spin it so to convince those who want to be convinced. I see it used a lot, and it fools a lot of them.

    3. It’s much more difficult to define what counts as ‘information’ in a phenotype compared to a genetic string. Information Theory can be directly applied to the genetic code because its chemical structure is analogous to a string of bits. With phenotypes, you have the issue of deciding which characteristics count and how you weigh them (does jaw bone length count more than iris shape?).

      This is, IMO, one reason why IDers apply their concepts like irreducible complexity and CSI to organs and phenotypic structures and avoid using their tools to analyze genetic strings; because if they did the latter, people would very quickly realize they were talking bunk. They stick with the difficult and highly subjective cases as a form of cover.

    4. On the one hand organisms could be treated as black boxes and what they do with the environment could be examined for its information content.

      Moss growing on one side of a tree is information. A cougar burying its prey for future use is information. A cephalopod changing its color is information.

      On the other hand, the organism itself should, in principle, be sufficient information to explain everything about its environment…an ultimate form of predictive information. Take a cheetah from the savannah: a deterministic universe places in those sets of molecules information about how those molecules must have arisen. It is impressive to think how much information might be extracted from that one organism.

      1. Thanks for all the good replies.

        This way of addressing a problem, by dissecting and scrutinizing the essential bits, reminds me of the morality through science issue Sam Harris advocates. Critics are terribly concerned with issues like, what exactly constitutes science?

        Huh? You mean we have to clear that up to everyone’s satisfaction first, never mind that science is already applied to lots of things very successfully? And here, instead of just pointing out that a pine tree is different than an oak tree, we must show that it’s all to do with the way their genetic sequences arose.

        I don’t like playing that game.

  4. Well, speaking of globins, I once thought the structure and function of the beta globin locus was so sexy that I got confused and started dating a hematologist. Beta globin evolution is still fascinating–the doc, not so much, though his research on the locus control region (LCR) remains mind-gripping, and I have that to thank for catalyzing my interest in gene regulation.

    This post prompted me to read: Distinctive Patterns of Evolution of the Delta-Globin Gene (HBD) in Primates.

    Their introductory description is fun to read:

    “These globin chains are encoded by members of the alpha- and beta-globin gene families, which arose via tandem duplication of an ancestral, single-copy globin gene approximately 450–500 Mya, in the common ancestor of jawed vertebrates [14,23,25,35,78]. The two paralogous gene families exhibit a number of significant differences in gene content among jawed vertebrate taxa. These differences are especially pronounced in the case of the beta-globin cluster, in which distinct repertoires of mammalian beta-like globin genes originated by independent lineage-specific duplications followed by functional divergence [33–35,52–54,57,78,79]. In both monotremes and marsupials, the beta-globin gene cluster contains a single pair of genes, the early expressed epsilon-globin and the late expressed beta-globin [53]. In contrast, within the eutherian stem, further tandem duplications gave rise to a cluster of five beta-like globin genes, containing early-expressed genes, located at the 5’ end of the cluster epsilon-(HBE)-gamma(HBG)-psi-beta(HBBP1), and late expressed genes, delta (HBD) and beta (HBB), at the 3’ end, consistent with the orientation in contemporary species [26,31,53]. The fine tuning of the level and timing of expression of each of these genes relies on interactions with the locus control region (LCR), located from approximately 6 to 18 kb upstream of HBE [4,11,84].”

    1. This is interesting. Thanks! I use hemoglobin proteins and genes for a lot of my examples in teaching molecular stuff.

    2. Why do I always see “goblins” when you write about “globins” (even my spellcheck agrees). I guess I’ve been reading too much Tolkien.

  5. I spent too many posts with a theist/creationist blogger over this issue but he argued from the view that DNA is an actual language and, well, every language is preceded by ‘mind’ right? It must be: non-living things can’t create language and information? He never explained exactly what ‘mind’ is but just the fact that I asked proved I was a blubbering idiot. He referred to a paper written by a linguist who showed that DNA has several of the qualities of a language. I agreed that it can be interesting to talk about DNA like this but in fact all language is symbolic and DNA is actually a thing that does stuff whether or not I’m there to watch it or talk about it. The author of the paper is a prof at NYU (I think) and the blogger never could get over the fact that I thought the paper was biased and incorrect. You always have to be on the lookout for the bait and switch. You *think* they’re talking about evolution but it hits you that they are using a different definition. The fastest way I’ve learned to get a creationist to quit yammering is to ask them to define words or premises. It takes about ten seconds to reveal that they are just parroting something Ken Harm said.

    1. DNA is not an actual language. No one is fluent in DNA. The question is whether the cellular process of protein synthesis (which is more involved than just DNA) is analogous to a cipher or a code.

      A cipher is based on transformation rules for symbols (A -> Z) whereas a code is based on equivalences of meaning (“Chinese Fire-Drill” is an imperative that means the same as “exit the car when I stop, run around the car, and take your seat”). Does DNA have meaning to the organism, the way a stop light has meaning to motorists? This is related to the question of whether DNA actually “contains” or “has information”. A book has “meaning” because the symbols in the book have “meaning” to someone, and it conveys information in that it allows someone to do something. At the same time, the book qua physical book, if buried in the ground and exhumed at some later date when the language of the book has completely disappeared, has no information. In contrast, a car key buried in the ground would still have the same physical structure, and one would be able to know something about the lock it was designed to fit, even if the lock were long destroyed. [Of course, one would have to recognize it as a key, but even if this were lost, the basic structure of the key is evident to the senses.]

      There is a question in all this about the nature of sentience and reason, but even if you say, crudely, “DNA is a language”, it doesn’t get you anywhere you aren’t already. Humans have language, and they have mind, and they exist in the physical universe. However, it is problematic for any biological explanation based on mechanical transformation rules. But we already knew that too. A cipher is no good unless you already have a workable language (or code). Denis Noble’s book, The Music of Life, attempts to address multi-level causation and holism in biology.

    2. He referred to a paper written by a linguist who showed that DNA has several of the qualities of a language

      Cherry picking and the correlation-causation problem at its worst. We might just as easily say that language shares several of the qualities of a polymer (DNA is a polymer), so it obviously must have been formed by DuPont.

  6. There is a lot of production value in these videos – this one being a good example with the book metaphor – so I like them a lot.

    But of course the erroneous or omitted details are irritating too, so my New Year’s kvetching remains possible. Shuffling letters in a book loses language information (or in a genome recipe information), but it actually increases the information content of the book (genome). Usually you have to specify the kind of information you mean – Shannon information from a gene’s environment becomes its recipe information, Kolmogorov complexity of a genome is its information content which the S.I. _lowers_ in order to make useful, encoded information.

    The video seems to clear that hurdle, but just barely, by saying that evolution can ‘create information’. Well, yes and no, it can learn information. If nature is the erudite master, is the student really ‘creating’ information? That also means it may be easier to understand that there is no ‘end in mind’ but just a jumble of selected [he!] possible paths.

  7. It is not necessary to have gene duplication to get “new” information. If we start with a random string of nucleotides, and then natural selection brings us to a particular string of bases (say) 1000 bases long, then by choosing one sequence out of 2-to-the-2000 possible sequences it has built information into the sequence.

    I am annoyed that so many people think that natural selection within one locus cannot build new information into the genome.
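
    The arithmetic behind “2-to-the-2000” can be sketched as follows. This is only a rough illustration, assuming all four bases are equally likely at every site before selection and that selection ends up fixing exactly one sequence; the function name `selection_bits` is made up for the example.

```python
import math

def selection_bits(length, alphabet_size=4):
    """Bits needed to single out one sequence among alphabet_size**length
    equally likely sequences (an idealized assumption, not a model of real selection)."""
    return length * math.log2(alphabet_size)

# 1000 bases, 4 possible bases per site: 4**1000 sequences, i.e. 2**2000.
print(selection_bits(1000))      # 2000.0
print(4 ** 1000 == 2 ** 2000)    # True
```

    Under those assumptions, each fixed base contributes two bits, so no duplication is needed for the count to come out positive.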

    1. There are of course many likely cases along those lines. One is where mutations in gene regulatory regions add new gene enhancers so a gene is expressed in a novel place/time.

    2. If we’re using a Shannon-style definition, it’s entirely possible that the selected string could have less information because it may be more compressible than the original. Selection can really go either way; increasing or decreasing information from an information theory perspective.

      However your point is still valid because ‘selection can increase or decrease information, even without duplication’ still completely undermines creationist arguments. Those arguments typically fall into either the “it must be conserved” camp or the “it only goes down” camp, and neither of those claims is true.

      What really annoys me about the example is something else; namely, that they could’ve stopped at ‘gene duplication.’ “This string string” has more information than “This string”. Their example implies that to produce additional information, some change has to be worked on the duplicate string to give it a new phenotypic function. I expect that Claude Shannon would disagree.

      1. Quibble: I think the “compressibility” criterion is Kolmogorov information rather than Shannon information.

        I agree about the rest. Particularly the supposed theorem that natural evolutionary forces can only make information decrease. If a mutation A –> C decreases information, once that happens one wonders why C –> A, which increases it, is mysteriously ruled out.
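
        The two measures being contrasted in this sub-thread can be compared on a duplicated string with a rough sketch like the one below. It makes several assumptions that are not in the discussion above: the “gene” is a random 1000-base string, the Shannon-style measure is the simple per-symbol entropy of observed base frequencies multiplied by length, and zlib’s compressed size is used as a crude stand-in for Kolmogorov complexity (which cannot be computed exactly). The function names are invented for the example.

```python
import math
import random
import zlib
from collections import Counter

def empirical_entropy_bits(seq):
    """Length times the per-symbol Shannon entropy of the observed symbol frequencies."""
    n = len(seq)
    counts = Counter(seq)
    return -sum(c * math.log2(c / n) for c in counts.values())

def compressed_size(seq):
    """Bytes after zlib compression; only a rough upper-bound proxy for Kolmogorov complexity."""
    return len(zlib.compress(seq.encode("ascii"), 9))

rng = random.Random(0)                       # fixed seed so the run is repeatable
gene = "".join(rng.choices("ACGT", k=1000))  # hypothetical random "gene"
duplicated = gene + gene                     # tandem duplication, no divergence yet

print(empirical_entropy_bits(gene), empirical_entropy_bits(duplicated))
print(compressed_size(gene), compressed_size(duplicated))
```

        By the frequency-based measure the duplicated string carries exactly twice the bits, while the compressed size grows only slightly, since the second copy can be described as “repeat the first”. Which answer counts as “more information” depends on which definition is being used.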

  8. Information is one of those woolly words.

    If you make a distinction between syntax and semantics, then the syntax of an expression is evident to all, whereas the semantic content is only evident to a person with a special kind of training.

    The code/cipher distinction is based on the syntax/semantics distinction. Codes make equivalences on the basis of semantics. Ciphers make equivalences on the basis of syntax.

    If you suppose DNA sequences have “information”–which is imbuing them with a semantic content, I believe–then physical changes to the structure will not necessarily change the information content. “DOG” versus “dog”. However, if you consider the following: “Dog” versus “God”. You have a physical change that changes the meaning of the expression. But note, we have not created new information through our physical change. “God” already had a pre-determined meaning.

    If I trained my child to fetch the dog every time I said “Dog”, and to fetch my Bible every time I said “God”, then you have a simple system of meaning, with two primitives defined operationally. Say you observe me for a year, and each day I write the command “dog” and my son retrieves the dog. Then let’s say one day, on a fluke, I say “god” and the child retrieves the Bible. What you observe is a syntactic change to the expression, and a novel behavior (from your perspective).

    I don’t know that genes actually have information–but if they did, mutations could not change the semantic meaning of genes. There would have to be a system of meaning underlying how the cell knew how to interpret the new sign.

    Note all the probabilistic arguments about a system of meaning randomly arising can be addressed with the same possible worlds arguments used to dismiss fine tuning arguments, so I wouldn’t be so threatened.

    1. I don’t know that genes actually have information–but if they did, mutations could not change the semantic meaning of genes.

      Triplets of DNA base pairs code for a variety of amino acids, as well as the ‘stop’ command. These are called codons and you can see them listed here. That’s as close as they get to ‘semantics’ or meaning – a transcriptase reads a sequence of DNA, and produces amino acids depending on what it reads.

      Mutations can change which amino acid is produced or change an AA-producing codon into a stop codon or vice versa. For example: a mutation that changes the sequence CGA to TGA will turn “make Arginine” into “stop.” This can radically change how the resulting protein folds and what sort of biological activity it does, leading to a change in phenotype. Not always, but sometimes.

      So yes, to that extent, mutations change the semantic content. Just like a single point “mutation” in the letter string “I love kilting in my skirts” to “I love killing in my skirts” can radically change the semantic content of the sentence.
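
      A toy version of this can be sketched in a few lines. The table below is a small hand-written subset of the standard genetic code (DNA coding-strand codons), and the nine-base sequence and helper names are invented for illustration. In the standard code TGA, TAA and TAG are the stop codons, while TGG encodes tryptophan.

```python
# Hand-written subset of the standard genetic code; not a complete table.
CODON_TABLE = {
    "ATG": "Met",
    "CGA": "Arg",
    "CGG": "Arg",
    "TGG": "Trp",
    "GCT": "Ala",
    "TGA": "Stop",
    "TAA": "Stop",
    "TAG": "Stop",
}

def translate(dna):
    """Read codons left to right and stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "Stop":
            break
        protein.append(residue)
    return protein

def point_mutate(dna, position, base):
    """Return a copy of dna with the base at the given 0-based position replaced."""
    return dna[:position] + base + dna[position + 1:]

original = "ATGCGAGCT"                   # hypothetical sequence: Met-Arg-Ala
mutant = point_mutate(original, 3, "T")  # CGA -> TGA, a premature stop

print(translate(original))  # ['Met', 'Arg', 'Ala']
print(translate(mutant))    # ['Met']
```

      The same single-base change applied to CGG gives TGG, which swaps arginine for tryptophan instead of truncating the protein, so the consequence of a point mutation depends on the codon it lands in.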

      1. I think even looking just at DNA is too simple. There is a complicated biological process that occurs between DNA and the construction of a useful non-lethal protein (prion diseases don’t change a protein’s chemical composition, they change the orientation of the protein–e.g. alter morphology not “physical substance” of the protein). So just looking at the function of DNA is inadequate to making this determination.

        There is an international research group in biosemiotics, but it is pretty small, and I don’t know that it has a lot of supporters in the biological sciences in general.

        Being an internet crank, it makes a great deal of sense to me that, given human communication systems, there would be analogues to them at other levels of bios. But that is a hypothesis, and neither the data supporting it nor its usefulness is well established.

      2. Eric writes:

        “Just like a single point “mutation” in the letter string “I love kilting in my skirts” to “I love killing in my skirts” can radically change the semantic content of the sentence.”

        Yes, but “I love kilsing in my skirts” creates nonsense. The distinction between “kilting” and “killing” is based on a pre-existing system of meaning between speakers. Like I am suggesting, I don’t know that anyone can definitively state that DNA has semantic content, beyond what can be gleaned from its physical morphology (a sequence of syntax).

  9. The information “muddle” is based on a conflation.

    In language, there is a syntax/semantics distinction. We say an expression is nonsense if it is syntactically well-formed but not meaningful–“Always turn right after you clean the abstract entity off the microwave.”

    On the other hand, in nature, you have something like matter and form. Matter is particular, form can be replicated. Obviously, DNA has form, as it is all about replication. In this sense, DNA has “information” in terms of its physical morphology. That does not mean that it has “information” in the sense of semantic content. But if it did, it would probably lead to a re-conceptualization of not only evolution but biology.

    There is a group devoted to biosemiotics, not sure if that’s too woo for this bunch, but it would be interesting if the basis of the distinction between life and non-life was that life-processes were driven by systems of meaning, language-analogues. It would make sense that human language would be a variation on a theme, rather than a one-off.

    1. I would be careful not to reify language as some distinct and abstract thing. Language can also be thought of as patterns of physical morphology. Spoken word is pressure waves in air. They hit your ear, which moves hairs, which causes nerve signals. All very physical, just like the DNA transcription. Written words are detailed patterns of reflective materials. Light hits them, bounces into your eyes, causes photochemical reactions. Carved words are divots in wood or stone that likewise reflect light in a certain pattern. Your body (which includes senses and brain) responds to these signals. A transcriptase reacts to the ‘stimuli’ of the physicochemical conformation of a DNA strand…what’s the substantive difference?

      The line gets really blurry when we start thinking about whether animals communicate via smell. Because the biological process of smelling something bears a lot of similarities to the (first stage of the) process of transcription; in both cases, one molecule locks on to another based on surface shape and properties, causing a reaction. So if smell can be used to communicate, arguably DNA transcription is a form of communication too.

      1. No, I absolutely disagree. Language has semantic content, above and beyond any syntactic content.

        For example, no one knew what hieroglyphics meant for centuries before the Rosetta Stone was discovered. This is not to say hieroglyphics aren’t as physical as anything else in the natural world.

        You could say that the ID debate boils down to the question of whether the natural universe has semantic content. Obviously, language has semantic content–to human speakers–but it is not so obvious to many that Mount Zion has semantic content–beyond human speakers. The atheist/anti-ID person is going to look stupid denying that language has semantic meaning, above and beyond its syntactical form.

        Note, even if we assume DNA (or the system organisms use to produce proteins) has a semantic dimension, it would be a system of meaning relative to a cell, not to human beings. It doesn’t get you to intelligent design, it just gets you to a probability argument about the probabilities of a system of meaning randomly emerging. (Which would just be a different version of fine-tuning arguments.)

        My evil supposition is that I believe that there are teleological principles (optimization principles) operative in Nature, that we will eventually discover as we go from reductionism to looking at things as sub-components of a nested hierarchy of complex systems. But even here, even though I suspect it might be a Coyne heresy, I suspect you could very easily have micro-teleology without any macro-teleology.

        If you look at language, a language can only function for sub-components of a whole, and can only have meaning for a sub-component of the whole, therefore any language or language-analog can only express a kind of micro-teleology.

        So if the universe really expressed some kind of meaning, that meaning could only be cognizable to something outside the universe, e.g. God, and so I don’t think any empirical science could ever establish the meaning of life, the universe, and everything, or the existence of a God.

        1. I am inappropriately conflating teleology and semantic meaning.

          But an imperative like “Stop” or “Go” tells someone what to do. A stop light is a human artifact that has a purpose, precisely because a well-defined system of meaning exists. Likewise, evidence in a stream of a gold vein down-river may be natural, and precede the existence of humans, but the only reason anyone would care about evidence of a gold vein (the evidence would have significance or importance) is because they are part of a certain kind of human economic system. God, in some sense, would be completely indifferent.
