Limits Of Hierarchies In Biology

(Moved from LimitsOfHierarchies)

Taxonomy in biology (mostly) works because it is based on a one-way relationship in time: An existing species may generate a new species, but cannot evolve into another existing species. And also, a species may diverge into two or more species, but two or more species cannot converge into a single one. This is of course oversimplified, but it basically holds up in the big picture. Being one-way allows the taxonomy to say Z came from Y which came from X, leading naturally to a tree-like taxonomy. If we can identify a one-way relationship within software patterns, we may be able to develop a hierarchical taxonomy.

But even in biology, which is often considered a best-case example of tree structures, the tree is highly impure. In other words, it is a graph (network), not really a tree, because DNA can "jump" from leaf to leaf in a rather arbitrary fashion (see below). Perhaps a directed graph, because of the forward movement in time. But a directed graph is not a tree. Because of cross-species DNA "pollution" caused by viruses and other mechanisms, as described below, any individual's or any species' DNA can potentially mix with any other. Thus, a directed graph. Usually the effects are too small to notice based on outward appearance, but technically crossing can and does happen at the genetic level. A tree at best is an approximation. "Good enough"? UsefulLie? Well, that depends.

It is only "impure" if you are considering it a map of genetics, rather than speciation. Regardless of where that rogue DNA came from, it is not getting placed in the target's ancestors, but in the target. Birth is a non-transferable property.

But under extreme circumstances, a horse may give birth to a pig. A mother is just a natural incubator and does not by herself guarantee anything about the features of the child. It is almost like classifying cars by where you park them.

Perhaps you mean that inserted DNA does not get passed on to offspring. Well, it is possible for reproductive organs to be "attacked" also. Ouch!

Perhaps birth is the wrong word - conception, maybe. Your genetic parents are immutable (well, who they are is immutable, anyway). Heck, even that horse that birthed a pig is still irrevocably the "mother" of that pig. The biological family tree is that way.

Now, should you be referring to the genetic source for any one organism, then yes, it can have a "parent" donating genes in addition to the biological parent(s). But that just changes a linear or binary tree into an N-ary tree.

Let's get this straight; we're going off on tangents. The taxonomy of biology (e.g. Homo sapiens) is based primarily on speciation, which is a genetic relationship between species over long periods of time. For the most part, especially over long periods of time, speciation is a one-way, irreversible relationship. There are some counterexamples, such as incomplete speciation, but in the big picture, the species of "horse" will never give rise to an instance of the species of "pig". You can implant a pig into a horse, but that pig came from pigs, not horses.

A bacterial strain could potentially grab DNA from a horse and put it into a pig's egg or sperm such that a new pig can suddenly have a small percentage of horse in it. Thus, technically you are incorrect, at least in part. Horses can give birth to another horse with some pigness to it. (Although I believe it to be rarer the higher up in the complexity scale of the creature you go. Any biology experts want to chime in?) If the bacterium grabs the snout DNA from a pig and puts it into some horse sperm, then potentially a horse could be born with a snout. (That particular alteration is probably a long shot, but not impossible.)

{I'm not a biology expert, but from the courses I've done, this is actually one of the main things to worry about with genetically modified (GM) crops. I've had this conversation with anti-GM protesters and in every case their objections seemed based on a semi-religious "thou shalt not mess with Life" kind of thing. They're worried that GM contaminates things, like radioactivity would, and that people will start getting mutated by eating the stuff. Far, far more of a danger is that (say) engineering a plant with resistance to a weedkiller means that gene is out in the environment. Humans aren't likely to gain it, because we don't ingest things directly into our cells. The problem is the bacteria around us, which DO ingest the world directly into their interior, and ARE very good at grabbing loose bits of DNA and using them. It's a survival strategy for them. It's a way of sharing DNA between them - they don't reproduce sexually, so without this, all the cells in their family tree would be genetically the same. Cosmic ray hits don't destabilise DNA in the right ways - a hit is much more likely to produce a non-viable coding than a different, possibly useful, viable one. So they're good at grabbing passing DNA to increase their variety. Once they start getting hold of weedkiller resistances, the point of the crop disappears. And so does the usefulness of that weedkiller. Even relatively benign adaptations could be dangerous - codes for vitamin A, for example. I mean, who cares if bacteria start making vitamin A? Well. It depends if vitamin A suddenly turns out to be a super-nutrient for something horrible... Now, I don't believe we shouldn't be doing GM; it could be great. I don't, however, think we've fully thought through the consequences. I just don't like the way the protesters seem to be completely unaware of just how complicated the issue is and how subtle the dangers actually are. -- KatieLucas}

Speciation is mostly about reproduction, not mutation (talking about sexual species in this case). If it mates with horses, it's a horse. If it mates with pigs, it's a pig. It is so rare to break that rule in reality as to be inconsequential. Certainly nowhere close to 10% as suggested below.

But speciation itself does not give us trees. About all we can say is:

  isSameSpecies(specimen1, specimen2)

I don't think this is enough info to make a clear-cut tree.
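
To make that point concrete, here is a minimal sketch of what a yes/no pairwise test can give us by itself. The predicate body and the specimen labels are hypothetical stand-ins, and the sketch assumes the test behaves as an equivalence relation, which a mating-based definition does not guarantee in practice.

```python
from typing import Callable, Hashable, List, Sequence


def group_by_species(
    specimens: Sequence[Hashable],
    is_same_species: Callable[[Hashable, Hashable], bool],
) -> List[List[Hashable]]:
    """Partition specimens using only a pairwise same-species test.

    Assumes the predicate is an equivalence relation, so comparing against
    one representative per group is enough.
    """
    groups: List[List[Hashable]] = []
    for specimen in specimens:
        for group in groups:
            if is_same_species(specimen, group[0]):
                group.append(specimen)
                break
        else:
            # No existing group matched, so this specimen starts a new one.
            groups.append([specimen])
    return groups


# Hypothetical stand-in predicate: specimens are labelled "species-id" strings.
def same_label(a: str, b: str) -> bool:
    return a.split("-")[0] == b.split("-")[0]


groups = group_by_species(["horse-1", "pig-1", "horse-2", "pig-2"], same_label)
```

The result is a flat list of groups. Nothing in it says which group descended from which, and that is the extra information a tree would need.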

Yes, trees too arose from speciation ;-P

Real-life taxonomies are like this too in my experience. Maybe they are, say, 90 percent tree, but roughly 10 percent is not. (That is, if "tree-ness" can be assigned a continuous scale. I don't know of any formal metric or algorithm for it - a Tree-O-Meter.) The decision is then to use a tree structure and somehow live with the occasional poor fits, or abandon trees altogether and use, say, sets or graphs instead.

A tree is a connected, acyclic graph with exactly n-1 edges where n is the number of nodes. Tree-ness could be measured by the number of edges which deviate from this ideal.
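
One possible reading of that suggestion, as a rough sketch rather than an established metric: treat edges as undirected, count the edges that close a cycle, and count the connected components. A graph is a tree exactly when the first number is 0 and the second is 1. The function name `tree_deviation` is made up for this example, and the sketch assumes every edge endpoint appears in the node list.

```python
from typing import Dict, Hashable, Iterable, Tuple


def tree_deviation(
    nodes: Iterable[Hashable],
    edges: Iterable[Tuple[Hashable, Hashable]],
) -> Tuple[int, int]:
    """Return (cycle-closing edges, connected components) for an undirected graph."""
    parent: Dict[Hashable, Hashable] = {node: node for node in nodes}

    def find(x: Hashable) -> Hashable:
        # Follow links to the representative of x's component.
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    extra = 0
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            extra += 1  # both ends already connected, so this edge closes a cycle
        else:
            parent[root_a] = root_b  # merge the two components
    components = sum(1 for node in parent if find(node) == node)
    return extra, components


pure = tree_deviation("ABCD", [("A", "B"), ("A", "C"), ("C", "D")])
crossed = tree_deviation("ABCD", [("A", "B"), ("A", "C"), ("C", "D"), ("B", "D")])
```

Here `pure` describes a proper tree, while `crossed` has one extra "cross-link" edge. Dividing the extra-edge count by the total number of edges would give one crude candidate for a Tree-O-Meter reading.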

I suspect we could make a graph of every known species with a weighted link between them that indicates the similarities in DNA. For example, our DNA may be 96% the same as chimps. But I am not sure the result would be a tree, since common ancestors are often extinct, and we cannot test their DNA. It probably involves probabilities and weighted guesses from paleontologists. I have read about software that makes a guess based on physical features (coded as attributes typed in by paleontologists).

Speciation is this relationship: species1.isAncestorOf(species2). There are few (but some) examples where this is not a one-way, single-inheritance relationship over long periods of time (e.g. some plant hybrids). Overall, a hierarchy is a good representation on a large time-scale.
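
As an illustration only, the one-way, single-inheritance version of this relationship could be sketched as a walk up a map of parent links. The species names X, Y and Z are the placeholders from the top of the page, and the `parent_of` map is a hypothetical structure, not something taken from a real taxonomy database.

```python
from typing import Dict, Optional, Set


def is_ancestor_of(
    parent_of: Dict[str, Optional[str]], ancestor: str, descendant: str
) -> bool:
    """Return True if `ancestor` is reached by walking up from `descendant`."""
    seen: Set[str] = set()
    current = parent_of.get(descendant)
    while current is not None:
        if current == ancestor:
            return True
        if current in seen:
            # A loop would mean the relationship is not one-way after all.
            raise ValueError("parent links contain a cycle")
        seen.add(current)
        current = parent_of.get(current)
    return False


# Placeholder species: Z came from Y, which came from X.
parent_of = {"X": None, "Y": "X", "Z": "Y"}
x_before_z = is_ancestor_of(parent_of, "X", "Z")  # one direction holds
z_before_x = is_ancestor_of(parent_of, "Z", "X")  # the reverse does not
```

Note that the map allows only one parent per species. A plant hybrid with two parent species does not fit this structure, which is exactly where the single-inheritance assumption gives out.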

But because of the cross-contamination of DNA, many or perhaps most ancestor trees are not "pure". It is sort of a "fuzzy tree", in a similar sense to "fuzzy logic". Whether the "match" gets better or worse on a large time-scale is an open question. I suppose we should consider both the case where there is never cross-contamination, and the case where sharing is very promiscuous. IMO, human-made things or processes lean toward the cross-contamination case, because we can borrow at will. California will borrow laws from Massachusetts because it can (and change them some).

  isSameSpecies(specimen1, specimen2)

Is this a Boolean or a "matching" rank? For practical purposes, species is often defined as the physical ability to mate with each other and have viable offspring. This does not necessarily reflect other non-mating ways of exchanging genetic material.

In terms of both the number of species and population, the bacteria and archaea dominate the "species" tree, and horizontal gene transfer is quite common among those domains. Inheritable material can be shared in several ways, including sexual reproduction, horizontal gene transfer, and viral injection. In species that rely on sexual reproduction, most attempts at cross-species mating are either completely unsuccessful or result in offspring that are themselves sterile.

Of course, this excludes wholesale aggregation of chromosomes as a mechanism of hybridization. Wheat is an example of this, being the aggregation of 3 different grasses' chromosomes. Its large number of chromosomes is also the reason for its large seeds.

One might make an observation that GM organisms result simply from the provision of an additional mechanism for allowing species to share genetic material.

Bats and birds both fly in similar ways, yet they did not get this trait through a common ancestor (according to most biologists). Another case where sets are more flexible than trees.

Yes, this represents a has-a relationship, rather than an is-a relationship. Both bats and birds have flight, but bats did not descend from birds or vice versa (though they have a common ancestor, neither bat nor bird). The DNA for flight in bats and birds is very different, having evolved separately. If the DNA was the same (or even somewhat similar), this would be counter-evidence for the use of a hierarchy to represent speciation. Other examples of parallel evolution include the similar body-types of a rhinoceros and a triceratops (or one of its cousins), mimicry in insects, etc. In each case, the similarities are in the phenotype, not the genotype.
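
A small sketch of the is-a versus has-a distinction, under heavy simplification: descent is a single parent link, while traits are a free-form set that any branch may hold. The `Taxon` class and the drastically shortened lineages are illustrative assumptions, not a real classification.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Taxon:
    name: str
    parent: Optional["Taxon"] = None      # is-a: one line of descent
    traits: FrozenSet[str] = frozenset()  # has-a: any set of features

    def lineage(self) -> List[str]:
        """Names from this taxon up to the root, following parent links."""
        names = []
        node: Optional["Taxon"] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


# Drastically shortened, illustrative lineages.
vertebrate = Taxon("vertebrate")
mammal = Taxon("mammal", vertebrate)
bat = Taxon("bat", mammal, frozenset({"flight"}))
bird = Taxon("bird", vertebrate, frozenset({"flight"}))

fliers = {t.name for t in (vertebrate, mammal, bat, bird) if "flight" in t.traits}
```

The set query groups bat and bird together, while `bat.lineage()` never passes through bird. The two views answer different questions, so the shared trait does not disturb the tree of descent.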

DNA Can Jump Species

Just one word, Benjamin: Plasmids. See,

Plasmids can be exchanged across species (,


Kind of an aside to the hierarchy discussion, but plasmids are not a particularly strong argument for abandoning speciation of bacteria. Most bacteria possess very efficient systems to restrict plasmid migration such that plasmids from different species, or from viruses, are very rapidly chopped up and destroyed. This same system is also used to stabilize the bacteria's own genome during replication as well. Although bacteria in which these genome stabilization systems have been deactivated can be constructed and are viable, the observation that in general this has not spontaneously happened argues that the bacteria functionally value genomic stability and anti-cross-speciation over the benefit of the additional genetic diversity that foreign DNA could provide. --AndyPierce


I believe the idea of "promiscuous" bacteria "contaminating" (sharing) DNA across species, busting a "clean tree", was mentioned above. However, the editing seems to have obscured the original mention, if I am not mistaken.

Yes - KatieLucas mentions it briefly in a paragraph that seems to be mostly about genetically modified organisms (GMO), which is a different issue. RefactorMe!

Plasmids are not a good example; the tree is still clearly defined, based on inheritance of the main genome. Plasmids are just the chocolate sprinkles on top of the great ice-cream sundae of bacterial genetic inheritance.

I am not sure what you mean. There is no pure tree that I can see. Besides, are the "nodes" the organisms or their DNA?

There is a mostly pure tree, based on descent and inheritance of most of the genome, for organisms and their DNA - it's hard to have one without the other. The majority of gene transfers involve auxiliary DNA like plasmids, and they occur very rarely outside the bacteria and archaea. And when they do occur, the transitions are very small compared to the total amount of DNA in the organism. All in all, you can expect that the far greater portion of an organism's genome will have been inherited through normal means. The idea that horizontal transfer destroys the tree picture is popular but a considerable exaggeration.

I fear that this picture obscures some important mechanisms while coming to a laudably accurate conservatism about popularizing genetics. It is true that "gene transfers occur very rarely outside the bacteria and archaea". It is also true that gene transfer is so common among the bacteria and archaea that some researchers facetiously suggest that we should view those as just two species (since a common working definition of "species" is "organisms that exchange inherited material"). What I take issue with is the assertion that "when they do occur, the transitions are very small compared to the total amount of DNA in the organism". I have problems with this on several fronts:

I suggest, therefore, that it might be more appropriate to suggest that a tree that is constructed from outwardly observable traits and behaviors -- the "traditional" biological hierarchy -- is likely to vary in significant ways from a corresponding tree that is constructed from differences in inheritable material. Biology is only now taking the first steps towards describing the second tree, and we are surely decades or even centuries away from understanding, in any deterministic way, the potentially complex relationship between the two trees.

{Such a tree would only be an approximation or interpretation. Biologists can and do argue over how to construct such trees. Genetics has given them a new tool, but it cannot settle exact tree-ness either because pure trees don't exist at the genetic level either.}

In higher eukaryotes, the overwhelming majority of inheritable DNA is non-coding; in bacteria, it is not. Furthermore, plasmids do contain non-coding sequences (indeed, there are certain non-coding sequence elements which are required for them to exist, like origins of replication). They are certainly more gene-dense than the bacterial genome, but not by a great deal. Furthermore, the size difference is substantial - plasmids are in general not more than ~20 kb, whilst the E. coli genome is ~4600 kb. Thus, plasmid-mediated gene transfers really are minor in comparison to the size of the genome.
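
For scale, the two figures quoted above work out as follows. This is just the arithmetic on the upper-end plasmid size and the E. coli genome size given in the paragraph, nothing more.

```python
plasmid_kb = 20    # upper-end plasmid size quoted above
genome_kb = 4600   # E. coli genome size quoted above

fraction = plasmid_kb / genome_kb
print(f"{fraction:.2%}")  # prints 0.43%
```

So even a plasmid at the large end of that range adds well under one percent to the genome.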

DNA is to some extent analogous to executable code, but not to the extent that you imply. Firstly, horizontal gene transfer by plasmids does not involve modification to the host genome, but simply adds more genes to the organism; it's more like installing another DLL than altering a running program. Secondly, small changes to genomes tend to have small effects; the greatest effect on a coding sequence that a single-base change could achieve would be premature termination of a gene (a 'nonsense' mutation) or the production of a protein with an altered amino acid sequence that has either no function or a modified function (for example, a signal-relaying protein which would normally be under the control of some other protein might be permanently switched on). Mutations to functional non-coding sequences would tend to have even less effect, as the functions of these sequences are determined by their interactions with proteins (mostly), which have a degree of robustness in their sequence recognition (for example, the bacterial Pribnow box, involved in the control of transcription, has a nominal sequence of TATAAT, but will still work well if one or two of these bases are changed).

Yes, the executable code analogy is overly simple. I was thinking of transfers that induce read-frame shifts. Are you certain that these mutations have "minor" or "small" effects? I'm under the impression that they, instead, often have major effect -- and that those effects most commonly therefore simply kill the affected cell or organism. The result, in terms of the genomic tree, is quite similar (maybe identical), but the mechanism is different. I think what I'm suggesting is that the relative stability we observe from the outside reflects a dynamic, rather than static, equilibrium, where disruptive effects and changes tend to be balanced by compensatory mechanisms.

As for gene doubling - do you mean rearrangement of an organism's own genome, such as when genes are duplicated? If so, surely this has no connection at all with the existence of a family tree; the changes to the genome are entirely internal to one organism.

Yes, including indications that the entire genome has been doubled multiple times. My understanding is that one implication of this is that once a doubling has occurred, then subsequent variation/mutation of the doubled content is less likely to be harmful to the organism and therefore more likely to result in branching of the genomic tree. For example, the observation that animals seem to get two and then four appendages (and that even in the insect world bilateral symmetry is generally conserved) suggests a doubling -- there is no fossil record of three-legged intermediate forms. But then I'm greatly over my head in developmental biology here.

[Or it could simply be due to the fact that you can't walk using three legs. Robots aren't constructed with three legs for that very reason.]

The observation about the conflict between two trees is accurate - there is a battle raging in taxonomy between those who wish to classify organisms based on anatomical features and those who wish to use a purely genomic approach, with the relationship between the trees they support being unclear. However, this is a conflict within science, not nature - even if we cannot determine what it is, there is a pure tree of bacterial family relations.

I'm not sure that the directed graph of Eubacteria and Archaebacteria forms a "pure" tree, given the prevalence of horizontal transfer. While it is true that the progeny have identical genetic material to their parents at the time they divide, the point is that both the parent and the progeny have wildly different material a short time later. I haven't thought through the implications of that on what would otherwise be a pure tree.

I suppose that the "budding" (division) pattern, if it could be observed and logged, is possibly a "pure" tree, but it is sort of a moot point if there is no way we can see it, since history does not record individual buddings (in the absence of horizontal transfer, the inherited material is itself the log, and so we can see it). We can only look at their existing forms and DNA, which would not reveal a pure tree (Each progeny comes from exactly one immediate ancestor. So far it's a pure tree, by construction). Budding and DNA exchange are somewhat independent (Horizontal exchange means that you can't identify the parent based on inspection of the child.) On a larger scale, strains and species, the intermix makes for impure trees.

I think the trees are still pure, but impossible to derive based solely on inspection of the inherited material of the participants. As you observe, I think you'd need to record the sequence at the moment of cell differentiation. You'd then have to be able to track the changes in makeup of each participant. Given horizontal transfer, the implication is if a genomic tree is constructed (where each "species" is defined to be "participants with sequence A"), then each participant hops from species to species as it exchanges material with its neighbors. So a bacterial "Horse" might well produce a bacterial "pig".

As soon as organism A trades ANY genetic material with organism B through whatever means, and they are not of the same species (if applicable), then a pure tree is busted, whether observed or not. Until the concept of a "fuzzy tree" is invented, I see no way around it. A and B are "mates" in a real loose sense, and mating busts trees on an individual level because in a pure tree each node has one and only one parent. If species X is polluted with DNA from species Y, then a species-level tree is similarly busted. Maybe there can be a two-parent "tree" with 99.99 percent one parent and 0.01 percent another parent. I don't know what to call such a thing, but it ain't a pure tree.
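
One way such a two-parent "tree" might be represented, purely as a sketch: record every contributor with the share of material it supplied, then derive the single-parent view by keeping only the largest contributor. The structure and function names are invented for this example; it assumes shares are fractions and that contributors always precede the node in time, so that the dominant links cannot loop.

```python
from typing import Dict, Hashable, Optional

# Each node maps to its contributors and the share of its material from each.
Contributions = Dict[Hashable, Dict[Hashable, float]]


def dominant_tree(contributions: Contributions) -> Dict[Hashable, Optional[Hashable]]:
    """Keep only each node's largest contributor, giving single-parent links."""
    return {
        node: max(parents, key=parents.get) if parents else None
        for node, parents in contributions.items()
    }


def discarded_share(contributions: Contributions) -> Dict[Hashable, float]:
    """Share of each node's material that the single-parent view throws away."""
    return {
        node: sum(parents.values()) - max(parents.values()) if parents else 0.0
        for node, parents in contributions.items()
    }


# The 99.99 percent / 0.01 percent case from the paragraph above.
contributions: Contributions = {
    "X": {},
    "Y": {},
    "child": {"X": 0.9999, "Y": 0.0001},
}
tree = dominant_tree(contributions)
lost = discarded_share(contributions)
```

The derived `tree` is an ordinary single-parent structure, and `lost` records how much each node's real ancestry was rounded away to get it. Whether a 0.01 percent loss counts as "good enough" is the judgment call being argued on this page.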
RE: If we can identify a one-way relationship within software patterns, we may be able to develop a hierarchical taxonomy.

The "cross-DNA-pollution" issues in biology are analogously common in the human and business world. I have seen company organization charts "grow" all kinds of cross relationships for political or business reasons. However, RealWorldHierarchies is probably the best place to further such a discussion.

Non-Tree-ness as a Potential Competitive Tool

Some speculate that the Cambrian Explosion may have resulted from an unknown genetic mechanism that could somehow use a "grab-bag" of features. Thus, we have things like Nectocaris that look like half arthropod and half chordate (fish-butt but shrimp-head). The explosion may have been triggered when a mutation and/or microbe infestation allowed a gene cross-trading mechanism of some sort. Those critters that could take advantage of this situation were able to produce more variations and thus got a leg up on "pure tree" creatures.

This could be why the Ediacaran critters disappeared. Although they got to multicellularity first, they were perhaps stuck with the tree model, which limits cross-species trading. After all, mix-and-match conceptually seems more efficient than each branch having to reinvent useful idioms like eyes, fins, legs, chemical paths, etc. from scratch. For example, branch A may have the best eyes and branch B the best immune system. If genes from A and B can be intermixed between the two species, then each gets the best of both (eyes and immune system). This ability may have been lost over time as creatures grew too complex to easily reintegrate diverse parts, and as most of the better combinations were already taken up via past combinatorial experiments. Gould talks about this very briefly in his "Wonderful Life" book.

This is perhaps comparable to forked software builds, where it is easy to borrow across forks early in the project, but it grows more difficult over time because the internal structures and idioms diverge, such that selected portions may need things specific to a given fork in order to function. If you want to mix-and-match across forks, it is best to do so early in the project, just after the fork (the "Cambrian" era of the software forking).

This could pose a problem for evolution in court because one can no longer claim "evolution predicts a tree of life". Creationists could argue that whether a tree is found or a graph (crossed genes), both can be "explained" by natural processes and thus the predictive nature is diffused. (On the large scale, we still find mostly tree-ness, but I'm not sure if "mostly" is good enough in court.)

For an example of applying ObjectOriented methods to managing Mitochondria and Chloroplast data where simple hierarchies are not adequate data see

Virus-induced genes:

Real trees! The big green leafy things. Now there is a true hierarchy. I have never seen leaves or branches grow into their neighbors (although I imagine that some species can do this).

 I have seen it. The tree growth program is sufficient to avoid running into itself, but it can run into its neighbors.

See also TaxonomyOfPatterns, CollectionHierarchies, LimitsOfHierarchies, WomenFireAndDangerousThings.

CategoryBiology CategoryHierarchy

Last edited March 31, 2014.