Maureen A O’Leary. American Scientist. Volume 102, Issue 3. May/Jun 2014.
In view of the nearly countless, wildly varying forms of life that surround us, it is easy to forget that all the species living on Earth today represent only the tip of the iceberg of evolutionary history. By most scientific estimates, 99 percent of the species that have ever inhabited our planet are now extinct.
Charles Darwin and Alfred Russel Wallace gave us a theory of evolution holding that species descended from a common ancestor. In elegant simplicity it explained both the diversity and the similarity of all species on Earth. What these visionaries did not do (what was not possible in their time) was trace all the species’ lines of descent in the form of a phylogenetic tree, or Tree of Life.
Figuring out the Tree of Life for the hordes of living species (estimates run anywhere from 4 million to 100 million) and the tens of thousands of known fossil species has been the job of several succeeding generations of scientists. Within just our own class of animals, Mammalia, researchers have identified more than 5,000 living species and as many as 10,000 extinct ones. Paleontologically speaking, we are well off: Mammals boast a particularly good fossil record, including important early transitional species.
Now we are beginning to tackle this great challenge, for life as a whole and for mammals in particular, with new tools for studying the anatomy of living and fossil species, with modern algorithmic methods, and with extensive genomic data. The goal is to build a phylogenetic tree, using all these living and extinct species, that will provide solid support for our reconstructions of the past.
Why Study the Tree of Life?
The availability of new software that allows scientists from many different labs to work together online as a research team has opened new opportunities for organizing phenomic data (that is, all of an organism’s nongenomic traits) on a large scale. At the forefront of such transformations has been the mammal Tree of Life project, a multiyear, multi-investigator initiative funded by the National Science Foundation.
Making a data-rich outline of the lineages of placental mammals, both living and extinct, allows us to interpret how traits evolved over time. With this tree we can tackle intriguing questions, such as: Did whales change their diet before or after they lost their limbs and left the land for an obligately aquatic life? Or, closer to the issue of human origins, did rodents exist as early as primates?
Delineating the mammal Tree of Life was an undertaking that required the efforts of large and varied groups of specialists. The mammal phenomic team, in which I participated, included 22 other scientists and doctoral students. The scale of our task was unprecedented: Most previous studies had itemized an order of magnitude fewer traits and had left no online images or data references for scientists and the public. We addressed the project with a new web application called MorphoBank that gives scientists the means to organize, edit, and illustrate published information on traits observed in mammals. Our team compiled old and new observations amounting to over 4,000 distinct phenomic traits from all parts of the animals’ skeleton, dentition, and soft tissues, as well as from their development and behavior.
At the start of the project, team members met in person (two dozen of us from five different countries) to bring together all our legacy publications, which recorded named traits and how each species had been scored for them. The data from legacy publications are often difficult to compare because scientists rarely describe traits in a uniform way; merging various large spreadsheets, or matrices, becomes a complex problem. Moreover, because many traits had been studied in only a subset of mammal species, we needed to determine their condition in the remaining species.
The creation of such databases for phenomics has only recently begun to receive support; these data have been much more challenging to organize than genomic data, as well as more difficult to quantify, document, and share. The cataloging of traits tends to be messier: It lends itself to description and images and often involves complex terminology, whereas gene sequences are inherently digital (consisting only of A, C, T, and G). Replicating and corroborating the work of peers, a core part of scientific work, have posed their own difficulties. Relatively few phenomic studies for phylogeny have been stored in databases, and even fewer have included images that show exactly what is meant by a given trait.
For five years, the mammal phenomic team worked together in a shared online environment where we could conduct virtual debates about anatomical terminology and could exchange labeled images with one another to clarify what we meant. Day by day, each debate we settled and each image we added became a building block of a growing online store of anatomical knowledge. These data were then combined with genetic data to build an enormous, synthetic matrix of matrices, known as a supermatrix.
In the next step, we ran the data through various algorithms to build a phylogenetic tree. These algorithms use the power of today’s computers to search rapidly among a tremendous number of possible phylogenetic trees and calculate their relative length or probability so that an optimal tree may be chosen. For example, a parsimony algorithm would search for the shortest possible tree, the one that could account for all the observed traits with the fewest possible steps. The parsimony tree, published in our 2013 paper in Science, differed little from those generated using statistical methods. Such a tree constrains our hypotheses about the time of origin of a group, but it can also be compatible with many different scenarios describing ages of different taxonomic groups on that tree. The numerous fossil species on the tree become baseline calibrations for how the tree fits onto the stratigraphic record.
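The parsimony criterion can be illustrated with a small Python sketch. It is not the software our team used; it assumes rooted binary trees written as nested tuples, a hypothetical four-taxon matrix scored 0/1, and Fitch's step-counting rule for unordered characters. It also counts how many unrooted trees exist for n taxa, (2n - 5)!!, which is why real analyses search heuristically rather than exhaustively.

```python
"""Sketch: score candidate trees by Fitch parsimony and keep the shortest.

Assumptions (hypothetical, for illustration only): rooted binary trees as
nested tuples, taxa A-D, and a toy 0/1 matrix (1 = trait present).
"""


def count_unrooted_trees(n_taxa):
    """Number of distinct unrooted binary trees for n_taxa >= 3: (2n - 5)!!."""
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


def fitch_steps(tree, states):
    """Return (state set, minimum changes) for one character on a subtree."""
    if isinstance(tree, str):  # a leaf contributes its observed state
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    shared = left_set & right_set
    if shared:  # children can agree: no extra change needed here
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, matrix):
    """Total steps summed over every character (column) of the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_steps(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
        for i in range(n_chars)
    )


matrix = {"A": "1100", "B": "1100", "C": "0011", "D": "0010"}
candidates = [  # one rooting of each of the three unrooted four-taxon trees
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
for tree in candidates:
    print(tree, "length:", tree_length(tree, matrix))
best = min(candidates, key=lambda t: tree_length(t, matrix))
print("shortest tree:", best)
print("unrooted trees for 10 taxa:", count_unrooted_trees(10))
```

With this toy matrix, grouping A with B requires 4 changes against 7 for either alternative, so parsimony prefers it; with only 10 taxa there are already 2,027,025 unrooted trees to consider.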
Using a concept called ghost lineages, phylogeneticists can state that a group appeared at least by a given time. The reasoning runs along these lines: When two species are each other’s closest relatives, both must have existed at the time they split apart. If the two species first appear at different times in the fossil record, we can infer that both lineages are at least as old as the earlier of the two first appearances; the unrecorded stretch of the younger lineage’s history is its ghost lineage. The concept of ghost lineages thus combines time and tree topology to determine the minimum ages of closely related species.
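The ghost-lineage reasoning can be expressed in a few lines of Python. This is a sketch under stated assumptions: the tree topology, the taxon names, and the first-appearance dates (in millions of years ago, larger meaning older) are all hypothetical, and each split's minimum age is taken as the oldest first appearance among its descendants.

```python
"""Sketch: minimum split ages and ghost lineages from a dated tree.

Hypothetical inputs: a rooted binary tree as nested tuples and the age
(millions of years ago) of each taxon's oldest known fossil.
"""


def oldest_record(tree, first_appearance):
    """Age of the oldest fossil anywhere in this subtree."""
    if isinstance(tree, str):
        return first_appearance[tree]
    return max(oldest_record(child, first_appearance) for child in tree)


def ghost_lineages(tree, first_appearance, found=None):
    """Map each lineage to the unsampled time between its split and its record."""
    if found is None:
        found = {}
    if isinstance(tree, str):
        return found
    split_min_age = oldest_record(tree, first_appearance)  # both sisters exist by now
    for child in tree:
        gap = split_min_age - oldest_record(child, first_appearance)
        if gap > 0:
            found[child] = gap  # this lineage must extend back unrecorded
        ghost_lineages(child, first_appearance, found)
    return found


tree = (("taxon_X", "taxon_Y"), "taxon_Z")
first_appearance = {"taxon_X": 60, "taxon_Y": 50, "taxon_Z": 40}
print("minimum age of root split:", oldest_record(tree, first_appearance))
print("ghost lineages (Myr):", ghost_lineages(tree, first_appearance))
```

Here taxon_Y's lineage must reach back 10 million years before its first fossil and taxon_Z's 20 million years, because each is the sister of a lineage already present 60 million years ago.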
Ghost lineage analysis leads us to posit that the placental mammal lineage appeared before the major extinction event known as the Cretaceous-Paleogene, or K-Pg, event (formerly called the K-T event). This calamity occurred just over 65 million years ago and coincided with extensive volcanic activity, an asteroid impact, and, most famously, the extinction of the non-avian dinosaurs. What is less well known about the K-Pg event is that its impact reached far beyond the Dinosauria: Approximately 75 percent of species on the planet went extinct. Our hypothesis fits what is called the explosive model of placental mammal evolution, which holds that placental lineages appeared at the end of the Cretaceous Period or afterward, and rapidly diversified.
Some investigators instead build models of gene evolution or of fossil preservation rates, which typically push dates back much further than the fossil record shows. Because such models describe past events that cannot be repeated, they can be tested only through further empirical (that is, paleontological) discovery.
Modeling gene changes, also known as building a molecular clock, yields an estimate that the placental mammal line is at least 100 million years old. As for fossil evidence, paleontologists have tested this model by searching extensively for such early traces of placentals, but with no success.
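For readers unfamiliar with molecular clocks, the simplest version, a strict clock, can be sketched in Python. This is not the model used in the studies mentioned above (modern analyses typically use relaxed clocks and several calibrations); the sequences, distances, and 50-million-year calibration are hypothetical. The key assumption is that substitutions accumulate at a constant rate r along both lineages after a split, so the genetic distance between them is d = 2rt.

```python
"""Sketch of a strict molecular clock with a single fossil calibration.

All numbers and sequences are hypothetical; real analyses use relaxed clocks,
multiple calibrations, and explicit substitution models.
"""
import math


def p_distance(seq1, seq2):
    """Proportion of aligned sites that differ (no gaps assumed)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jukes_cantor(p):
    """Correct p for multiple hits: d = -3/4 * ln(1 - 4p/3), valid for p < 0.75."""
    return -0.75 * math.log(1 - 4 * p / 3)


def clock_rate(distance, split_age_myr):
    """Both lineages diverge from the split, so d = 2 * r * t gives r = d / 2t."""
    return distance / (2 * split_age_myr)


def divergence_time(distance, rate):
    """Invert the same relation: t = d / 2r."""
    return distance / (2 * rate)


# Calibration: a split dated by fossils (hypothetical 50 Myr, d = 0.10).
rate = clock_rate(0.10, 50)
# Target split with a larger hypothetical distance.
print("estimated age (Myr):", divergence_time(0.22, rate))

p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
print("p-distance:", p, "Jukes-Cantor distance:", round(jukes_cantor(p), 4))
```

With these made-up values the clock places the target split at about 110 million years. The estimate scales directly with the assumed rate and calibration, which is one reason such dates need independent testing against the fossil record.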
Assembling the Supermatrix
One might ask, Why not use just genes or just phenotypes to build trees? Why use both? The answer is that all the evidence bears on the tree shape; using different evidence, or different subsets of the evidence, could produce different trees. This is why we never want to leave out information when building a phylogenetic tree.
An important concept related to this line of reasoning is the distinction between homology and homoplasy. In conditions of homology, species have similar-looking traits or genes because they inherited them from a common ancestor; in homoplasy, they have similar traits but each lineage evolved them independently, perhaps because the traits confer a particular adaptive advantage. Both conditions occur broadly in nature; we cannot tell from observation alone which traits show homoplasy and which show homology. We can discover which traits show true homology only by building phylogenetic trees based on all the data.
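One common way to quantify what a tree reveals about homoplasy is the consistency index (CI): the minimum number of changes a character could require (number of observed states minus one) divided by the number of changes it actually requires on the tree. A CI below 1 signals homoplasy. The Python sketch below assumes Fitch parsimony, a hypothetical six-taxon tree, and two invented characters.

```python
"""Sketch: detect homoplasy with the consistency index on a fixed tree.

Hypothetical tree and characters; 1 = trait present, 0 = absent.
"""


def fitch_steps(tree, states):
    """Return (state set, minimum changes) for one character on a subtree."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def consistency_index(tree, states):
    """Minimum conceivable changes divided by changes required on this tree."""
    observed = fitch_steps(tree, states)[1]
    minimum = len(set(states.values())) - 1
    return minimum / observed if observed else 1.0  # invariant: no homoplasy


tree = ((("A", "B"), "C"), (("D", "E"), "F"))
shared_by_one_group = {"A": 1, "B": 1, "C": 1, "D": 0, "E": 0, "F": 0}
arose_twice = {"A": 1, "B": 0, "C": 0, "D": 1, "E": 0, "F": 0}
print("CI, group-defining trait:", consistency_index(tree, shared_by_one_group))
print("CI, convergent trait:", consistency_index(tree, arose_twice))
```

The first trait fits the tree with a single change (CI = 1.0), consistent with homology; the second needs two independent changes on this tree (CI = 0.5), the signature of homoplasy.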
A data set of this kind, a supermatrix, looks like a spreadsheet of observations about both fossil and living species, with photographs and line drawings, measurements, and detailed descriptions embedded within taxonomic categories. Improvements in the speed of genome data collection, such as next-generation sequencing, and the important investment by the federal government in gene databases through the National Center for Biotechnology Information (NCBI), have brought a great influx of genomic data for tree building. The other major source of information, phenomic data, can be particularly powerful because it is the only means by which we can directly compare fossil and living species.
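Mechanically, building a supermatrix means concatenating data partitions row by row and marking every cell that a taxon cannot be scored for as missing. The Python sketch below assumes each partition is a dict from taxon name to a fixed-width row and uses "?" for missing data (a common convention in phylogenetic matrices); the taxon names and scores are hypothetical. It shows how a fossil can sit in the same matrix as living species even though it contributes no DNA.

```python
"""Sketch: concatenate phenomic and genomic partitions into a supermatrix.

Hypothetical taxa and scores; '?' marks cells that cannot be scored.
"""

MISSING = "?"


def build_supermatrix(partitions):
    """Join partitions column-wise, padding absent taxa with MISSING."""
    widths = []
    for partition in partitions:
        lengths = {len(row) for row in partition.values()}
        if len(lengths) != 1:
            raise ValueError("every row in a partition must have the same width")
        widths.append(lengths.pop())
    taxa = sorted(set().union(*partitions))  # every taxon in any partition
    return {
        taxon: "".join(
            partition.get(taxon, MISSING * width)
            for partition, width in zip(partitions, widths)
        )
        for taxon in taxa
    }


phenomic = {"fossil_taxon": "10?1", "living_A": "1011", "living_B": "0010"}
genomic = {"living_A": "ACGTTA", "living_B": "ACGATA"}  # no DNA from the fossil
for taxon, row in build_supermatrix([phenomic, genomic]).items():
    print(f"{taxon:<13} {row}")
```

Real supermatrices usually keep partitions distinct in the data file so that morphological and molecular characters can be modeled differently; joining them into one string here is only for display.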
In the case of Mammalia, a considerable number of past species are known from fossils. Since at least the early 20th century, paleontological expeditions have traveled all over the world to find new fossil mammals. Our team includes scientists who have worked on five different continents looking for fossil mammals, particularly in rocks dating from the Late Cretaceous/Early Paleogene, the key time interval where we might expect to find fossils that help shape our understanding of placental mammal evolution.
One of the most successful of the early expeditions set out in 1921 under the leadership of the American Museum of Natural History to explore the Gobi Desert in Mongolia. The fossil mammals recovered from this and later expeditions to these rocks are some of the most complete specimens known to science. Analysis of these and other Mesozoic fossils shows that they consistently fall on the Tree of Life outside of Placentalia, closely related but not actually members of the group.
Placental Group Identity
When we talk about defining a group of species, such as Mammalia, the first thing that comes to mind is to list the traits found in the members of the group. This approach, however, could leave us vulnerable to being tricked by homoplasy, as explained earlier. Contemporary scientific practice avoids this trap by defining groups on the basis of common ancestry instead. For example, we define Mammalia as the common ancestor of monotremes (as exemplified by the duck-billed platypus) and placentals (everything from whales to gerbils, primates, and dogs), plus all of its descendants. By this definition, Mammalia also includes marsupials, because marsupials are more closely related to placentals than they are to monotremes.
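A definition based on common ancestry translates directly into a tree query: find the most recent common ancestor of the two specifier species, then collect everything descended from it. The Python sketch below uses a deliberately simplified tree of a few living species (not our full result) written as nested tuples, with a lizard as an outgroup; the function name node_based_group is hypothetical.

```python
"""Sketch: an ancestry-based (node-based) group definition on a simplified tree."""


def leaves(tree):
    """Set of species at the tips of a subtree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def node_based_group(tree, specifier_a, specifier_b):
    """Tips descended from the most recent common ancestor of two specifiers."""
    wanted = {specifier_a, specifier_b}
    if not wanted <= leaves(tree):
        raise ValueError("both specifiers must occur in the tree")
    if not isinstance(tree, str):
        for child in tree:
            if wanted <= leaves(child):  # the ancestor lies deeper in this child
                return node_based_group(child, specifier_a, specifier_b)
    return leaves(tree)  # no single child holds both: this node is the ancestor


tree = ("lizard", ("platypus", (("kangaroo", "opossum"), (("whale", "dog"), "gerbil"))))
mammalia = node_based_group(tree, "platypus", "dog")  # monotreme + placental
print(sorted(mammalia))
print("marsupials included:", {"kangaroo", "opossum"} <= mammalia)
```

No trait list is consulted: the kangaroo and opossum fall inside the group purely because of where they attach on the tree.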
Within the framework of common ancestry, we then work backward to deduce traits that distinguish a group from its next closest relative. This process is called optimization. Placental mammals share a number of features in contrast to their next closest relatives, including certain characteristics of the skull and dentition, as well as a loss of what are called epipubic bones, small bones found in the anterior abdominal wall.
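Optimization can be sketched with Fitch's two-pass method: a downward pass computes the candidate states at each node, and an upward pass picks one most-parsimonious state per node, keeping the parent's state wherever it is allowed. The Python sketch below applies this to the epipubic-bone character (1 = present, as in monotremes and marsupials; 0 = absent, as in placentals) on a simplified tree; the tree, the node-dictionary layout, and the function names are assumptions of this sketch, and when several reconstructions are equally parsimonious it reports only one of them.

```python
"""Sketch: Fitch optimization of one binary trait on a simplified tree."""


def downpass(tree, states):
    """Build nodes bottom-up, storing each node's Fitch candidate-state set."""
    if isinstance(tree, str):
        return {"label": tree, "set": {states[tree]}, "children": []}
    kids = [downpass(child, states) for child in tree]
    left, right = kids[0]["set"], kids[1]["set"]
    return {"label": None, "set": (left & right) or (left | right), "children": kids}


def uppass(node, parent_state=None):
    """Assign one state per node, keeping the parent's state when allowed."""
    if parent_state in node["set"]:
        node["state"] = parent_state
    else:
        node["state"] = min(node["set"])  # any member works; min is deterministic
    for child in node["children"]:
        uppass(child, node["state"])


def tip_labels(node):
    """Names of the tips below a node."""
    if not node["children"]:
        return [node["label"]]
    return [name for child in node["children"] for name in tip_labels(child)]


def changes(node):
    """List (tips below the branch, old state, new state) for each change."""
    found = []
    for child in node["children"]:
        if child["state"] != node["state"]:
            found.append((sorted(tip_labels(child)), node["state"], child["state"]))
        found.extend(changes(child))
    return found


tree = ("platypus", (("kangaroo", "opossum"), (("whale", "dog"), "gerbil")))
epipubic = {"platypus": 1, "kangaroo": 1, "opossum": 1,
            "whale": 0, "dog": 0, "gerbil": 0}
root = downpass(tree, epipubic)
uppass(root)
print("ancestral state at root:", root["state"])
print("changes:", changes(root))
```

On this tree the sketch reconstructs epipubic bones as present at the root and places a single loss on the branch leading to the placentals, the kind of distinguishing feature the optimization step is meant to recover.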
Although the placental Tree of Life is currently ambiguous as to where Placentalia originated, it does reveal interesting information about how some mammals reached their present-day habitats. The group Afrotheria (which includes elephants and aardvarks), originally identified using molecular data, has its oldest fossil members in the New World, both North and South America. South America was an isolated island continent in the Late Cretaceous and Paleocene; hence, ancient members of this group would have had to disperse between North and South America and then to Africa. Authors of molecular clock models have explicitly argued that the fragmentation of the supercontinent Gondwana was a catalyst for the diversification of placentals, but we proposed that this diversification and the breakup of Gondwana are completely unrelated.
Anatomical arguments that had brewed for decades had to be settled to launch the project. One of these was the tooth homologies of marsupials compared to placentals. Both marsupial and placental mammals have seven teeth, a mixture of premolars and molars, behind the canine tooth. The different shapes of the teeth in extinct species, however, had long invited the question of whether placental and marsupial mammals have the same seven teeth. After weeks of debate, experts in this area agreed that the weight of evidence from developmental literature, as well as amazing fossils that happened to preserve baby teeth, pointed unambiguously to one conclusion: Only six teeth could be compared between adult marsupials and placentals, because the first molar-like tooth in adult marsupials is actually equivalent to a baby tooth in placentals (which is succeeded by an adult tooth as the animal matures). The new consensus meant that more than a century’s worth of nomenclature, affecting thousands of mammal species, had to change.
Another area of controversy had been the question of which mammal group is the closest relative of Primates. We identified a group named Sundatheria that includes both Dermoptera (flying lemurs) and Scandentia (tree shrews). Some groups named since the introduction of molecular data into phylogenetics were also upheld, such as Afrotheria and Euarchonta, as was the placement of Cetacea (whales and dolphins) as close relatives of hippos. The combined data matrix supports a group known as Tethytheria, whose existence implies that manatees and elephants are more closely related to each other than they are to any other placental mammals. The species in this group were first known from fossils discovered on the margins of an ancient seaway called the Tethys Sea, which covered much of the region near the modern Mediterranean.
Mother of All Placental Mammals
Recent work on the origin and radiation of placental mammals is an example of creative new approaches to comparative anatomical work. From such studies has come a reconstruction of the as-yet-undiscovered common ancestor of all placentals, one that is solidly based on scientific evidence.
Assembling disparate information from the data on the placental mammal tree into a reconstruction was a collaboration between scientific artist Carl Buell and our team of researchers. Buell’s challenge was to draw a picture of something he couldn’t see, that no one had ever seen, purely on the basis of the optimization of traits on the tree. Through extensive back-and-forth with scientists, Buell produced a masterful visual synthesis of the hypothetical placental ancestor living in an ancient Paleogene ecosystem.
Along with the external anatomy, many internal features could also be reconstructed. For mammals, the anatomical parts most often found as fossils are the teeth. Placental mammal teeth have evolved into an enormous range of shapes, sizes, and numbers: from dogs, which have fewer teeth than many species and a specialized set of meat-shearing teeth, to sheep, which have wide molars that can grate vegetation, to some bats, which have fangs and low, bulbous, smooth molars. All these mammal groups diverged over time from the ancestral tooth shape and tooth number.
We reconstructed the ancestral placental mammal’s dentition, which differs greatly from that of descendant groups. The ancestral molars are triangular, incisors are small and pointed, and the canine is the tallest tooth. By contrast, even the earliest bats, carnivores, and hoofed mammals are each identifiable by their transformed dental anatomy. These species appear within 10 million years of the K-Pg boundary, and even in this relatively short span of evolutionary time, they have all departed from this ancestral dentition.
Phenomic and genomic data are now being integrated in what are called simultaneous phylogenetic analyses to give us the fullest picture of the history of life on Earth. Powerful new Internet tools enable comparative anatomists to work not as isolated scholars reminiscent of the nineteenth century but as part of virtual collaborative teams exchanging images and hypotheses in real-time Internet workspaces. This approach opens new paths for building the Tree of Life for all species from insects and jellyfish to diatoms, dinosaurs, and plants.
The mammal Tree of Life project is as much about the process of how phylogeneticists can collaborate in the twenty-first century to tackle bigger and more complex problems as it is about any specific findings. Even with the advances brought about by teams working with new software, organizing phenomic data for the Tree of Life remains an enormous task, one that will take input from thousands of researchers and the public. For this reason, and with the aim of encouraging more widespread engagement in the scientific process of discovering where we came from, we configured the mammal Tree of Life project for open access on the MorphoBank website (www.morphobank.org).
New projects launched by the National Science Foundation are exploring techniques such as crowdsourcing to allow the public to assist scientists by scoring traits from images. Other approaches use computer algorithms to extract trait information directly from existing literature. Experts studying mollusks, plants, and other groups are using MorphoBank to build phylogenetic matrices across the Tree of Life. The next few decades promise to bring much of anatomy into the open in online matrices that deliver scientific knowledge about the evolution of phenotypes to an audience worldwide.