These days, phylogeneticists, the experts who painstakingly map the complex branches of the tree of life, suffer from an embarrassment of riches. The genomics revolution has given them mountains of DNA data that they can sift through to reconstruct the evolutionary history connecting all living beings. But this unprecedented quantity of data has also created a serious problem: a number of well-supported studies have produced trees that reach contradictory conclusions.
“It has become common for top-notch studies to report genealogies that strongly contradict each other in where certain organisms sprung from, such as the place of sponges on the animal tree or of snails on the tree of mollusks,” said Antonis Rokas, associate professor of biological sciences at Vanderbilt University.
In a study published online May 8 by the journal Nature, Rokas and graduate student Leonidas Salichos analyze the reasons for these differences and propose a suite of novel techniques that can resolve the contradictions and provide greater accuracy in deciphering the deep branches of life’s tree.
“The study by Salichos and Rokas comes at a critical time when scientists are grappling with how best to detect the signature of evolutionary history from a deluge of genetic data. These authors provide intriguing insights into our standard analytical toolbox, and suggest it may be time to abandon some of our most trusted tools when it comes to the analysis of big data sets. This significant work will certainly challenge the community of evolutionary biologists to rethink how best to reconstruct phylogeny,” said Michael F. Whiting, program director of systematics and biodiversity science at the National Science Foundation, which funded the study.
To gain insight into this paradox, Salichos assembled and analyzed more than 1,000 genes, approximately 20 percent of the entire yeast genome, from each of 23 yeast species. He quickly realized that the histories of these 1,000-plus genes all differed slightly from one another, as well as from the genealogy constructed by analyzing all the genes simultaneously.
“I was quite surprised by this result,” Salichos said.
By adapting an algorithm from information theory, the researchers found that they could use these distinct gene genealogies to quantify the conflict among them and pinpoint the parts of the tree that are problematic.
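The release does not spell out the algorithm. As a minimal sketch, assuming a Shannon-entropy score computed over the two most frequent competing groupings at a single branch (in the spirit of the "internode certainty" measure described in the Nature paper, though the exact formula here is an assumption), conflict among gene genealogies could be quantified as below. The representation (each gene tree as a set of rooted clades, each clade a frozenset of species names), the function names `count_support` and `internode_certainty`, and the toy data are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical representation: each gene tree is a set of rooted clades,
# and each clade is a frozenset of species names.
Clade = frozenset

def count_support(gene_trees, focal):
    """Count gene trees containing `focal`, plus the support for its most
    frequent conflicting clade (one that overlaps `focal` but neither
    contains it nor is contained by it)."""
    support = 0
    rivals = Counter()
    for clades in gene_trees:
        if focal in clades:
            support += 1
        for clade in clades:
            if clade & focal and not (clade <= focal or focal <= clade):
                rivals[clade] += 1
    top_rival = rivals.most_common(1)[0][1] if rivals else 0
    return support, top_rival

def internode_certainty(freq_a, freq_b):
    """Entropy-based certainty (assumed formula): 1.0 when every informative
    gene tree agrees, 0.0 when the two groupings are equally supported."""
    total = freq_a + freq_b
    if total == 0:
        raise ValueError("no gene tree supports either grouping")
    certainty = 1.0  # log2(2), the maximum entropy for two outcomes
    for count in (freq_a, freq_b):
        p = count / total
        if p > 0:  # treat 0 * log2(0) as 0
            certainty += p * math.log2(p)
    return certainty

# Toy data: three gene trees group A with B, one groups B with C.
ab, bc, abc = Clade({"A", "B"}), Clade({"B", "C"}), Clade({"A", "B", "C"})
gene_trees = [{ab, abc}, {ab, abc}, {ab, abc}, {bc, abc}]

support, rival = count_support(gene_trees, ab)
print(support, rival, round(internode_certainty(support, rival), 3))
# prints: 3 1 0.189
```

A score near 1 marks a branch that the gene genealogies agree on; a score near 0 flags the kind of contested branch the researchers describe. Real analyses start from inferred gene trees with many more species, which this toy example leaves out.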
In broad terms, Rokas and Salichos found that genetic data is less reliable for resolving periods of rapid radiation, when many new species arose in quick succession. A case in point is the Cambrian explosion: the sudden appearance, about 540 million years ago, of a remarkable diversity of animal species without apparent predecessors. Before about 580 million years ago, most organisms were very simple, consisting of single cells occasionally organized into colonies.
“A lot of the debate on the differences in the trees has been between studies concerning the ‘bushy’ branches that took place in these ‘radiations’,” Rokas said.
The researchers also found that the farther back in time they looked, the less reliable the genetic data became. “Radioactive dating methods are only accurate over a certain time span,” said Rokas. “We think that the value of DNA data might have a similar limit, posing considerable challenges to existing algorithms to resolve radiations that took place in deep time.”
The research was supported by National Science Foundation CAREER award DEB-0844968.