SCIENTISTS HAVE LONG recognized that identifying sequences of shared DNA among different species is their best bet for distinguishing potential evolutionary relationships. And indeed, studies in animals have already yielded important insights. But progress for plants has been slow.
The reason involves the relative sizes of the genetic data sets. Plant genomes are almost always larger and more complex than those of animals. For example, the genome of one of nature’s least-complicated plants, Arabidopsis thaliana, has just under 25,000 genes. We humans, by comparison, have 23,000.
Sequencing more-complicated flora, like maize, can yield a truly staggering amount of genetic data. As reported in the November 2009 edition of Science, maize’s 2.3-billion-base sequence — at the time the largest genetic blueprint yet worked out for any plant species — included more than 32,000 protein-coding genes spread across 10 chromosomes.
Because of such daunting numbers, past efforts to locate identical stretches of DNA in multiple plant species have mostly failed, at least on any useful scale. Happily, this is changing, thanks in part to recent work by two MU researchers.
Dmitry Korkin, an MU assistant professor of computer science, is a computational biology and structural bioinformatics expert. Korkin used an algorithm designed to tap the combined computing power of 48 processors to scan the genomes of six animal and six plant species. He then teamed up with Gavin Conant, an assistant professor of bioinformatics at MU, to look for identical DNA sequences shared among the various genomes.
They found several, a development that was particularly exciting in the plant group. “Our algorithm found identical sequences of DNA located at completely different places on multiple plant genomes,” Korkin says. “No one has ever been able to do that before on such a scale.”
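To make the idea of "identical sequences located at different places on multiple genomes" concrete, here is a minimal sketch in Python. It is not the researchers' algorithm, whose details the article does not give. It assumes a simple approach: index every substring of a fixed length k in each genome, then keep the substrings present in all of them. The function name `shared_kmers` and the toy sequences are hypothetical, invented for illustration.

```python
from collections import defaultdict


def shared_kmers(genomes, k):
    """Return {kmer: {name: [start positions]}} for every length-k
    substring that occurs in all of the given sequences."""
    index = {}
    for name, seq in genomes.items():
        positions = defaultdict(list)
        # Record the start position of every length-k window.
        for i in range(len(seq) - k + 1):
            positions[seq[i:i + k]].append(i)
        index[name] = positions

    # Keep only the substrings that appear in every genome.
    common = set.intersection(*(set(p) for p in index.values()))
    return {
        kmer: {name: index[name][kmer] for name in genomes}
        for kmer in sorted(common)
    }


# Toy sequences, invented for illustration only (not real genomic data).
toy = {
    "species_a": "TTGACCGTAAGC",
    "species_b": "GACCGTATTTCA",
    "species_c": "CCAAGACCGTAG",
}
hits = shared_kmers(toy, k=7)
print(hits)
```

In this toy case the shared stretch sits at a different offset in each sequence, which is the kind of result Korkin describes. A real genome-scale search would need far more memory-efficient indexing and parallel execution than this sketch provides.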
Comparing all the genetic sequences took four weeks, with each of the 48 processors doing about 1 million searches per hour. The total came to some 32 billion searches. Although the scientists found identical sequences among plant species, just as they did with animals, their data suggested the sequences evolved differently. “You would expect to see convergent evolution, but we don’t,” says Conant. “Plants and animals are both complex multicellular organisms that have to deal with many of the same environmental conditions, like taking in air and water, and dealing with weather variations. But their genomes code for solutions to these challenges in different ways.”
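A quick check of the search totals quoted above, assuming the hourly rate applies to each of the 48 processors and that the machines ran around the clock for the full four weeks (both are assumptions made for this sketch):

```python
# Assumptions for this back-of-the-envelope check: the rate is per
# processor, and the processors ran continuously for four weeks.
processors = 48
searches_per_hour_each = 1_000_000
hours = 4 * 7 * 24  # four weeks of continuous running

total = processors * searches_per_hour_each * hours
print(f"{hours} hours, {total:,} searches")
```

Under those assumptions the total comes to roughly 32 billion, in line with the figure the researchers report.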
The project, the researchers say, could create a foundation for discoveries to improve the wellbeing of both plants and people. They add that the code-analyzing computer program could itself help in the development of new medicines. “The same algorithm can be used to find identical sequential patterns in an organism’s entire set of proteins,” says Korkin. “That could potentially lead to finding new targets for existing drugs or studying these drugs’ side effects.”
The study was published in the Proceedings of the National Academy of Sciences. The algorithm was developed by Jeff Reneker, a senior research informatician at MU’s Center for Computational Biology and Medicine, during his MU doctoral study with Chi-Ren Shyu, director of the MU Informatics Institute.