Drinking from the Fire Hose of Genomic Data
By Hsien-Hsien Lei, PhD, HUGO Matters Editor
At a meeting with HUGO President Prof. Edison Liu on Monday, we talked about the tremendous opportunities personal genome data would open up if we were not limited by computing power. Just last month, the DOE Joint Genome Institute held an invitation-only workshop on using high-performance computing to analyze and manage genome sequencing data. As sequencing becomes more efficient and cost-effective, we are reaching the next set of hurdles: data management, data analysis, and translational research.
One laboratory interested in using high-performance computing (HPC) for genomics is the Zhulin lab at the National Institute for Computational Sciences, Oak Ridge National Laboratory, University of Tennessee.
The exponential growth of genomic datasets has made computing power a critical issue. We are investing in adapting powerful bioinformatics tools to HPC architectures. The UT-ORNL Kraken and ORNL Jaguar petascale supercomputers offer tens of thousands of processors, enabling researchers to analyze data computationally on an unprecedented scale. Our first successful implementation, the HMMER software (now scalable to thousands of processors), allowed us to match every sequence in the NCBI non-redundant database (roughly 5 million protein sequences) against all Pfam domain models in less than a day.
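How the HMMER searches were distributed is not spelled out here. As a sketch of one common approach, assumed rather than taken from the lab's implementation, each processor can be given a contiguous block of the query sequences and compare that block against every profile model. The function name `block_ranges` and the 4,096-processor count below are hypothetical choices for illustration.

```python
from typing import List, Tuple


def block_ranges(n_items: int, n_workers: int) -> List[Tuple[int, int]]:
    """Split item indices 0..n_items-1 into n_workers contiguous half-open
    ranges [start, stop) whose sizes differ by at most one."""
    if n_items < 0 or n_workers <= 0:
        raise ValueError("need n_items >= 0 and n_workers > 0")
    base, extra = divmod(n_items, n_workers)
    ranges = []
    start = 0
    for rank in range(n_workers):
        # The first `extra` workers each take one of the leftover items.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # Hypothetical run: about 5 million query sequences over 4,096 processors.
    # Each processor would search its own block against every profile model.
    ranges = block_ranges(5_000_000, 4096)
    sizes = [stop - start for start, stop in ranges]
    print(min(sizes), max(sizes), sum(sizes))  # prints: 1220 1221 5000000
```

Splitting by sequence count keeps the idea simple, but HMMER search time grows with sequence length, so a production setup might instead balance blocks by total residues. For scale, finishing 5 million sequences within 24 hours means sustaining at least about 58 sequences per second in aggregate.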
Supercomputers used for petascale computing can perform a quadrillion (10^15) or more calculations per second. Singapore is set to have its own cluster of supercomputers through a joint R&D partnership between Fujitsu and the Agency for Science, Technology and Research (A*STAR).
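To make "a quadrillion calculations per second" concrete, the short calculation below divides a job's operation count by a sustained rate. The 10^18-operation workload is a hypothetical example, and the model is deliberately idealized: it ignores I/O, communication between processors, and the fact that real applications rarely sustain a machine's peak rate.

```python
PETA = 10**15  # one quadrillion operations per second


def ideal_seconds(operations: int, ops_per_second: int = PETA) -> float:
    """Idealized run time: operation count divided by a sustained rate.
    Ignores I/O, communication, and parallel-efficiency losses."""
    if ops_per_second <= 0:
        raise ValueError("ops_per_second must be positive")
    return operations / ops_per_second


if __name__ == "__main__":
    workload = 10**18  # hypothetical job size: a quintillion operations
    petascale = ideal_seconds(workload)         # at 10^15 ops/s
    gigascale = ideal_seconds(workload, 10**9)  # at 10^9 ops/s, for comparison
    print(f"petascale: {petascale / 60:.1f} minutes")             # 16.7 minutes
    print(f"gigascale: {gigascale / (365.25 * 86400):.1f} years")  # 31.7 years
```

The same job that would take about a quarter of an hour at petascale would take roughly three decades on a machine a million times slower.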
The personal genomics company Knome announced the launch of KnomeDISCOVERY in November 2009. Aimed at research groups, KnomeDISCOVERY will help researchers sequence DNA, manage the resulting data, and perform preliminary analyses. The service will also help clinical researchers identify novel alleles associated with specific diseases of interest.
KnomeDISCOVERY is aimed at two groups of researchers.
Researchers with expertise in medical genomics who want to streamline data management and preliminary analysis in forthcoming mass sequencing projects: Leveraging high-volume access to sequencing platforms, Knome handles the logistical hurdles of rapid-turnaround sequencing and carries out the important but computationally intensive "background" genome analysis, freeing researchers to focus on specific, question-driven hypothesis testing that can yield novel discoveries in genetic medicine.
Clinically trained researchers with extensive expertise in specific diseases, for whom mass sequencing approaches are new and unfamiliar tools: Knome's expertise in analyzing whole-genome data can directly help these researchers pinpoint novel alleles that contribute to a disease of interest. Knome takes a "fine-toothed" approach to genomic data analysis, grounded in a thorough understanding of genome structure and function, protein biochemistry, population and evolutionary genetics, statistical analysis, and basic disease etiology, and refined through close consultation with the researcher. This approach can quickly identify potentially disease-relevant candidate alleles for follow-up empirical assessment.
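Knome's actual pipeline is not described beyond the outline above. As a hypothetical illustration of what a first-pass candidate-allele filter can look like, the sketch keeps variants that are rare in a reference population, are predicted to change the protein, and fall in genes the researcher has flagged. The `Variant` fields, the consequence labels, the gene names, and the 1% frequency cutoff are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

# Hypothetical labels for predicted consequences that change the protein.
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice_site"}


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str        # predicted effect, e.g. "missense" or "synonymous"
    population_freq: float  # allele frequency in a reference population (0 to 1)


def candidate_alleles(
    variants: Iterable[Variant],
    max_freq: float = 0.01,
    genes_of_interest: Optional[Set[str]] = None,
) -> List[Variant]:
    """Keep variants that are rare, predicted to alter the protein, and
    (if a gene set is given) located in genes the researcher flagged."""
    keep = []
    for v in variants:
        if v.population_freq > max_freq:
            continue  # more common than the assumed rarity cutoff
        if v.consequence not in PROTEIN_ALTERING:
            continue  # predicted not to change the protein
        if genes_of_interest is not None and v.gene not in genes_of_interest:
            continue  # outside the researcher's candidate genes
        keep.append(v)
    return keep


if __name__ == "__main__":
    calls = [
        Variant("GENE_A", "missense", 0.002),
        Variant("GENE_A", "synonymous", 0.001),
        Variant("GENE_B", "nonsense", 0.30),
        Variant("GENE_C", "frameshift", 0.0005),
    ]
    for v in candidate_alleles(calls, genes_of_interest={"GENE_A", "GENE_C"}):
        print(v.gene, v.consequence)  # GENE_A missense, then GENE_C frameshift
```

A real analysis would weigh far more evidence (inheritance patterns, conservation, statistical association), but the structure is the same: successive filters narrow millions of variant calls to a short list for empirical follow-up.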
Now that we're getting a handle on the technicalities of sequencing, it's time to grapple with the challenge posed by the massive volumes of data being produced. Translational genomics research will help us better understand the biology of living organisms and holds the key to better diagnosis, treatment, and cures for the diseases that ail us.