3-D Genome Sequencing

March 6, 2010 · Posted in Tools of the Genome Trade

By Hsien-Hsien Lei, PhD, HUGO Matters Editor

Congratulations to Erez Lieberman-Aiden, a graduate student at the Harvard-MIT Division of Health Sciences and Technology, on winning the Lemelson-MIT Student Prize. One of his several impressive innovations is the Hi-C method for three-dimensional genome sequencing, which Lieberman-Aiden likens to “MRI for genomes.”

From Medical News Today:

Mapping the Human Genome in 3-D
Mapping the Human Genome in 3-D
Lieberman-Aiden’s most recent invention is the "Hi-C" method for three-dimensional genome sequencing. It has been hailed as a revolutionary technology that will enable an entirely new understanding of cell state, genetic regulation and disease. Developed together with postdoctoral researcher Nynke van Berkum of UMass Medical School and their advisors Eric Lander and Job Dekker, Hi-C makes it possible to create global, three-dimensional portraits of whole genomes as they fold. Three-dimensional genome sequencing is a major advance in solving the mystery of how the human genome – which is two meters and three billion chemical letters long – fits into the tiny nucleus of a cell.

Applied to the human genome, the technology enabled Lieberman-Aiden, van Berkum and their team to make two significant discoveries. First, they found that the genome is organized into separate active and inactive compartments; chromosomes weave in and out of these compartments, turning the individual genes along their length on and off. When they examined this process more closely, they found evidence that the genome adopts a never-before-seen state called a fractal globule. This allows cells to pack DNA extraordinarily tightly without knotting, and to easily access genes when the information they contain is needed.

Paper: Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome, Lieberman-Aiden E, et al., Science 326(5950):289-293, 9 October 2009

Image: Spirals of DNA molecules, Annie Cavanagh, Wellcome Images

Whole Genome Sequencing for Cancer

March 5, 2010 · Posted in Genetics of Disease, Tools of the Genome Trade

By Hsien-Hsien Lei, PhD, HUGO Matters Editor

Last month, researchers at the Johns Hopkins Kimmel Cancer Center announced that they had successfully sequenced the complete genomes of cancer patients. The sequences were analyzed using a technique called “personalized analysis of rearranged ends” or PARE. PARE can detect genome rearrangements that can then be used as cancer biomarkers indicating tumor growth. Cancer genomes can be used to identify:

  • Driver mutations that cause cancerous growth through mechanisms such as the alteration of gene expression
  • Mutations that are the same in different tumors of the same type
  • New drug targets based on the mutations identified in the cancer genome
  • Diagnostic tools based on a complete list of driver mutations in each cancer type
  • Effective drugs or a cocktail of drugs tailored to each individual based on their tumor profile of driver mutations

(Source: The Scientist)

The results from whole genome sequencing of cancer patients can be used to “monitor the growth of tumors, determine appropriate levels of therapy, and show instances of recurrence.” (BioTechniques)

“Eventually, we believe this type of approach could be used to detect recurrent cancers before they are found by conventional imaging methods, like CT scans,” Luis Diaz, assistant professor of oncology at Johns Hopkins, said in a press release.

The Cancer Genome Atlas (TCGA), part of the National Human Genome Research Institute, is also working on using large-scale genome sequencing to study cancer. In June 2009, their Genome Sequencing Centers began including whole exome and whole genome data. And in July 2009, the Genome Sequencing Centers completed the first of 24 whole genome sequence analyses of glioblastoma multiforme and ovarian tumor samples. Here’s Dr. Raju Kucherlapati, Principal Investigator, Genome Characterization Center, The Cancer Genome Atlas, speaking about cancer genetics and genomics.


Recently, Amy Harmon of the New York Times explored targeted cancer therapies in a three-part series. She profiled Dr. Keith Flaherty who was in charge of clinical trials testing PLX4032 in melanoma patients.

Healthy cells turned cancerous, biologists knew, when certain genes that control their growth were mutated, either by random accidents or exposure to toxins like tobacco smoke and ultraviolet light. Once altered, like an accelerator stuck to the floor, they constantly signaled cells to grow.

What mattered in terms of treatment was therefore not only where a tumor originated, like the lungs or the colon, but also which set of these “driver” genes was fueling its growth. Drugs that blocked the proteins that carried the genes’ signals, some believed, could defuse a cancer without serious side effects.

Targeted cancer therapies also include gene therapy, although this approach has thus far been unsuccessful. More information on targeted cancer therapies is available from the National Cancer Institute.

Petascale Computing and Genomics

March 1, 2010 · Posted in Research, Tools of the Genome Trade

By Hsien-Hsien Lei, PhD, HUGO Matters Editor

Last week, I mentioned the use of petascale supercomputers to manage and analyze the overwhelming amount of genomic data being generated currently and into the foreseeable future. Last week was also the first time I’d ever heard the terms “petascale” and “petaflop.” I assume that I’m not the only one who hasn’t given much thought to the specifics of supercomputing so I’m sharing here what I’ve learned so far.

First, a couple of definitions:

  • “peta” is one quadrillion (10^15)
  • FLOPS stands for FLoating-point Operations Per Second, a measure of a computer’s performance
  • 1 petaflop is equal to 1,000 teraflops, or one quadrillion floating-point operations per second


  • According to Wikipedia, a simple calculator functions at about 10 FLOPS.
  • Most personal computers perform on the order of billions of calculations per second (gigaflops).
  • One petabyte of data is equivalent to six billion digital photos. (Blue Waters)
  • Google processes 20 petabytes of data per day (GenomeWeb)
  • 1 petabyte = 1,024 terabytes
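The prefix arithmetic in the bullets above can be sanity-checked in a few lines of Python. Note that the per-photo size at the end is a back-of-the-envelope figure implied by the Blue Waters comparison, not a number quoted by any of the sources:

```python
# Back-of-the-envelope arithmetic for the "peta" prefix.

PETA = 10**15   # one quadrillion
TERA = 10**12

# 1 petaflop = 1,000 teraflops = 10^15 floating-point operations per second
print(PETA / TERA)                  # 1000.0

# Storage units are often binary: 1 petabyte = 1,024 terabytes (2^50 bytes)
print(2**50 / 2**40)                # 1024.0

# Rough size per photo if one petabyte holds six billion digital photos
bytes_per_photo = PETA / 6_000_000_000
print(round(bytes_per_photo / 1024))  # roughly 163 KB per photo
```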

Image: I See Your Petaflop and Raise You 19 More, Wired Science, February 2, 2009. Sequoia is the supercomputer planned by the Department of Energy and IBM that will be able to perform at the 20-petaflop level.

David A. Bader, author of Petascale Computing: Algorithms and Applications explained in an interview with iTnews.com.au,

Computational science enables us to investigate phenomena where economics or constraints preclude experimentation, evaluate complex models and manage massive data volumes, model processes across interdisciplinary boundaries, and transform business and engineering practices.

Petascale computing is run off clusters of computers. An article in Cloud Computing Journal explains why:

The main benefits of clusters are affordability, flexibility, availability, high-performance and scalability. A cluster uses the aggregated power of compute server nodes to form a high-performance solution for parallel applications. When more compute power is needed, it can be simply achieved by adding more server nodes to the cluster.

In November 2009, it was announced that a four-year, $1 million project, supported by the National Science Foundation’s PetaApps program, had been awarded to study genomic evolution using petascale computers. Researchers will first use GRAPPA, an open-source software package, to study genome rearrangements in Drosophila. From this analysis, new algorithms will be developed that have the potential to make sense of genome rearrangements, leading to better identification of microorganisms, the development of new vaccines, and a greater understanding of how microbial communities evolve along with biochemical pathways.

In 2011, the world’s most powerful supercomputer, Blue Waters, will come online. According to GenomeWeb, Blue Waters will contain more than 200,000 processing cores and will be able to perform at multi-petaflop levels. A partnership between the University of Illinois at Urbana-Champaign, its National Center for Supercomputing Applications, IBM, and the Great Lakes Consortium for Petascale Computation, Blue Waters is supported by the National Science Foundation and the University of Illinois. Researchers can apply to the National Science Foundation for time on Blue Waters.

"I think petascale computing comes at a very good time for biology, especially genomics, which has to deal with … increasingly large data sets trying to do a lot of correlation between the data that’s held in several massive datasets," says Thomas Dunning, director of the NCSA at University of Illinois, Urbana-Champaign. "This is the time that biology is now going to need this kind of computing capability — and the good thing is that it’s going to be here."

~Petascale Coming Down the Pike, GenomeWeb, June 2009

Here’s a video of Saurabh Sinha, a University of Illinois assistant professor of computer science, talking about his research using NCSA’s supercomputers.

Genome-wide search for regulatory sequences in a newly sequenced genome: comparative genomics in the large divergence regime

Next topic for thought: cloud computing. More to come.

NB: HUGO President Prof. Edison T. Liu is currently attending the Bioinformatics of Genome Validation and Supercomputer Applications workshop at NCSA in Urbana, Illinois. I’m looking forward to hearing more about their discussions!

Do you have any knowledge to share with regards to petascale computing and genomics?

Drinking from the Fire Hose of Genomic Data

February 26, 2010 · Posted in Tools of the Genome Trade

By Hsien-Hsien Lei, PhD, HUGO Matters Editor

At a meeting with HUGO President Prof. Edison Liu on Monday, we talked about the tremendous opportunities for utilizing personal genome data if we weren’t limited by the lack of computing power. Just last month, the DOE Joint Genome Institute held an invitation-only workshop to discuss the use of high performance computing for analyzing and managing data from genome sequencing. As sequencing becomes more efficient and cost-effective, we are reaching the next hurdle of data management, data analysis, and translational research.

One laboratory that is interested in using high-performance computing (HPC) for genomics is the Zhulin lab at the National Institute for Computational Sciences, Oak Ridge National Laboratory, University of Tennessee.

The exponential growth of genomic datasets leads to a situation where computing power becomes a critical issue. We invest in adapting powerful bioinformatics tools for use with HPC architectures. The UT-ORNL Kraken and ORNL Jaguar petascale supercomputers offer tens of thousands of processors that enable researchers to computationally analyze data on an unforeseen scale. Our first successful implementation of the HMMER software (now scalable to thousands of processors) permitted us to match every sequence in the NCBI non-redundant database (roughly 5 million protein sequences) to all Pfam domain models in less than a day.
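The kind of scan the Zhulin lab describes is embarrassingly parallel: the sequence database can be sliced across processors, with no communication needed until the results are gathered at the end. Here is a minimal Python sketch of that partition-and-gather pattern; the `score` function is a hypothetical stand-in for an HMM domain search (it is not HMMER, and the sequences and models are toy data):

```python
# Sketch: distributing a sequence-vs-domain-model scan across CPU cores.
# The real run used HMMER on the Kraken/Jaguar supercomputers; a toy
# score() stands in for the HMM search so the pattern stays visible.
from multiprocessing import Pool

def score(seq, model):
    """Hypothetical stand-in for an HMM domain match (not real HMMER)."""
    return sum(1 for ch in seq if ch in model)

def scan_chunk(args):
    chunk, models = args
    # Each worker scans its slice of sequences against every model
    # and reports the best-scoring model per sequence.
    return [(seq, max(models, key=lambda m: score(seq, m))) for seq in chunk]

def parallel_scan(sequences, models, n_workers=4):
    # Split the database into n_workers roughly equal slices...
    chunks = [sequences[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        results = pool.map(scan_chunk, [(c, models) for c in chunks])
    # ...and flatten: no cross-worker communication is needed
    # until this final gather step.
    return [hit for part in results for hit in part]

if __name__ == "__main__":
    seqs = ["MKVLA", "GGGCT", "MMKVA"]
    models = ["MKV", "GCT"]
    print(parallel_scan(seqs, models, n_workers=2))
```

On a real machine the worker count would match the available cores, and each worker would invoke the actual search tool on its slice of the database.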

Supercomputers used for petascale computing can perform a quadrillion (10^15) calculations per second. Singapore is set to have its own cluster of supercomputers in a joint R&D partnership between Fujitsu and the Agency for Science, Technology and Research.

Personal genomics company Knome announced the launch of KnomeDISCOVERY in November 2009. Targeting research groups, KnomeDISCOVERY will help researchers sequence DNA, manage data, and perform preliminary analyses. The service will also help clinical researchers identify novel alleles that are associated with specific diseases of interest.

Researchers with expertise in medical genomics who want to streamline data management and preliminary analysis in forthcoming mass sequencing projects – Leveraging high-volume access to sequencing platforms, Knome handles the logistical hurdles of rapid-turnaround sequencing, and carries out the important but computationally intensive process of "background" genome analysis, freeing researchers to focus on specific question-driven hypothesis testing that can yield novel discoveries in genetic medicine.

Clinically trained researchers with extensive expertise in specific diseases, for whom mass sequencing approaches are novel and unfamiliar tools – Knome’s expertise in analyzing whole genome data can directly help these researchers pinpoint novel alleles that contribute to a disease of interest. Knome takes a "fine-toothed" approach to genomic data analysis, grounded in a thorough understanding of genome structure and function; protein biochemistry; population/evolutionary genetics; statistical analysis; and basic disease etiology, as refined by close consultation with the researcher. This approach can quickly identify potentially disease-relevant candidate alleles for researchers to consider for follow-up empirical assessment.

Now that we’re getting a handle on the technicalities of sequencing, it’s time to grapple with the challenge posed by the massive volumes of data that are being produced. Translational genomics research will enable us to better understand the biology of living organisms and holds the key to better diagnosis, treatment, and cures of the diseases that ail us.

Social Networking for Scientists

October 28, 2009 · Posted in Tools of the Genome Trade

HUGO is in good company when it comes to using social media to further scientific aims. The NIH has awarded a two-year, $12.2 million grant to the University of Florida to establish a social network for biomedical scientists called VIVOweb. The backbone of the social network will be VIVO, a system developed at Cornell in 2003 that was intended to connect scientists with each other based on their research areas of interest.

A number of institutions are participating in the VIVOweb project:

  • Cornell – multi-institutional functionality
  • University of Florida – keeping site’s data current
  • Indiana University Bloomington – social network tools
  • Scripps Research Institute – implementation site
  • Ponce School of Medicine – implementation site
  • Washington University in St. Louis – implementation site
  • Weill Cornell Medical College – implementation site

One of the best established social networks available to scientists is Nature Network. Available without charge, Nature Network is intended for scientists to:

  • Keep in touch with colleagues and make new contacts both globally and via local networks, such as Boston, New York, and London
  • Discuss research and scientific issues
  • Provide a platform for blogging and forums

OpenWetWare at MIT is another social media tool for biology and biological engineering; similar to a wiki, it hosts user-generated encyclopedic content on research materials and protocols. Its aims are to “promote the sharing of information, know-how, and wisdom among researchers and groups.”

Is social networking a part of your scientific life? 

Source: EurekAlert!