Using next-generation gene sequencing to explore the depths of the genome
BY DARRELL E. WARD
Leukemia research reached a milestone in 1994 with the discovery of the MLL partial tandem duplication (MLL PTD), a gene mutation found in some patients with cytogenetically normal acute myeloid leukemia (CN-AML). The presence of the mutation in leukemic cells identifies a subtype of CN-AML that responds poorly to chemotherapy alone but well to chemotherapy plus autologous stem cell transplantation.
The mutation became the first prognostic marker in this large group of AML patients who lack the chromosomal damage that otherwise would be used to guide treatment decisions and predict relapse risk.
The mutation was discovered by a team of investigators that included Michael A. Caligiuri, MD, Clara D. Bloomfield, MD, and Carlo M. Croce, MD, who are all now at The Ohio State University Comprehensive Cancer Center – Arthur G. James Cancer Hospital and Richard J. Solove Research Institute (OSUCCC – James).
To investigate the mutation’s contribution to leukemia development, the researchers developed an Mll-PTD mouse model. They then crossed that mouse with a strain carrying an internal tandem duplication (ITD) in a gene called Flt3, a mutation subsequently discovered in CN-AML patients by a different research group.
Mice carrying both mutations typically developed leukemia about 50 weeks after birth, an age considered “older” for mice. The question was, why did it take so long?
The researchers hypothesized that DNA methylation changes were gradually silencing tumor-suppressor genes. Recently, they began designing experiments to look for differences in DNA methylation patterns between younger and older mice carrying the Mll PTD only, the Flt3 ITD only or both mutations together.
“We want to elucidate the contribution of DNA methylation to leukemogenesis in our novel AML mouse model,” says Susan Whitman, PhD, an OSUCCC – James research scientist in the laboratory of Caligiuri, who is director of the OSUCCC and CEO of The James Cancer Hospital and Solove Research Institute.
They chose to do the study using next-generation sequencing (NGS). Compared with other well-tested single-gene approaches, “next-generation sequencing is less expensive on a per-gene basis, and it allows us to look at methylation changes throughout the genome,” Whitman says.
“With NGS, we can ask questions we could never ask before,” says Hansjuerg Alder, PhD, director of the OSUCCC – James Nucleic Acid Shared Resource, which offers gene sequencing using the Illumina platform. “It makes it far easier to discover potential biomarkers for predicting disease prognosis, progression and drug response, and for molecular diagnosis.
“Each individual with cancer has different genetic changes, and this technology can identify those differences,” Alder says. “NGS will enable us to better classify cancers and to tell which patients might respond to a therapy and which probably won’t. All of this will help make personalized medicine for cancer possible.”
Ohio State is committed to personalized medicine, says Clay Marsh, MD, senior associate vice president for Health Sciences Research and executive director of the Center for Personalized Health Care. “Health care at Ohio State will utilize gene-based information to understand each person’s individual requirements for the maintenance of his or her health and prevention of disease, with therapy tailored to that individual’s genetic uniqueness. Ideally, it also includes incorporating knowledge of their environment, health-related behaviors, culture and values.”
Alder notes that some leaders in the field believe personalized medicine will eventually involve using a patient’s genome sequence data the way information from blood tests is used today.
“NGS could change medicine in a way comparable to noninvasive imaging,” says Jeff Palatini, PhD, technical director of the OSUCCC Microarray Shared Resource, which offers gene sequencing using the SOLiD platform. “Just as the ability to see the internal organs without doing surgery changed medicine, NGS enables us to identify changes at the molecular level that we can’t see any other way.”
The experience of Alder, Palatini and other researchers at the OSUCCC – James shows that NGS—also called second-generation sequencing, deep sequencing and massively parallel sequencing—is revolutionizing cancer research.
First-generation genome sequencing technology made the Human Genome Project possible. Completing that groundbreaking effort took 13 years and the involvement of 18 countries. Sequencing alone cost $400 million. The project revealed for the first time the sequence of the 3 billion base pairs that make up human DNA. NGS, in contrast, can do the same thing and more in 10 days, with greater accuracy, for about $10,000.
An NGS investigation begins with amplification of DNA fragments, followed by tagging each base in the fragments with one of four fluorescent colors and reading the sequence of colors, a process carried out by a tabletop-sized machine. This base identification yields the raw sequence data, a string of ‘A’s, ‘C’s, ‘G’s and ‘T’s. Depending on the question being asked, bioinformaticians and computational scientists then analyze these data using very different approaches.
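The step of turning colors into letters can be illustrated with a toy base caller. This is a minimal sketch, not any instrument’s actual software: it assumes each sequencing cycle has already been reduced to four fluorescence intensities, one per color channel, and the channel-to-base mapping and the 0.6 “purity” cutoff are hypothetical. Real base-calling software also corrects for background, channel cross-talk and loss of synchrony among the copies of a fragment.

```python
from typing import List, Sequence, Tuple

# Hypothetical mapping of the four fluorescence channels to bases.
CHANNEL_TO_BASE = ("A", "C", "G", "T")


def call_bases(
    cycles: Sequence[Tuple[float, float, float, float]],
    min_purity: float = 0.6,
) -> Tuple[str, List[float]]:
    """Call one base per sequencing cycle from four channel intensities.

    The brightest channel determines the base. "Purity" (brightest signal
    divided by total signal) serves as a crude confidence score; cycles
    below min_purity are reported as "N" (no call).
    """
    bases: List[str] = []
    purities: List[float] = []
    for intensities in cycles:
        total = sum(intensities)
        if total <= 0:
            # No signal at all in this cycle: no call.
            bases.append("N")
            purities.append(0.0)
            continue
        brightest = max(range(4), key=lambda ch: intensities[ch])
        purity = intensities[brightest] / total
        bases.append(CHANNEL_TO_BASE[brightest] if purity >= min_purity else "N")
        purities.append(purity)
    return "".join(bases), purities


if __name__ == "__main__":
    # Three cycles of made-up intensities; the third is ambiguous.
    sequence, scores = call_bases([(900, 20, 30, 50), (10, 15, 800, 40), (300, 280, 20, 10)])
    print(sequence, [round(s, 2) for s in scores])
```

With these made-up intensities the call is AGN: in the third cycle the signal is split almost evenly between two channels, so no base is called.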
“NGS is extremely powerful,” says Palatini. “It enables the sequencing of entire genomes, both coding regions (exons) and noncoding regions (such as introns), which was not practical before.”
Alternatively, he says, researchers can examine just the DNA regions that are methylated, one component of the epigenome; all RNA transcripts, known as the transcriptome; or just certain transcripts, such as microRNAs. “We can study the genome, the epigenome and the transcriptome at the same time, and we can get information about the depth and expression of transcripts, as well as their sequence,” Palatini says.
“NGS can reveal whether genes are methylated or mutated or both, and we can look at such questions simultaneously to learn what’s actually happening,” he says. “We can study how cancer cells regress to a more primitive state, or dedifferentiate, which may someday enable us to reverse this process in malignant cells, offering a new treatment for cancer.”
NGS can also help improve the precision of new targeted therapies, he notes. For example, sequencing samples from a cohort that receives a DNA methylation inhibitor and from one that does not could reveal whether the agent is acting on its intended target gene or adversely affecting other genes.
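The treated-versus-untreated comparison can be sketched in a few lines. This is an illustration under stated assumptions, not a published analysis method: it assumes the sequencing data have already been summarized, for each gene, as the number of methylated reads and the total reads covering that gene’s promoter in each cohort. The gene names and the 0.2 change threshold are hypothetical.

```python
from typing import Dict, List, Tuple

# Per-gene (methylated reads, total reads); gene names are placeholders.
Counts = Dict[str, Tuple[int, int]]


def methylation_level(methylated: int, total: int) -> float:
    """Fraction of reads that are methylated (0.0 if there is no coverage)."""
    return methylated / total if total else 0.0


def methylation_changes(untreated: Counts, treated: Counts) -> Dict[str, float]:
    """Change in methylation level (treated minus untreated) for genes seen in both cohorts."""
    return {
        gene: methylation_level(*treated[gene]) - methylation_level(*untreated[gene])
        for gene in untreated.keys() & treated.keys()
    }


def classify(changes: Dict[str, float], target: str, threshold: float = 0.2) -> Tuple[bool, List[str]]:
    """Report whether the intended target changed, plus any off-target genes
    whose methylation shifted by at least `threshold` in either direction."""
    on_target = abs(changes.get(target, 0.0)) >= threshold
    off_target = sorted(g for g, d in changes.items() if g != target and abs(d) >= threshold)
    return on_target, off_target


if __name__ == "__main__":
    untreated = {"TARGET": (80, 100), "GENE_B": (50, 100), "GENE_C": (40, 100)}
    treated = {"TARGET": (20, 100), "GENE_B": (48, 100), "GENE_C": (10, 100)}
    print(classify(methylation_changes(untreated, treated), "TARGET"))
```

In this made-up example the inhibitor lowers methylation of the intended target, but GENE_C also shifts substantially and would be flagged as a possible off-target effect. A real study would add replicates and a statistical test rather than a fixed cutoff.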
Depth equals confidence
Once genomic DNA, RNA or cDNA is fragmented in preparation for sequencing, the fragments are amplified to increase signal intensities. Depending on the size and complexity of the genome (cancer genomes, for example, have many more changes than normal genomes), different amounts of sequencing data must be collected to ensure the accuracy of the resulting profile. In general, each region of a normal genome should be sequenced at least 30 times for adequate coverage and to establish confidence, says Pearlly Yan, PhD, technical director of the OSUCCC Nucleic Acid Shared Resource and a sequencing specialist.
When only a limited stretch of DNA is examined, researchers can achieve very high coverage, often tens of thousands of times, allowing them to examine rare mutations or difficult-to-amplify regions.
“The human genome has about 3 billion base pairs, so to achieve 30 times coverage, we need to obtain at least 3 billion × 30, or 90 billion, base pairs,” Yan says. “That’s a lot of data. The amount of information produced by each genome experiment requires different computational approaches to uncover the wealth of biology hidden in it.”
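Yan’s arithmetic can be written out as a short calculation. In this sketch the 100-base read length is an assumption chosen for illustration, and the coverage estimate uses the idealized Lander-Waterman (Poisson) model, which treats reads as landing uniformly at random. Real coverage is uneven, which is one reason extra depth is needed for repetitive or hard-to-sequence regions.

```python
import math

HUMAN_GENOME_BP = 3_000_000_000  # approximate size quoted in the article


def bases_needed(genome_bp: int, depth: int) -> int:
    """Total bases to sequence for an average coverage of `depth`."""
    return genome_bp * depth


def reads_needed(genome_bp: int, depth: int, read_length_bp: int) -> int:
    """Reads of a given length required for that coverage, rounded up."""
    total = bases_needed(genome_bp, depth)
    return -(-total // read_length_bp)  # integer ceiling division


def fraction_covered(depth: float, min_reads: int = 1) -> float:
    """Idealized (Poisson, Lander-Waterman) fraction of positions covered by
    at least `min_reads` reads at average coverage `depth`."""
    below = sum(math.exp(-depth) * depth**k / math.factorial(k) for k in range(min_reads))
    return 1.0 - below


if __name__ == "__main__":
    print(bases_needed(HUMAN_GENOME_BP, 30))       # 90000000000
    print(reads_needed(HUMAN_GENOME_BP, 30, 100))  # 900000000 (assumed 100-base reads)
    print(fraction_covered(30, 10))
```

At 30 times average coverage, this idealized model predicts that nearly every position is read at least 10 times. Departures from that ideal in real data are what drive the need for the deeper, more varied approaches Yan describes.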
Because the OSUCCC – James is a research institution, its sequencing facilities can offer investigators the latest, most informative sequencing approaches and many options for sample preparation to obtain the needed information at the desired depth, Yan says.
“More depth and diverse approaches are required to accurately detect rare events, small changes, or when sequencing certain areas of the genome such as regions with repetitive sequences,” she says.
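Why depth matters for rare events can be shown with a simple probability model. The sketch assumes each read independently has a chance equal to the variant’s allele fraction of carrying the variant, and that a call requires a minimum number of supporting reads (binomial sampling). It ignores sequencing errors, which in practice push the required support higher. The 1 percent allele fraction and five-read threshold in the example are illustrative assumptions.

```python
from math import comb


def detection_probability(depth: int, allele_fraction: float, min_reads: int) -> float:
    """Probability that at least `min_reads` of `depth` reads carry a variant
    present at `allele_fraction` (binomial model; sequencing errors ignored)."""
    # Probability of seeing fewer than min_reads variant-carrying reads.
    miss = sum(
        comb(depth, i) * allele_fraction**i * (1 - allele_fraction) ** (depth - i)
        for i in range(min_reads)
    )
    return max(0.0, 1.0 - miss)


if __name__ == "__main__":
    # A variant present in 1% of reads, requiring 5 supporting reads to call it.
    for depth in (30, 100, 1_000, 10_000):
        print(depth, f"{detection_probability(depth, 0.01, 5):.4f}")
```

Under these assumptions a 1 percent variant is almost never detected at 30 times coverage but is detected almost certainly at 10,000 times, which matches the kind of depth used for targeted regions.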
Storing and transmitting all this data requires a sophisticated infrastructure. One full sequencing run on the SOLiD platform can produce 9 terabytes (TB) of data, Palatini says. (One TB equals 1,000 gigabytes. For perspective, the Library of Congress had almost 160 TB of data in its collection as of February 2010.)
Then comes data analysis, which cannot be done in the traditional ways on most laboratory PCs. “In some cases, software packages are available that make it easier for biologists to carry out some secondary analyses, but in most cases computing power and bioinformatics collaborators are essential for success,” Yan says.
Complex analyses require close collaboration among wet-lab biologists, biomedical informaticians and computational scientists who can spot subtle variations in sequence data and write algorithms for identifying patterns. Biostatisticians are needed to determine statistical significance.
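One reason biostatisticians are essential is that sequencing studies test thousands of genes at once, so some results will look significant purely by chance. A widely used correction is the Benjamini-Hochberg false-discovery-rate procedure. The following is a minimal, self-contained version; the input p-values are hypothetical, and nothing in the article states that OSUCCC analysts use this particular method.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values from testing four genes.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Genes whose adjusted value falls below a chosen false-discovery rate (for example 0.05) would be reported. The running minimum ensures that a gene with a smaller raw p-value never ends up with a larger adjusted value than one ranked after it.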
“The challenge is to find the important details in reams of sequencing data,” says Jeffrey Parvin, MD, PhD, interim chair of the Department of Biomedical Informatics and director of the Biomedical Informatics Shared Resource.
“If you look at the Manhattan skyline for the most important object, your eye might be drawn to the Empire State Building. But actually a lot of important things are happening in other buildings that you don’t see when looking at the skyline. Other kinds of analyses are needed to pick up that information.”
The Biomedical Informatics Shared Resource can write computer programs that automate many of the analyses or modify current software to make analysis easier in the future, Parvin says. “This is a la carte work. Every biologist needs something special, and we provide solutions for that special need.”
“The OSUCCC – James is a top-tier center for small RNA sequencing,” Palatini says. “We pioneered much of the microRNA sequencing chemistry and beta testing for the country, as well as the workflow pipeline and computational methods of data analysis.”
Much of this work was driven by the research of OSUCCC – James investigator Carlo M. Croce, MD, professor of Molecular Virology, Immunology and Medical Genetics, and director of the Human Cancer Genetics program.
Croce, who also directs the Microarray Shared Resource, is using NGS to investigate the mechanism of disease in chronic lymphocytic leukemia (CLL). For example, he and his lab are looking at microRNA changes and DNA sequences simultaneously to identify the molecular mechanisms involved in progression and the points of therapeutic intervention that could prevent indolent CLL from becoming aggressive.
OSUCCC sequencing facilities can build DNA libraries to suit every need and sequencing platform, Palatini says, including methylation libraries, targeted sequencing libraries, fragment libraries, paired-end and mate-pair libraries, small RNA libraries, ChIP-seq libraries and SureSelect libraries.
Chromatin immunoprecipitation (ChIP) allows researchers to locate the sites where proteins bind to DNA. When ChIP is coupled with NGS, an approach called ChIP-seq, genomewide investigations of changes in transcription factor binding sites and modifications to chromatin structure become possible.
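The core idea of ChIP-seq analysis is a comparison: sequenced reads pile up where the immunoprecipitated protein was bound, so genomic windows holding far more ChIP reads than expected from an input (control) sample are candidate binding sites. This is a deliberately simplified, hypothetical peak finder, not the algorithm of any particular tool. It assumes reads are already aligned to one chromosome and reduced to start positions, and the 200-base bin size, 4-fold cutoff and p-value threshold are illustrative choices.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def bin_counts(read_starts: Iterable[int], bin_size: int) -> Counter:
    """Count aligned read start positions in fixed-width genomic bins."""
    return Counter(pos // bin_size for pos in read_starts)


def poisson_upper_tail(k: int, mean: float) -> float:
    """P(X >= k) for X ~ Poisson(mean), mean > 0."""
    if k <= 0:
        return 1.0

    def pmf(i: int) -> float:
        return math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))

    if k <= mean:
        # P(X < k) is modest here, so 1 - P(X < k) keeps its precision.
        return max(0.0, 1.0 - sum(pmf(i) for i in range(k)))
    # Beyond the mean the terms only shrink, so add them until negligible.
    total, i, term = 0.0, k, pmf(k)
    while term > 0.0 and term > total * 1e-17:
        total += term
        i += 1
        term = pmf(i)
    return min(1.0, total)


def enriched_bins(
    chip_starts: Iterable[int],
    input_starts: Iterable[int],
    bin_size: int = 200,
    min_fold: float = 4.0,
    max_p: float = 1e-5,
) -> List[Tuple[int, int, float, float]]:
    """Return (start, end, fold enrichment, p-value) for enriched bins."""
    chip = bin_counts(chip_starts, bin_size)
    control = bin_counts(input_starts, bin_size)
    # Scale the control so both libraries represent the same number of reads.
    scale = sum(chip.values()) / max(1, sum(control.values()))
    peaks = []
    for b in sorted(chip):
        # Floor the expectation at 1 read so empty control bins don't inflate the fold change.
        expected = max(control.get(b, 0) * scale, 1.0)
        fold = chip[b] / expected
        p = poisson_upper_tail(chip[b], expected)
        if fold >= min_fold and p <= max_p:
            peaks.append((b * bin_size, (b + 1) * bin_size, fold, p))
    return peaks


if __name__ == "__main__":
    # Made-up data: one background read per bin, plus a pile-up near position 1,050.
    background = [b * 200 + 10 for b in range(10)]
    chip_reads = background + [1050] * 50
    for peak in enriched_bins(chip_reads, background):
        print(peak)
```

Only the 1,000 to 1,200 window is reported in this toy example. Production peak callers model local background, fragment length and duplicate reads far more carefully.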
Michael Ostrowski, PhD, professor and chair of Molecular and Cellular Biochemistry, and co-leader of the OSUCCC – James Molecular Biology and Cancer Genetics Research Program, and his colleagues study changes in the tumor microenvironment and in cancer cells. They use ChIP-seq to study the transcription factor Ets in three cell compartments: stromal fibroblasts, macrophages and endothelial cells.
In addition, they are studying chromatin marks in these three cell compartments during tumor progression. Finally, they are using RNA sequencing technology for gene expression profiling of both mRNA and microRNA in tumor cells and in the same three tumor microenvironment cell compartments.
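A basic step in expression profiling with RNA sequencing is normalizing raw read counts, because samples are rarely sequenced to the same depth. This sketch shows one simple scheme, counts per million (CPM), followed by a log2 fold change between two samples. The microRNA names and counts are hypothetical, and the article does not say which normalization the Ostrowski lab uses. For mRNA comparisons, analysts often also adjust for transcript length, which is not shown here.

```python
import math
from typing import Dict


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Rescale one sample's raw read counts so they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {name: n * 1_000_000 / total for name, n in counts.items()}


def log2_fold_change(a: float, b: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of two normalized values; the pseudocount avoids log(0)."""
    return math.log2((a + pseudocount) / (b + pseudocount))


if __name__ == "__main__":
    # Hypothetical microRNA counts from two compartments sequenced to different depths.
    tumor = counts_per_million({"miR-A": 500, "miR-B": 1500})
    fibroblast = counts_per_million({"miR-A": 5000, "miR-B": 5000})
    for mirna in tumor:
        print(mirna, round(log2_fold_change(tumor[mirna], fibroblast[mirna]), 2))
```

Raw counts would suggest miR-A is 10 times higher in fibroblasts. After normalization the difference is twofold, because the fibroblast sample simply received five times as many reads.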
“We also plan to use NGS to study changes in the tumor-cell genome in response to changes in the microenvironment, including gene loss and amplification,” Ostrowski says.
“This technology is exciting because it allows discovery, which can lead to testable hypotheses,” he explains. “For example, if we find that Ets2 binds to genes involved in specific signaling pathways, we can make testable hypotheses based on that global data.”
ChIP-seq technology plus top-notch computational modeling enabled OSUCCC – James investigator Tim Huang, PhD, professor of Molecular Virology, Immunology and Medical Genetics, and his collaborators to discover a new form of estrogen-mediated gene silencing that may contribute to breast cancer.
Their study, published in the journal Genome Research, analyzed transcriptome, methylome and estrogen receptor datasets from normal breast epithelia and breast cancer cells. It uncovered a cluster of 14 genes that are simultaneously silenced in breast cancer cells through a mechanism that brings the promoters of these genes together at one regulatory site for coordinated repression. The contortions involved in this process produce DNA loops. These loops are temporary in normal cells but fixed in breast cancer cells, resulting in long-term repression of the 14 genes.
Targeted sequencing focuses on just a region of DNA, such as a stretch of chromosome several megabases long, using customized probes that target that region. The method is useful for comparing chromosome regions in people with and without cancer, for example.
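Covering a target region with capture probes amounts to laying overlapping tiles along it. The following sketch is a hypothetical design helper, not any vendor’s design software. It assumes 120-base probes placed every 60 bases (so most positions are covered by two probes), with one final probe pushed flush against the region’s end. Both numbers are illustrative assumptions.

```python
from typing import List, Tuple


def tile_probes(start: int, end: int, probe_len: int = 120, step: int = 60) -> List[Tuple[int, int]]:
    """Lay out overlapping capture probes across the half-open region [start, end)."""
    if step <= 0:
        raise ValueError("step must be positive")
    if end - start < probe_len:
        raise ValueError("region is shorter than one probe")
    probes = [(s, s + probe_len) for s in range(start, end - probe_len + 1, step)]
    if probes[-1][1] < end:
        # Add one final probe flush with the end so the tail of the region is covered.
        probes.append((end - probe_len, end))
    return probes


if __name__ == "__main__":
    probes = tile_probes(0, 1_000)
    print(len(probes), probes[0], probes[-1])
```

For a 1,000-base toy region this yields 16 probes, from (0, 120) to (880, 1000). A region several megabases long would need tens of thousands of probes at this density.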
Huiling He, MD, an OSUCCC – James research scientist, is working with the Nucleic Acid Shared Resource to apply the technique for targeted DNA sequencing. Albert de la Chapelle, MD, PhD, professor of Molecular Virology, Immunology and Medical Genetics and the Leonard J. Immke, Jr., and Charlotte L. Immke Chair in Cancer Research, leads the study.
“An advantage of deep sequencing is its ability to detect changes that would be missed by other sequencing methods,” says He. “We know the mutation is present, but conventional sequencing cannot locate it. We believe the improved coverage offered by deep sequencing will reveal areas that were missed earlier.”
NGS technology is progressing rapidly. Third-generation sequencers can generate longer reads without the need to amplify sample fragments. In some cases, the sequencing process can be monitored using an iPhone.
Keeping pace with the technology is an expensive challenge, says Jeff Walker, executive director of the OSUCCC – James. “It’s not only the cost of purchasing and maintaining the instrumentation and supporting facility, but also of the bioinformatics expertise and infrastructure required to interpret the massive amount of data generated.”
To solve the problem, the OSUCCC – James is exploring the formation of a local consortium that brings together academic centers and industry to support a single genomics infrastructure. “This would give our investigators ready access to the newest NGS technology,” Walker says.
“Clearly,” he says, “NGS is a critical tool for developing new approaches to diagnosing and treating cancer, and for personalizing clinical care.”