Credits: Darryl Leja (NHGRI), Ian Dunham (EBI)
ENCODE investigators employ a variety of assays and methods to identify functional elements. The discovery and annotation of gene elements is accomplished primarily by sequencing RNA from a diverse range of sources, comparative genomics, integrative bioinformatic methods, and human curation. Regulatory elements are typically investigated through DNA hypersensitivity assays, assays of DNA methylation, and chromatin immunoprecipitation (ChIP) of proteins that interact with DNA, including modified histones and transcription factors, followed by sequencing (ChIP-Seq). (Description from the University of California, Santa Cruz.)
ENCODE identifies “Genetic Switches” in DNA
The ENCODE (Encyclopedia of DNA Elements) consortium, made up of 442 scientists in 32 laboratories around the world, published its first set of findings yesterday. Most notable is the discovery of “genetic switches” that control which genes are used in a cell and when they are used, whether the gene in question causes a disease or shapes a human trait such as height.
The hundreds of researchers working on the ENCODE project have revealed that much of what has been called ‘junk DNA’ in the human genome is actually a massive control panel with millions of switches regulating the activity of our genes. Without these switches, genes would not work – and mutations in these regions might lead to human disease. The new information delivered by ENCODE is so comprehensive and complex that it has given rise to a new publishing model in which electronic documents and datasets are interconnected.
Just as the Human Genome Project revolutionised biomedical research, ENCODE will drive new understanding and open new avenues for biomedical science. Led by the National Human Genome Research Institute (NHGRI) in the US and the EMBL-European Bioinformatics Institute (EMBL-EBI) in the UK, ENCODE now presents a detailed map of genome function that identifies 4 million gene ‘switches’. This essential reference will help researchers pinpoint very specific areas of research for human disease. The findings are published in 30 connected, open-access papers appearing in three science journals: Nature, Genome Biology and Genome Research.
“Our genome is simply alive with switches: millions of places that determine whether a gene is switched on or off,” says Ewan Birney of EMBL-EBI, lead analysis coordinator for ENCODE. “The Human Genome Project showed that only 2% of the genome contains genes, the instructions to make proteins. With ENCODE, we can see that around 80% of the genome is actively doing something. We found that a much bigger part of the genome – a surprising amount, in fact – is involved in controlling when and where proteins are produced, than in simply manufacturing the building blocks.”
“ENCODE data can be used by any disease researcher, whatever pathology they may be interested in,” said Ian Dunham of EMBL-EBI, who played a key role in coordinating the analysis. “In many cases you may have a good idea of which genes are involved in your disease, but you might not know which switches are involved. Sometimes these switches are very surprising, because their location might seem more logically connected to a completely different disease. ENCODE gives us a set of very valuable leads to follow to discover key mechanisms at play in health and disease. Those can be exploited to create entirely new medicines, or to repurpose existing treatments.”
“ENCODE gives us the knowledge we need to look beyond the linear structure of the genome to how the whole network is connected,” commented Dr Michael Snyder, professor and chair at Stanford University and a principal investigator on ENCODE. “We are beginning to understand the information generated in genome-wide association studies – not just where certain genes are located, but which sequences control them. Because of the complex, three-dimensional shape of our genome, those controls are sometimes far from the gene they regulate and looping around to make contact. Were it not for ENCODE, we might never have looked in those regions. This is a major step toward understanding the wiring diagram of a human being. ENCODE helps us look deeply into the regulatory circuit that tells us how all of the parts come together to make a complex being.”
Until recently, generating and storing large volumes of data was a challenge in biomedical research. Now, with the falling cost and rising productivity of genome sequencing, the focus has shifted to analysis: making sense of the data produced in genome-wide association studies. ENCODE partners have been working systematically through the human genome, using the same computational and wet-lab methods and reagents in laboratories distributed throughout the world.
To give some sense of the scale of the project: ENCODE combined the efforts of 442 scientists in 32 labs in the UK, US, Spain, Singapore and Japan. They generated and analysed over 15 terabytes (15 trillion bytes) of raw data – all of which is now publicly available. The study used around 300 years’ worth of computer time studying 147 tissue types to determine what turns specific genes on and off, and how that ‘switch’ differs between cell types.
The articles published on 6 September represent hundreds of pages of research. But the digital publishing group at Nature recognises that ‘pages’ are a thing of the past. All of the published ENCODE content, in all three journals, is connected digitally through topical ‘threads’, so that readers can follow their area of interest between papers and all the way down to the original data.
“Getting the best people with the best expertise together is what this is all about,” said Ewan Birney. “ENCODE has really shown that leading life scientists are very good at collaborating closely on a large scale to produce excellent foundational resources that the whole community can use.”
“Until now, everyone’s been generating and publishing this data piecemeal and unintentionally trapping it in niche communities and static publications. How could anyone outside that community exploit that knowledge if they don’t know it’s there?” commented Roderic Guigo of the Centre de Regulació Genòmica (CRG) in Barcelona, Spain. “We have now an interactive encyclopaedia that everyone can refer to, and that will make a huge difference.” *
What does this mean for research into Huntington’s disease? Dr. Jang-Ho Cha explains:
One Person’s Junk is Another’s Gold
The recent findings from the ENCODE project represent a significant advance for human biology, and especially for Huntington’s disease. The Human Genome Project had already sequenced the genome: all of the DNA sequences that carry the instructions for life. The Human Genome Project can be likened to early explorers making a crude map of a newly discovered continent. Before the human genome was sequenced, people guessed that there might be as many as 100,000 genes encoded in human DNA. So it came as a surprise that the number of genes was relatively small, only 30,000 or so, far fewer than in some other ‘simpler’ organisms. How could this be? And what was all that other, non-gene DNA doing? Previously, this DNA had been called ‘junk’ DNA. The ENCODE findings demolish this wrong-headed nomenclature, showing how important junk can be.
OK, let’s use an analogy. The genome is like the master score of a symphony: its instructions are absolutely critical to the functioning of the human body. But just looking at the notes on a page does not tell you what that symphony sounds like, especially if you don’t know what the instruments are. Here is the beauty: much of the ‘junk’ DNA is like the instruments that read the score. Now we get an idea of how nature uses a blueprint of instructions to play a finely coordinated symphony. A huge amount of human DNA is dedicated to controlling how much of the blueprint is read, and when. These ‘switches’ are really the conductors.
HD researchers, including HDSA Coalition for the Cure scientists, have suspected that something was going on in this regard. Since 1998, there has been good evidence that mutant huntingtin alters gene transcription: the musical score is not being played in the correct way (ref 1). Many genes were altered, and somehow this was all due to the action of mutant versions of huntingtin. But how could one protein, huntingtin, affect the functioning of so many genes? One idea that emerged was that huntingtin was affecting the overall control of transcription (2). In fact, one of the HDSA Coalition for the Cure teams was dedicated to studying how mutant huntingtin interfered with transcription. One of the interesting findings coming out of the ENCODE papers is that not just the sequence of DNA but also its conformation, that is, how the DNA strands are arranged, is an important feature of how human DNA works. Again, HD researchers have found that mutant huntingtin protein can bind to DNA and affect how tightly it is wound (3). The ENCODE researchers also show that what matters is not only the DNA sequence itself but also the proteins that ‘read’ the DNA, just as a symphony does not play itself without instruments. Again, HD researchers have had an inkling of this. We recently showed that an important DNA switch protein is altered throughout the genome in HD mice and cells, confirming that one of the things the bad huntingtin protein does is alter the whole DNA landscape (4). It is only by understanding all the bad effects of huntingtin that we will ever be able to design therapies to block them. So we’ve known for a while that huntingtin alters transcription in an important way, and the recent ENCODE findings demonstrate how important ‘junk’ can be.
— Jang-Ho Cha, MD PhD
*Source of the overview of the ENCODE project: press release from the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI).