A gene needs to be expressed in order to contribute to cellular functions. A gene is expressed when its genetic information in DNA is transcribed into an RNA molecule. The resulting RNA molecule is not naked in the cell; as soon as an RNA is transcribed, it becomes coated by RNA-binding proteins to form ribonucleoprotein complexes (RNPs). RNPs often involve dozens, if not hundreds, of proteins bound to an RNA molecule.
Examples of abundant RNPs in our cells include the molecular machines that perform crucial steps in gene expression. The spliceosome splices the exons of nascent RNAs together to produce an mRNA, and ribosomes translate mRNAs into proteins. Splicing and translation are just two of the many stages that most RNA molecules pass through during their 'life cycle'; other stages include 5' capping, 3' end processing and polyadenylation, editing, methylation, quality control, transport and decay. Our research focuses on 'regulatory RNPs', which guide the RNA through these stages.
To understand how regulatory RNPs work, we develop techniques that integrate biochemistry and computational biology to obtain a comprehensive map of protein-RNA interactions within our cells. We developed the individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP), and a related method called hiCLIP, which identify protein-RNA and RNA-RNA interactions with nucleotide resolution across the transcriptome.
We use these methods in collaboration with the group of Nicholas Luscombe to study how the sequence and structure of RNAs define the assembly of RNPs, and how the regulatory RNPs in turn guide the life cycle of each RNA.
Cells can change their gene expression by modulating the function of RNPs. Moreover, genetic studies have identified mutations that disrupt the normal function of RNPs. These mutations often cause neurological diseases, particularly motor neuron disease, also referred to as amyotrophic lateral sclerosis (ALS).
We study this disease in collaboration with the group of Rickie Patani by using induced pluripotent stem cells with specific genetic mutations, and differentiating them into motor neurons. We wish to understand how these mutations affect the assembly of protein-RNA complexes, thereby initiating the molecular cascade leading to disease. We study the following questions:
1) How do the RNA-RNA, protein-RNA and protein-protein interactions act together to define the assembly and function of regulatory RNPs?
2) How do transposable elements and other non-canonical regulatory sites affect RNP assembly, and thereby contribute to the evolution of new exons and RNA regulatory circuits?
3) How do RNPs guide the differentiation and functions of neurons or glial cells during brain development, aging or neurodegenerative diseases?
4) How do mutations cause disease by disrupting the function of RNPs or RNA regulatory elements, and what treatments could ameliorate this?
Here are some of the RNA stories that we have worked on:
The assembly of RNPs.
Techniques to identify the protein-RNA binding sites.
We developed individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP) to quantify protein-RNA interactions across the whole transcriptome. We described the details of the iCLIP method and presented it in a video. We showed that the cDNAs in iCLIP truncate at crosslink sites, and developed computational methods based on cDNA starts for the analysis of iCLIP data. We wrote three reviews on the advances in CLIP technologies: the 2012 and 2018 reviews describing the core steps of CLIP and its variant experimental methods, and the 2018 computational review describing the approaches for data analysis and quality control.
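Because cDNAs truncate at crosslink sites, the crosslinked nucleotide can be assigned from the start of each mapped cDNA. The following is a minimal sketch of that idea, not the published analysis pipeline. It assumes reads are already mapped and de-duplicated, given as BED-style tuples (0-based start, exclusive end), and that the crosslink is the nucleotide immediately upstream of the cDNA start. The function names and the toy reads are hypothetical.

```python
from collections import Counter


def crosslink_site(start, end, strand):
    """Return the 0-based position just upstream of the cDNA start.

    Assumes BED-style coordinates: 0-based start, exclusive end.
    """
    if strand == "+":
        # The cDNA start is the leftmost base; upstream is one base to the left.
        return start - 1
    if strand == "-":
        # The cDNA start is the rightmost base (end - 1); upstream is `end`.
        return end
    raise ValueError(f"unknown strand: {strand!r}")


def count_crosslinks(reads):
    """Count cDNA starts per (chrom, crosslink position, strand)."""
    counts = Counter()
    for chrom, start, end, strand in reads:
        counts[(chrom, crosslink_site(start, end, strand), strand)] += 1
    return counts


# Toy reads (made up for illustration): (chrom, start, end, strand)
reads = [
    ("chr1", 100, 140, "+"),
    ("chr1", 100, 135, "+"),
    ("chr1", 200, 240, "-"),
]
crosslinks = count_crosslinks(reads)
```

Two reads of different length that share a start collapse onto the same crosslink position, which is why the analysis uses cDNA starts rather than whole-read coverage.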
We established a question-answer forum to discuss the details and reasons for each step of iCLIP, and other variant methods.
Julian König, Tomaz Curk (iCount), Yoichiro Sugimoto, Ina Huppertz, Nejc Haberman, Chris Sibley, Flora Lee, Anob Chakrabarti
Software for analysis of iCLIP data
Many computational tools have been developed for analysis of data produced by CLIP and its variant protocols, as described in our review. In collaboration with Tomaz Curk, we developed iCount, Python code with an associated command-line interface (CLI), available from GitHub, which maps, annotates and analyses iCLIP data, and defines the peaks of high-occupancy protein-RNA binding sites. This code is the basis for the web server iMaps, which enables a streamlined analysis of data produced by iCLIP and its variant methods, and is available for general use.
Published iCLIP data
Our published iCLIP sequencing data are available in raw format (fastq files) from public databases (ArrayExpress and/or GEO, as described in each published manuscript), and also in processed format on the public server iCount.
Research Highlights
RNA Analysis Unearths Invaluable Insights
Techniques to identify RNA-RNA contacts, and the RNA secondary structures bound by RBPs.
The secondary structure of mRNAs has important effects on their stability and translation. To understand the in vivo structure of full-length mRNAs, we developed a technique called hiCLIP to identify the connections that hook sections of an mRNA together, which are called RNA duplexes. We were amazed to find that mRNAs form thousands of such duplexes, and that these duplexes often hook together very distant parts of mRNA molecules. We found that these duplexes interact with the double-stranded RNA binding protein Staufen 1. We also found that these RNA duplexes have less genetic variation in humans than surrounding areas of the mRNA, indicating that mutations could cause disease by disrupting the structure of mRNAs.
Yoichiro Sugimoto, Christina Militti, Flora Lee
Structure of genetic messenger molecules reveals key role in diseases
hiCLIP: New method finds structures of mRNA molecules
Detailed probing of RNA structure in vivo
The function of RNPs.
RNA maps: how does the location of an RNA binding site instruct its function?
We integrate transcriptomic data on protein-RNA interactions and their function, which can tell us how ribonucleoproteins (protein-RNA complexes) assemble at specific positions on their target transcripts and thereby regulate alternative splicing, mRNA decay or translation. We use iCLIP to assess where an RBP binds its target transcripts, while also assessing how this RBP controls pre-mRNA processing. Integration of these two approaches showed that most RBPs regulate alternative splicing according to genome-wide positional principles, or RNA splicing maps. For example, by integrating TIA iCLIP with its splicing analysis upon TIA knockdown, we were able to derive nucleotide-resolution RNA splicing maps of TIA proteins. We developed the software (RNA motifs) that can derive RNA splicing maps by analysis of multivalent RNA motifs that are often bound by RBPs, and the web platform RNAexpress that can integrate diverse data and perform motif analyses to derive RNA maps for regulation of alternative polyadenylation and splicing.
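The idea of an RNA splicing map can be illustrated by summarising binding positions relative to the splice sites of regulated exons, grouped by how each exon responds. This is a simplified sketch and not the method used by RNA motifs or RNAexpress. It assumes all exons and crosslink sites lie on the plus strand of a single chromosome, and the class labels, window size and toy coordinates are hypothetical.

```python
from collections import defaultdict


def rna_map(exons, crosslinks, window=50):
    """Fraction of exons in each class with a crosslink at each offset.

    exons: list of (splice_site_position, regulation_class).
    crosslinks: iterable of crosslink positions on the same chromosome.
    Offsets are relative to the splice site (negative = upstream).
    """
    crosslink_set = set(crosslinks)
    hits = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for site, regulation_class in exons:
        totals[regulation_class] += 1
        for offset in range(-window, window + 1):
            if site + offset in crosslink_set:
                hits[regulation_class][offset] += 1
    # Normalise counts by the number of exons in each class.
    return {
        cls: {off: hits[cls][off] / totals[cls] for off in sorted(hits[cls])}
        for cls in totals
    }


# Toy data (made up): two silenced exons, one enhanced exon.
exons = [(1000, "silenced"), (2000, "silenced"), (3000, "enhanced")]
crosslinks = [990, 1990, 3020]
splicing_map = rna_map(exons, crosslinks)
```

Plotting each class's profile against the offset shows whether binding at a given position is associated with enhanced or silenced exons. A real analysis would also handle strand, multiple chromosomes and a set of unregulated control exons.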
We also collaborate with the group of Richard Jenner in studies of RBPs that bind to nascent RNA to regulate transcription and chromatin.
Gregor Rot, Zhen Wang, Matteo Cereda, Melis Kayikci, Julian König, Kathi Zarnack, Nejc Haberman, Jan Attig
RNA map gives first comprehensive understanding of alternative splicing
The web platform RNAexpress enables analysis of 3' mRNA-Seq (produced by the Lexogen QuantSeq or other methods) to identify regulated polyA sites. This can be integrated with information on alternative exons and/or RNA binding sites determined by CLIP or motif analyses to derive RNA maps for regulation of alternative polyadenylation and splicing.
The evolution of RNPs.
Studies that mapped the binding sites of RNA-binding proteins across the transcriptome have uncovered an unprecedented diversity of previously unknown non-canonical sites for RNA processing. These studies identified many cryptic events located far from the currently annotated exons, as well as unconventional splicing mechanisms that have important roles in regulating gene expression. These non-canonical splicing events are also a major source of newly emerging transcripts during evolution, especially when they involve sequences derived from transposable elements. We study RBPs that are specialised for binding to these elements, which ensures their tight regulation and quality control. While mutations perturbing the binding of RBPs to these elements can disrupt gene expression and lead to diseases, we found that such mutations are also a major driving force for the emergence of new exons during evolution. The image on the right is from the cover of the journal.
Alu-derived exons
By identifying RNA binding sites of hnRNP C across the transcriptome with iCLIP, we have shown that hnRNP C binds to long uridine tracts, and thereby regulates splicing of alternative exons.
We have also found that hnRNP C can strongly repress inclusion of exons that are derived from Alu elements, or so-called Alu-exons. Alu retrotransposable elements are specific to primate genomes, and they have probably played an important role in the evolution of primates, since they constitute 10% of the human genome. hnRNP C represses recognition of cryptic splice sites in Alu elements by displacing the splicing factor U2AF65 from the uridine tracts. Loss of hnRNP C leads to the formation of thousands of harmful exons, and mutations within uridine tracts in Alu elements can cause many human diseases.
We have shown that the positive and negative regulatory forces are tightly coupled in the evolution of Alu-exons. In species where mutations made the splice sites of Alu-exons stronger, the uridine tracts are longer, which allows hnRNP C to act as a counteracting force. This allows the Alu-exons to remain in a harmless cryptic state over long evolutionary periods, during which they accumulate additional mutations. We hypothesise that the repressive function of hnRNP C prevents the damaging effects of immediate Alu exonization, and the length of uridine tracts represents a ‘molecular rheostat’. After Alu-exons accumulate many mutations, the uridine tracts gradually shorten, and as a result the Alu-exons start escaping from repression and contributing to new cellular functions. We hypothesise that the uridine tracts in Alu elements can buy the time needed for mutations to make beneficial changes, rather than disruptive ones, during the evolution of a species.
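Since the length of uridine tracts is central to this model, a simple way to compare sequences is to measure their uridine tracts. The sketch below finds uninterrupted runs of U (or T in DNA sequence). The minimum tract length of 4 is an arbitrary illustrative threshold, not a value taken from our studies, and the function names and example sequence are hypothetical.

```python
import re


def uridine_tracts(seq, min_len=4):
    """Return (start, length) for each uninterrupted U/T run of at least min_len."""
    pattern = re.compile(r"[UT]{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(seq.upper())]


def longest_tract(seq):
    """Length of the longest uninterrupted U/T run (0 if there is none)."""
    return max((length for _, length in uridine_tracts(seq, min_len=1)), default=0)


# Made-up sequence with an 8-nt and a 4-nt uridine tract.
seq = "ACG" + "U" * 8 + "GCA" + "U" * 4 + "AG"
tracts = uridine_tracts(seq)
longest = longest_tract(seq)
```

Comparing `longest_tract` for the same Alu-exon across species would be one way to follow how tract length changes over evolutionary time. Real tracts may tolerate interruptions, which this sketch does not model.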
Julian König, Kathi Zarnack, Mojca Tajnik, Jan Attig, Igor Ruiz de los Mozos, Federico Agostini
The guardian of the transcriptome
Regulating Alu element exonization
A hidden code in our DNA explains how new pieces of genes are made
Recursive splicing in long introns
Long introns contain hundreds of so-called ‘cryptic sequences’ that appear very similar to exons, but are not supposed to be used. The cellular machinery faces great challenges in distinguishing true exons from these cryptic sites. We found that cells sometimes select a cryptic exon that is present deep within a long intron, but later discard it, in a process called recursive splicing (see the paper here). Normally, recursive splicing removes this cryptic exon, allowing it to remain invisible. However, if the recursive site is preceded by other cryptic splicing events, then the exon is not removed, creating a ‘binary switch’ or checkpoint that can distinguish correct splicing events from newly emerging cryptic events, which could be damaging. Thus, long introns enable the emergence of many cryptic splicing events during evolution, whereas recursive splicing ensures that this evolutionary tinkering does not disturb the primary mRNA that needs to be made from the gene. We observed this process in some of the longest genes that are expressed in the human brain, which are often implicated in autism or other neurodevelopmental disorders.
Chris Sibley, Warren Emmett, Lorea Blazquez, Andrea Elser
A new genetic switch uncovered in the long genes expressed in our brain
Splicing does the two-step
In a commentary, scientists express their fascination with introns.
RNPs in aging and disease.
We study the regulatory networks controlled by RNA-binding proteins that can cause amyotrophic lateral sclerosis when mutated. We have so far focused on TDP-43 and FUS, showing that both proteins regulate alternative splicing of a functionally coherent set of transcripts, many of which encode proteins implicated in neurodegenerative disorders.
We have examined changes in gene expression in human postmortem brain samples to compare the effects of healthy aging with two neurodegenerative diseases: Alzheimer's disease and frontotemporal lobar degeneration (FTLD). We observed widespread changes in alternative splicing: most were specific to diseased samples, but some were common to aging and disease. The changes in glial-specific genes in particular appeared to be shared, with the decrease in oligodendrocyte-specific genes being most apparent. Therefore, we further analysed gene expression in three large cohorts of samples to examine changes in ten brain regions upon aging. Stratifying the gene expression by cell type, we found that astrocytes and oligodendrocytes diminish their regional identity upon aging. We also developed a machine learning method to analyse high-resolution images of brain sections, and applied it to a more limited number of samples. This indicates that the number of oligodendrocytes decreases, while less change is seen in the total number of neurons upon aging. However, the neurons with the largest cell bodies also appear to decline in number. On the other hand, we find a dramatic increase in the expression of microglia-specific and endothelial-specific genes in all brain regions upon aging. We hope that these findings will be of use for further studies of the cellular phase of aging and Alzheimer's disease. A table of the relevant genes and their differential expression upon aging in each brain region is available here.
James Tollervey, Boris Rogelj, Rickie Patani, Michael Briese, Lilach Soreq, Claire Hall, Martina Halleger, Frederique Rau
CLIPs of TDP-43 Provide a Glimpse Into Pathology, Alzheimer Research Forum
FUS and Friends: Two Studies Probe FUS’ RNA Partners
New Link Revealed Between Alzheimer's Disease and Healthy Aging
More Clues How the Brain’s ‘Other Cells’ Change As We Age
Aging Causes “Identity Crisis” in Glia
Scientists Could Identify a Person’s Age by Looking at These Cells in Their Brain
As the Brain Ages, Glial-Cell Gene Expression Changes Most
To Understand a Brain’s Age, Focus on More Than Neurons
A video on Glia: Could These Brain Cells Help Explain What Causes Dementia?