A gene needs to be expressed in order to contribute to cellular functions. A gene becomes expressed when its genetic information is transcribed from DNA into an RNA molecule. The resulting RNA molecule is not naked in the cell; as soon as an RNA is transcribed, it becomes coated by RNA-binding proteins to form ribonucleoprotein complexes (RNPs). RNPs often involve dozens, if not hundreds, of proteins bound to an RNA molecule.
Examples of abundant RNPs in our cells include the molecular machines that perform crucial steps in gene expression. The spliceosome splices the exons of nascent RNAs together to produce an mRNA, and ribosomes translate mRNAs into proteins. Splicing and translation are just two of the many stages that most RNA molecules pass through during their 'life cycle'; other stages include 5' capping, 3' end processing and polyadenylation, editing, methylation, quality control, transport and decay. Our research focuses on 'regulatory RNPs', which guide the RNA through these stages.
To understand how regulatory RNPs work, we develop techniques that integrate biochemistry and computational biology to obtain a comprehensive map of protein-RNA interactions within our cells. We developed the individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP), and a related method called hiCLIP, which identify protein-RNA and RNA-RNA interactions with nucleotide resolution across the transcriptome.
We use these methods in collaboration with the group of Nicholas Luscombe to study how the sequence and structure of RNAs define the assembly of RNPs, and how the regulatory RNPs in turn guide the life cycle of each RNA.
Cells can change their gene expression by modulating the function of RNPs. Moreover, genetic studies have identified mutations that disrupt the normal function of RNPs. These mutations often cause neurological diseases, particularly motor neuron disease, also referred to as amyotrophic lateral sclerosis (ALS).
We study this disease in collaboration with the group of Rickie Patani by using induced pluripotent stem cells with specific genetic mutations, and differentiating them into motor neurons. We wish to understand how these mutations affect the assembly of protein-RNA complexes, thereby initiating the molecular cascade leading to disease. We study the following questions:
1) How do the RNA-RNA, protein-RNA and protein-protein interactions act together to define the assembly and function of regulatory RNPs?
2) How do transposable elements and other non-canonical regulatory sites affect RNP assembly, and thereby contribute to the evolution of new exons and RNA regulatory circuits?
3) How do RNPs guide the differentiation and functions of neurons or glial cells during brain development, aging or neurodegenerative diseases?
4) How do mutations cause disease by disrupting the function of RNPs or RNA regulatory elements, and what treatments could ameliorate this?
Here are some of the RNA stories that we have worked on:
The assembly of RNPs.
Techniques to identify protein-RNA binding sites
We developed individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP) to quantify protein-RNA interactions across the whole transcriptome. We described the details of the iCLIP method and presented it in a video. We showed that the cDNAs in iCLIP truncate at crosslink sites, and developed computational methods based on cDNA starts for the analysis of iCLIP data. We wrote three reviews on the advances in CLIP technologies: the 2012 and 2018 reviews describe the core steps of CLIP and its variant experimental methods, and the 2018 computational review describes the approaches for data analysis and quality control.
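The idea of using cDNA starts can be illustrated with a minimal sketch. It assumes reads are already aligned and given in 0-based, half-open coordinates, and it takes the crosslink site to be the nucleotide immediately upstream of the cDNA start on the transcribed strand. The `Read` type and function names are hypothetical, chosen for illustration only; this is not the code of iCount or of any published pipeline.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical aligned read; coordinates are 0-based, half-open."""
    chrom: str
    start: int   # leftmost aligned position (inclusive)
    end: int     # rightmost aligned position (exclusive)
    strand: str  # "+" or "-"


def crosslink_site(read: Read) -> Tuple[str, int, str]:
    """Return the position one nucleotide upstream of the cDNA start.

    Assumption: the cDNA truncates at the crosslinked nucleotide, so the
    crosslink lies just before the first sequenced base on the RNA strand.
    """
    if read.strand == "+":
        return (read.chrom, read.start - 1, "+")
    # On the minus strand, "upstream" is one position to the right.
    return (read.chrom, read.end, "-")


def count_crosslinks(reads: Iterable[Read]) -> Counter:
    """Count cDNA starts per crosslink position."""
    return Counter(crosslink_site(r) for r in reads)


reads = [
    Read("chr1", 100, 140, "+"),
    Read("chr1", 100, 130, "+"),
    Read("chr1", 60, 100, "-"),
]
counts = count_crosslinks(reads)
```

In practice, PCR duplicates would first be collapsed using the random barcodes in the reads before counting; that step is omitted here.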
We established a question-answer forum to discuss the details and reasons for each step of iCLIP, and other variant methods.
Julian König, Tomaz Curk (iCount), Yoichiro Sugimoto, Ina Huppertz, Nejc Haberman, Chris Sibley, Flora Lee, Anob Chakrabarti
Software for analysis of iCLIP data
Many computational tools have been developed for analysis of data produced by CLIP and its variant protocols, as described in our review. In collaboration with Tomaz Curk, we developed iCount, a Python package with an associated command-line interface (CLI), available from GitHub, which maps, annotates and analyses iCLIP data, and defines the peaks of high-occupancy protein-RNA binding sites. This code is the basis for the web server iMaps, which enables a streamlined analysis of data produced by iCLIP and its variant methods, and is available for general use.
Published iCLIP data
Our published iCLIP sequencing data are available in raw format (fastq files) from public databases (ArrayExpress and/or GEO, as described in each published manuscript), and also in processed format on the public server iCount.
Research Highlights
RNA Analysis Unearths Invaluable Insights
Techniques to identify RNA-RNA contacts, and the RNA secondary structures bound by RBPs
The secondary structure of mRNAs has important effects on their stability and translation. To understand the in vivo structure of full-length mRNAs, we developed a technique called hiCLIP to identify the connections that hook sections of an mRNA together, which are called RNA duplexes. We were amazed to find that mRNAs form thousands of such duplexes, and that these duplexes often hook together very distant parts of mRNA molecules. We found that these duplexes interact with the double-stranded RNA binding protein Staufen 1. We also found that these RNA duplexes have less genetic variation in humans than surrounding areas of the mRNA, indicating that mutations could cause disease by disrupting the structure of mRNAs.
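As a rough illustration of what it means for two distant sections of an mRNA to form a duplex, the sketch below scores how well two arms can pair in an antiparallel, ungapped arrangement, allowing Watson-Crick and G-U wobble pairs. The function name and the ungapped-pairing simplification are assumptions made for this example; real duplex analysis would rely on RNA folding tools and is not reproduced here.

```python
# Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def paired_fraction(left_arm: str, right_arm: str) -> float:
    """Fraction of positions that pair when the arms align antiparallel.

    Both arms are given 5'->3'. The right arm is reversed so that the
    first base of the left arm faces the last base of the right arm.
    Assumption: no bulges or internal loops (ungapped pairing only).
    """
    left = left_arm.upper().replace("T", "U")
    right = right_arm.upper().replace("T", "U")[::-1]
    n = min(len(left), len(right))
    if n == 0:
        return 0.0
    paired = sum((a, b) in PAIRS for a, b in zip(left, right))
    return paired / n


perfect = paired_fraction("GGGAAA", "UUUCCC")
none = paired_fraction("AAAA", "AAAA")
```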
Yoichiro Sugimoto, Christina Militti, Flora Lee
Structure of genetic messenger molecules reveals key role in diseases
hiCLIP: New method finds structures of mRNA molecules
Detailed probing of RNA structure in vivo
The functions of RNPs
RNA maps: how does the location of RNA binding site instruct its function?
We integrate transcriptomic data on protein-RNA interactions and their function, which can tell us how ribonucleoproteins (protein-RNA complexes) assemble at specific positions on their target transcripts and thereby regulate alternative splicing, mRNA decay or translation. We use iCLIP to assess where an RBP binds its target transcripts, while also assessing how this RBP controls pre-mRNA processing. Integration of these two approaches showed that most RBPs regulate alternative splicing according to genome-wide positional principles, or RNA splicing maps. For example, by integrating TIA iCLIP with its splicing analysis upon TIA knockdown, we were able to derive nucleotide-resolution RNA splicing maps of TIA proteins. We developed the software (RNA motifs) that can derive RNA splicing maps by analysis of multivalent RNA motifs that are often bound by RBPs, and the web platform RNAexpress that can integrate diverse data and perform motif analyses to derive RNA maps for regulation of alternative polyadenylation and splicing.
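The principle behind an RNA splicing map can be sketched as follows: crosslink counts are summed at each position relative to a splice site, separately for exons that are enhanced or silenced by the RBP. This is a minimal sketch under stated assumptions, namely plus-strand exons only, crosslinks supplied as a dictionary keyed by (chromosome, position), and hypothetical function and label names. It is not the implementation used in RNA motifs or RNAexpress.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def splicing_map(
    exons: List[Tuple[str, int, str]],
    crosslinks: Dict[Tuple[str, int], int],
    flank: int = 50,
) -> Dict[str, List[int]]:
    """Sum crosslink counts around the 3' splice site of regulated exons.

    exons: (chrom, exon_start, regulation), where regulation is a label
           such as "enhanced" or "silenced" (plus strand assumed).
    crosslinks: (chrom, position) -> crosslink count.
    Returns one profile per label; index 0 is `flank` nt upstream of
    the exon start and index `flank` is the exon start itself.
    """
    width = 2 * flank + 1
    profiles: Dict[str, List[int]] = defaultdict(lambda: [0] * width)
    for chrom, exon_start, regulation in exons:
        for offset in range(-flank, flank + 1):
            count = crosslinks.get((chrom, exon_start + offset), 0)
            profiles[regulation][offset + flank] += count
    return dict(profiles)


exons = [("chr1", 1000, "enhanced"), ("chr1", 5000, "silenced")]
crosslinks = {("chr1", 990): 3, ("chr1", 1000): 1, ("chr1", 5010): 2}
rna_map = splicing_map(exons, crosslinks, flank=20)
```

A real map would also normalise by the number of exons in each class and compare against unregulated control exons before drawing conclusions about positional effects.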
We also collaborate with the group of Richard Jenner in studies of RBPs that bind to nascent RNA to regulate transcription and chromatin.
Zhen Wang, Matteo Cereda, Melis Kayikci, Gregor Rot, Julian König, Kathi Zarnack, Nejc Haberman, Anob Chakrabarti
The evolution of RNAs and their regulation
Studies that mapped the binding sites of RNA-binding proteins across the transcriptome have uncovered an unprecedented diversity of previously unknown non-canonical sites for RNA processing. These include many cryptic events located far from the currently annotated exons, as well as unconventional splicing mechanisms that have important roles in regulating gene expression. These non-canonical splicing events are also a major source of newly emerging transcripts during evolution, especially when they involve sequences derived from transposable elements. We study RBPs that are specialised for binding to these elements, which ensures their tight regulation and quality control. While mutations perturbing binding of RBPs to these elements can disrupt gene expression and lead to diseases, we found that they are also a major driving force for the emergence of new exons during evolution. Image on the right shows the nucleotides that encode genetic information, from the cover inspired by our review.
In a commentary, scientists express their fascination with introns.
Alu-derived exons
By identifying RNA binding sites of hnRNP C across the transcriptome with iCLIP, we have shown that hnRNP C binds to long uridine tracts, and thereby regulates splicing of alternative exons.
We have also found that hnRNP C can strongly repress inclusion of exons that are derived from Alu elements, so-called Alu-exons. Alu retrotransposable elements are specific to primate genomes, and they have probably played an important role in the evolution of primates, since they constitute 10% of the human genome. hnRNP C represses recognition of cryptic splice sites in Alu elements by displacing the splicing factor U2AF65 from the uridine tracts. Loss of hnRNP C leads to formation of thousands of harmful exons, and mutations within uridine tracts in Alu elements can cause many human diseases.
We have shown that the positive and negative regulatory forces are tightly coupled in the evolution of Alu-exons. In species where mutations made the splice sites of Alu-exons stronger, the uridine tracts are longer, which allows hnRNP C to act as a counteracting force. This allows the Alu-exons to remain in a harmless cryptic state over long evolutionary periods, during which they accumulate additional mutations. We hypothesise that the repressive function of hnRNP C prevents the damaging effects of immediate Alu exonization, and that the length of uridine tracts represents a ‘molecular rheostat’. After Alu-exons accumulate many mutations, the uridine tracts gradually shorten, and as a result the Alu-exons start escaping from repression and contributing to new cellular functions. We hypothesise that the uridine tracts in Alu elements can buy the time needed for mutations to make beneficial changes, rather than disruptive ones, during the evolution of a species.
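The ‘rheostat’ quantity discussed above, the length of a uridine tract, is simple to measure in a sequence. The sketch below finds the longest uninterrupted run of U (or T, for DNA input); the function name is hypothetical, and requiring an uninterrupted run is a simplifying assumption rather than the criterion used in our studies.

```python
import re


def longest_uridine_tract(seq: str) -> int:
    """Length of the longest uninterrupted run of U in a sequence.

    DNA input is accepted: T is treated as U. Returns 0 when the
    sequence contains no uridines.
    """
    rna = seq.upper().replace("T", "U")
    return max((len(tract) for tract in re.findall(r"U+", rna)), default=0)


tract_lengths = [longest_uridine_tract(s) for s in ("ACUUUUGCUUA", "TTTTTTAG", "ACGA")]
```

Comparing this value for orthologous Alu elements across primate species would be one way to relate tract length to splice-site strength, as described in the text.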
Julian König, Kathi Zarnack, Mojca Tajnik, Jan Attig, Igor Ruiz de los Mozos, Federico Agostini
The guardian of the transcriptome
Regulating Alu element exonization
A hidden code in our DNA explains how new pieces of genes are made
LINEs, RNP assembly and the evolution of RNA processing
We described how a repetitive genetic element called LINE reshaped the regulation of our genes by giving rise to new exons during evolution. Hundreds of thousands of LINEs are located in our genes, most often within the longest introns. We found that intragenic LINEs play a key role in assembling protein-RNA complexes, thereby regulating RNA processing. We studied two proteins that are particularly enriched on LINEs, MATR3 and PTBP1. These proteins interact with each other via a short linear peptide in a region of MATR3, which binds to the alpha-helical surface in the RRM of PTBP1. LINEs contain many short sequences that trap these proteins through multivalent binding, and the two proteins need to bind together as a complex in order to efficiently bind to LINEs. When binding intronic LINEs, the two proteins can repress or regulate the exons that emerge from them.
By comparing mammalian genomes, we found that LINEs bind differently to proteins depending on how evolutionarily old they are. We see that the repressive proteins favour the evolutionarily young LINE elements, which contain multivalent binding sites for PTBP1 and MATR3. This ensures that a new LINE insertion in an intron can be immediately repressed, thus preventing the splicing machinery from recognising any disruptive splice sites within the LINE. Notably, the repressive effects of LINE-binding proteins can spread to the surrounding area on the RNA. We find that young LINEs are positioned far from exons in different species, indicating that they help evolution to define what is an intron and what is an exon.
In 1977, François Jacob coined the phrase ‘evolutionary tinkering’ to propose that evolution requires mutations to accumulate gradually in our genes while minimising disruption of the existing functions. The LINEs appear to be perfectly suited to this purpose: like yin and yang, they contain potential new splice sites, as well as repressive elements that stop these splice sites from disrupting gene expression. Binding of repressive proteins allows LINEs to remain in a harmless state over long evolutionary periods, during which they can accumulate mutations that gradually lead to formation of alternative exons. The oldest LINEs most frequently give rise to exons that are part of mRNAs in human tissues. Mutations in LINEs gradually decrease the multivalency of binding sites for MATR3 and PTBP1, and thus the exons are released from their repression. In humans, hundreds of LINE-derived exons are present in mRNAs, most often in tissues with high transcriptomic diversity, such as the brain. Thus, protein-RNA complexes ensure that new exons gradually emerge from LINEs while preserving existing functions, allowing complex organisms like us to adapt and survive.
Jan Attig, Federico Agostini
LINE-ing Up to Influence Splicing; a video explains our study.
Recursive splicing in long introns
Long introns contain hundreds of so-called ‘cryptic sequences’ that appear very similar to exons, but are not supposed to be used. The cellular machinery faces great challenges in distinguishing true exons from these cryptic sites. We found that cells sometimes select a cryptic exon that is present deep within a long intron, but later discard it, in a process called recursive splicing (see the paper here). Normally, recursive splicing removes this cryptic exon, allowing it to remain invisible. However, if the recursive site is preceded by other cryptic splicing events, then the exon is not removed, creating a ‘binary switch’ or checkpoint that can distinguish correct splicing events from newly emerging cryptic events, which could be damaging. Thus, long introns enable the emergence of many cryptic splicing events during evolution, whereas recursive splicing ensures that this evolutionary tinkering does not disturb the primary mRNA that needs to be made from the gene. We observed this process happening in some of the longest genes that are expressed in the human brain, which are often implicated in autism or other neurodevelopmental disorders.
Chris Sibley, Warren Emmett, Lorea Blazquez, Andrea Elser
A new genetic switch uncovered in the long genes expressed in our brain
Splicing does the two-step
Regulation of recursive splicing by the exon-junction complex
Our initial study of recursive splicing discovered cryptic RS-exons in the middle of long introns, and later we found that hundreds of annotated exons can also be skipped through recursive splicing by using a 5’ splice site that is reconstituted at their exon-exon junction (see the paper here). Most of these RS-exons are normally constitutively included because of the exon junction complex (EJC), which represses their recursive splicing to preserve transcriptome integrity. This is particularly relevant in deuterostomes and in the brain, where recursive splicing (RS) contributes to physiological alternative splicing. Increased skipping of annotated RS-exons upon defects in the EJC could contribute to the microcephaly phenotype and human disease.
Lorea Blazquez, Warren Emmett, Rupert Faraway, Chris Sibley
The Exon Junction Complex: A Multitasking Guardian of the Transcriptome
RNPs in aging and disease
We study the regulatory networks controlled by RNA-binding proteins that can cause amyotrophic lateral sclerosis when mutated. We have so far focused on TDP-43 and FUS, showing that both proteins regulate alternative splicing of a functionally coherent set of transcripts, many of which encode proteins implicated in neurodegenerative disorders.
We have examined changes in gene expression in human postmortem brain samples to compare the effects of healthy aging with two neurodegenerative diseases: Alzheimer's disease and frontotemporal lobar degeneration (FTLD). We observed widespread changes in alternative splicing: most were specific to diseased samples, but some were common to aging and disease. The changes in glial-specific genes in particular appeared to be shared, with the decrease in oligodendrocyte-specific genes being most apparent. Therefore, we further analysed gene expression in three large cohorts of samples to examine changes in ten brain regions upon aging. Stratifying the gene expression by cell type, we found that astrocytes and oligodendrocytes diminish their regional identity upon aging. We also developed a machine learning method to analyse high-resolution images of brain sections, and applied it to a more limited number of samples. This indicates that the number of oligodendrocytes decreases, while less change is seen in the total number of neurons upon aging. However, the neurons with the largest cell bodies also appear to decline in number. On the other hand, we find a dramatic increase in the expression of microglia-specific and endothelial-specific genes in all brain regions upon aging. We hope that these findings will be of use for further studies of the cellular phase of aging and Alzheimer's disease. A table of the relevant genes and their differential expression upon aging in each brain region is available here.
James Tollervey, Boris Rogelj, Rickie Patani, Michael Briese, Lilach Soreq, Claire Hall, Martina Halleger, Frederique Rau
CLIPs of TDP-43 Provide a Glimpse Into Pathology, Alzheimer Research Forum
FUS and Friends: Two Studies Probe FUS’ RNA Partners
New Link Revealed Between Alzheimer's Disease and Healthy Aging
More Clues How the Brain’s ‘Other Cells’ Change As We Age
Aging Causes “Identity Crisis” in Glia
Scientists Could Identify a Person’s Age by Looking at These Cells in Their Brain
As the Brain Ages, Glial-Cell Gene Expression Changes Most
To Understand a Brain’s Age, Focus on More Than Neurons
A video on Glia: Could These Brain Cells Help Explain What Causes Dementia?