Research

The National Centers for Systems Biology (NCSB), funded by the National Institute of General Medical Sciences (NIGMS) at the NIH, promote institutional development of multi-disciplinary research, training, and outreach focused on systems-level studies of biomedical phenomena. CRSB, one of thirteen NCSB centers nationwide, is based on the UC Berkeley campus, with faculty, teams, and research at both the UC Berkeley and UC San Francisco campuses. The CRSB faculty and researchers use systems biology to establish a fundamental basis for understanding and predicting how RNA structure embedded in pre-mRNA and mRNA sequences controls mRNA fate. Using a suite of tools, we detect pre-mRNA and mRNA structural features in human cells at a systems level.

Project I. Systems Level Analysis of Alternative pre-mRNA Splicing

PIs: Ming Hammond (UCB), Donald Rio (UCB) with Liana Lareau

Alternative pre-mRNA splicing is a major pathway for the regulation of gene expression in metazoans, including humans. It provides a mechanism for cells to generate vast proteomic diversity from a limited number of genes. Recent estimates indicate that >95% of human protein-coding genes give rise to two or more spliced variant mRNAs, often in different cell, tissue, or organ types. Importantly, many disease gene mutations in humans cause defects in RNA processing and surveillance pathways, including pre-mRNA splicing. Some 10-15% of pathogenic mutations occur at splice sites, and many inherited diseases, including neurological, metabolic, and myogenic disorders, result from a wider array of mutations that cause RNA mis-splicing. Altered pre-mRNA splicing patterns have been linked to cancers such as breast and ovarian cancer, as well as to neurodegenerative diseases. Finally, both anti-sense oligonucleotides and small molecules have shown promise as therapeutic interventions that might ameliorate splicing defects in certain mis-splicing diseases. Evidence that pre-mRNA structure helps control splicing events exists for individual genes, but a comprehensive understanding of the impact of RNA structure on splicing control, and of its relation to genetic variation, is lacking for the human transcriptome. The investigators in the CRSB are seeking to develop, examine, and integrate transcriptome-wide RNA structure probing information with genome-wide data sets generated by the ENCODE project, through genome-wide comparative sequence analysis. The overall project goal is to systematically link cis-regulatory elements in pre-mRNAs to RNA structural features that control alternative pre-mRNA splicing in human cells.

For more information about Dr. Lareau’s work, please also see Core 3.

Project II. Systems-Wide Analysis of Translation Initiation Control by mRNA Structure

PIs: Jamie Cate (UCB), Jennifer Doudna (UCB), Jonathan Weissman (UCSF)

Protein levels in cells correlate poorly with levels of mRNA transcripts and depend strongly on levels of translation. Regulation of translation initiation therefore serves as a key determinant of gene expression, and directly impacts many human diseases, including many cancers. In cap-dependent translation initiation, the initiation machinery scans from the 5’ end 7-methyl-G (m7G) cap structure to the correct AUG start codon, typically the first AUG codon in the mRNA sequence. Although this scanning model for translation initiation holds for many mRNAs, both in vitro and in vivo, many interesting alternative mechanisms of translation initiation exist. At present, data are lacking on how often alternative modes of translation initiation in humans are used and how the cellular machinery chooses one mode over another. While the sequence requirements for translation start codon selection have been studied for individual genes, we aim to develop an understanding of the rules for start site selection for the human transcriptome. As opposed to probing the details of cis-regulatory control in model mRNAs, the investigators in the CRSB seek to map out the RNA structural features in human mRNAs that regulate translation initiation events in living cells. The overall project goal is to systematically define cis-regulatory elements in mRNAs that control translation initiation using RNA structural mapping and ribosome profiling in cells.
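The scanning model described above can be illustrated with a minimal sketch. The function names (`first_aug`, `all_augs`) and the toy sequence are hypothetical, and the code only locates AUG triplets in a sequence string; it ignores sequence context, RNA structure, and every alternative initiation mechanism.

```python
def normalize(seq: str) -> str:
    """Upper-case the sequence and write it in the RNA alphabet."""
    return seq.upper().replace("T", "U")


def first_aug(mrna: str) -> int:
    """Return the 0-based index of the first AUG from the 5' end, or -1.

    Under the simple scanning model, this is the position where the
    initiation machinery would typically start translation.
    """
    return normalize(mrna).find("AUG")


def all_augs(mrna: str) -> list[int]:
    """Return the 0-based indices of every AUG triplet in the sequence.

    Any AUG other than the first is a candidate for study when asking
    how often start-site choice departs from the simple scanning model.
    """
    rna = normalize(mrna)
    return [i for i in range(len(rna) - 2) if rna[i:i + 3] == "AUG"]


if __name__ == "__main__":
    toy = "GGCAUGAAAUGA"  # toy sequence, not a real transcript
    print(first_aug(toy))
    print(all_augs(toy))
```

Comparing the first AUG with experimentally observed start sites (for example, from ribosome profiling) is one way to ask how often the simple scanning rule holds.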

Project III. Influence of RNA Structure on miRNA-Mediated mRNA Turnover

PIs: Jennifer Doudna (UCB), Jonathan Weissman (UCSF), Adam Arkin (UCB), Lior Pachter (UCB)

RNA interference (RNAi) and related pathways trigger potent and specific regulation of gene expression in eukaryotes. Gene silencing begins with the binding of 21-nucleotide guide RNAs called short interfering RNAs (siRNAs) or microRNAs (miRNAs) to mRNAs, ultimately leading to destruction of the targeted transcript. Current understanding of miRNA target selection in the 3’-untranslated region (3’-UTR) indicates that conserved Watson–Crick pairing is required between the target and the 5’ “seed” region of the miRNA, centered on nucleotides 2–7. However, the RNA structural context of the target sites also influences site efficacy. Transcript features that boost miRNA target site utilization include AU-rich sequence composition near the site, positioning near other active miRNA binding sites, and placement at least 15 nucleotides downstream of the stop codon within the 3’ UTR and not near the center of long 3’ UTRs. Yet intracellular UTR structures are likely to differ substantially from predicted structures due to the presence of RNA-binding proteins, RNA tertiary structures, and multiple competing RNA secondary structures. As a result, the present mechanistic understanding of the influence of RNA structure on miRNA-mediated gene silencing is minimal.
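The seed-pairing rule described above can be sketched in a few lines. This is an illustration only: the function names and sequences are hypothetical, the match is a strict Watson–Crick reverse complement of miRNA nucleotides 2–7, and none of the contextual features or structural effects discussed above are modeled.

```python
# Watson-Crick complements in the RNA alphabet.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def normalize(seq: str) -> str:
    """Upper-case the sequence and write it in the RNA alphabet."""
    return seq.upper().replace("T", "U")


def seed_site(mirna: str) -> str:
    """Return the mRNA sequence (5'->3') that pairs with the miRNA seed.

    The seed is taken as nucleotides 2-7 counted from the miRNA 5' end;
    the target site is its reverse complement.
    """
    seed = normalize(mirna)[1:7]
    return seed.translate(COMPLEMENT)[::-1]


def find_seed_sites(utr: str, mirna: str) -> list[int]:
    """Return 0-based start positions of perfect seed matches in a 3' UTR."""
    site = seed_site(mirna)
    rna = normalize(utr)
    return [i for i in range(len(rna) - len(site) + 1)
            if rna[i:i + len(site)] == site]


if __name__ == "__main__":
    guide = "UGAGGUAGUAGGUUGUAUAGUU"  # toy 22-nt guide sequence
    utr = "AAAUACCUCGGG"              # toy UTR fragment
    print(seed_site(guide))
    print(find_seed_sites(utr, guide))
```

A sequence-only scan like this finds candidate sites; whether a site is actually used in cells depends on the structural context, which is the question Project III addresses.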

The importance of miRNA-mediated gene regulation in many human diseases underscores the need to develop tools for predicting miRNA targets accurately at a global level. Specific miRNAs including miR-21 and those from the let-7 family are mis-regulated in various human cancers, and there is evidence for a prevalence of shorter 3’ UTRs in cancer cells that eliminate or alter RNA structure surrounding miRNA binding sites. While studies of individual miRNAs or computational investigations of miRNA populations have provided some insights, there has not been a systematic attempt to discover the roles that RNA structures play in determining miRNA target selection and silencing efficiency. The investigators at CRSB propose to determine structural properties of mRNAs that enhance or hinder miRNA-mediated regulation in human cells using ribosome profiling, RNA-structure detection, and SHAPE-based RNA chemical probing.

PROJECT AFFILIATIONS

Our current research effort includes the talent of new affiliates whose efforts have the potential to carry out creative and innovative research in their specialized areas. Affiliation with the CRSB gives access to a community of other quantitative RNA biologists and a forum to discuss work, enriching studies and maximizing potential to do innovative and deep research in RNA systems biology.

Gloria Brar / Project 2

Dr. Brar’s work aims to understand the molecular basis of the complex cellular changes that are responsible for meiosis. She has focused on understanding the role of translational regulation in programming meiotic events and the remarkably diverse complement of genomic regions translated by the ribosome in meiotic cells, including a large number of very short proteins and upstream ORFs on canonical coding transcripts.

Steven Brenner / Project 1

Dr. Brenner’s lab focuses on collecting and sequencing RNA and analyzing the data with a computational pipeline that identifies putative alternative splicing events dependent on the splicing factor under study, including those normally degraded by nonsense-mediated mRNA decay (NMD). These results will be integrated with available physical protein-RNA interaction data to yield a high-confidence list of splicing events directly regulated by a splicing factor. The analysis will be performed on a number of candidates from the SR and hnRNP groups of splicing factors.

Nicholas Ingolia / Project 2

A recent NIH Innovator Awardee, Dr. Ingolia focuses his research on the translational control of gene expression, in order to understand how mRNA sequence features specify gene expression regulation by RNA-binding proteins in the cytosol. The ribosome profiling approach for global translational profiling forms an important foundation for this work, which also involves high-throughput analysis of RNA sequence and protein function.

NEW TECHNOLOGIES FROM THE CRSB

CRSB investigators are repurposing sequence- and structure-specific nucleases that are part of bacterial CRISPR-Cas adaptive immune systems to develop efficient approaches for purifying individual RNA-protein complexes and for identifying the proteins in these complexes by mass spectrometry. For example, investigators in Project 3 have engineered the RNA-guided DNA endonuclease Cas9 to recognize RNA molecules in a programmable fashion. Other Cas9-based tools have been developed as part of the CRSB for carrying out genome-wide transcriptome analyses. See the Publications page for more details.

Core I. Global mapping of RNA structure in vivo

Director: Jonathan Weissman

RNA structural elements are widely distributed in pre-mRNAs and mRNAs and serve to control gene expression. Non-coding elements in RNAs exploit the ability of RNA to fold into secondary and tertiary structures to achieve functional control of chemical reactions and gene regulation. In addition to the role of RNA structure in the protein synthesis machinery itself, due to the tertiary folding of ribosomal RNA (rRNA) and transfer RNAs (tRNAs), RNA structure plays a central role in defining the fate of mRNAs. Examples include folded RNAs in the spliceosome, riboswitches, Internal Ribosome Entry Sites, and the base pairing required for miRNA-mediated translational repression. We are developing methods for RNA structural probing for system-wide studies of RNA structure in pre-mRNAs and mRNAs in cells. The RNA Structure Mapping Core will provide the means to probe the connection between RNA structure and mRNA fate in living cells for the first time.

The objective of the RNA Structure Mapping Core is to discover RNA structural elements in pre-mRNAs and mRNAs that may serve as regulatory signals for gene expression. The strategy is based on three steps: (1) use of cell-permeable small molecules that modify nucleotides in a conformation-dependent manner (e.g. the presence or absence of base pairing), (2) quantitative detection of the sites of modification by deep sequencing, and (3) computational analysis of the resulting data (see Core III).
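As a rough illustration of step (3), the sketch below turns per-nucleotide modification counts into a normalized reactivity profile. It is not the Core's actual pipeline: the function name, the background-subtraction scheme, and the scaling to the maximum value are all assumptions chosen for simplicity.

```python
def reactivity_profile(treated_counts, control_counts,
                       treated_depth, control_depth):
    """Estimate relative per-nucleotide reactivity (assumed scheme).

    treated_counts / control_counts: reads reporting a modification at
    each position in the reagent-treated and untreated samples.
    treated_depth / control_depth: total reads covering the transcript
    in each sample, used to convert counts to rates.
    Returns values scaled to [0, 1]; higher values suggest a more
    accessible (for example, unpaired) nucleotide.
    """
    if len(treated_counts) != len(control_counts):
        raise ValueError("count vectors must have the same length")
    if treated_depth <= 0 or control_depth <= 0:
        raise ValueError("sequencing depths must be positive")

    # Background-subtracted modification rate, floored at zero.
    raw = [max(t / treated_depth - c / control_depth, 0.0)
           for t, c in zip(treated_counts, control_counts)]

    # Scale by the largest value so profiles are comparable.
    top = max(raw, default=0.0)
    return [r / top if top > 0 else 0.0 for r in raw]


if __name__ == "__main__":
    # Toy counts for a four-nucleotide region.
    print(reactivity_profile([10, 0, 50, 5], [5, 0, 0, 5], 100, 100))
```

Real analyses typically need more careful statistics (coverage biases, replicate variability, reagent nucleotide specificity), which is why the computational step is handled by a dedicated core.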

Core II. Ribosome Profiling

Director: Jonathan Weissman

The advent of RNA deep sequencing technologies has made it possible to monitor the internal state of a cell with unprecedented precision. However, these approaches quantify messenger RNA (mRNA) levels, whereas in the vast majority of cases proteins are directly responsible for mediating a gene’s cellular functions. mRNA levels are often a highly imperfect proxy for protein production due to the extensive use of translational control, but understanding of the strategies used to regulate translation lags far behind that of transcription, largely due to the difficulty of monitoring protein synthesis rates. Ribosome profiling, the deep sequencing of ribosome-protected mRNA fragments, represents a transformative technology that dramatically advances the ability to monitor protein translation in vivo.

The primary objective of the Ribosome Profiling core is to provide key data for identifying cis-regulatory elements in mRNAs controlling translation levels and mRNA turnover. The Core, run by Director Jonathan Weissman, will provide experimental and computational support to enable researchers in all Projects to conduct and interpret ribosome-profiling studies. Additionally, the Core will continue to develop novel ribosome profiling-based strategies and refine existing tools.
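One common way to relate ribosome-profiling data to mRNA levels is a per-gene translation efficiency, the ratio of ribosome-footprint density to mRNA density. The sketch below is a minimal illustration under that assumption; the function names and the RPKM-style normalization are choices made here, not a description of the Core's own tools.

```python
def rpkm(read_count, length_nt, total_mapped_reads):
    """Reads per kilobase of sequence per million mapped reads."""
    if length_nt <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (length_nt * total_mapped_reads)


def translation_efficiency(footprint_count, mrna_count, length_nt,
                           footprint_library_size, mrna_library_size):
    """Ratio of footprint density to mRNA density for one gene.

    Returns None when no mRNA reads are observed, because the ratio
    is undefined in that case.
    """
    mrna_density = rpkm(mrna_count, length_nt, mrna_library_size)
    if mrna_density == 0:
        return None
    footprint_density = rpkm(footprint_count, length_nt,
                             footprint_library_size)
    return footprint_density / mrna_density


if __name__ == "__main__":
    # Toy numbers: a 1 kb coding region, one million reads per library.
    print(translation_efficiency(200, 100, 1000, 1_000_000, 1_000_000))
```

Genes whose ratio changes between conditions while mRNA levels stay constant are natural candidates for translational control by cis-regulatory elements.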

 

Core III. Computational Core

Director: Dr. Lior Pachter

Experiments in the CRSB generate a large variety of RNA deep-sequencing datasets, each with its own challenge in interpretation. The Computational Core provides the mathematical and quantitative tools for probing these complex datasets in a rigorous manner. Given that many of the experiments in the CRSB are the first of their kind, they require the development of new algorithms, statistical methodology and mathematical foundations for their analysis. These new tools will aid in the identification of RNA structures that control pre-mRNA and mRNA fate in human cells, and will be widely useful both in the CRSB and in the RNA community in general.

ONGOING COMPUTATIONAL PROJECTS

Bo Li / Core 3

Bo Li’s research interests include next-generation sequencing data analysis (in particular, RNA-Seq data analysis) and statistical learning. He completed his PhD at the University of Wisconsin-Madison, where he developed RSEM, one of the most widely used RNA-Seq transcript abundance estimation tools. RSEM has been used in large national projects such as TCGA (https://wiki.nci.nih.gov/display/TCGA/RNASeq+Version+2). He then joined Prof. Lior Pachter’s lab as a postdoctoral researcher, where he has worked on a range of topics, including RNA-Seq systems biology and single-cell RNA-Seq data analysis. His current work focuses on building computational models of DMS-seq (dimethyl sulfate sequencing) data, a transcriptome-wide, in vivo “version” of SHAPE-seq.

James Lloyd / Project 1

Generating a network of NMD-targeted splicing events: Alternative splicing can generate much diversity in the transcriptome and proteome and is regulated by proteins called splicing factors. While many of the targets of some splicing factors are known, many splicing outcomes are hidden by nonsense-mediated mRNA decay (NMD). Splicing events that introduce a premature stop codon are not commonly seen in RNA-seq analysis because the resulting transcripts are degraded by NMD. To fully appreciate the range of targets a splicing factor acts on, we will knock down or over-express a given splicing factor in both NMD-competent and -incompetent cell lines. RNA will then be collected and sequenced, and a computational pipeline developed in the Brenner group will be used to identify putative alternative splicing events dependent on the splicing factor under study, including those normally degraded by NMD. These results will be integrated with available physical protein-RNA interaction data to yield a high-confidence list of splicing events directly regulated by a splicing factor. This analysis will be performed on a number of candidates from the SR and hnRNP groups of splicing factors. We will then generate a network depicting the relationships between different splicing factors, and between splicing factors and transcripts. The relationships between splicing factors could be examined using an approach similar to that used in [1]. The prevalence of cross-regulation, where one splicing factor alters the splicing of the primary transcript encoding another splicing factor to produce more of an NMD-targeted variant, will be of particular interest. Machine learning techniques will be applied to our RNA-seq datasets to better understand the cis-sequences of the primary transcript that recruit a particular splicing factor. Including isoforms that would normally be lost through NMD will increase the power of such an analysis. Functional enrichment of the targets of different splicing factors will also be analyzed to gain insights into the biological role of these splicing events.

It is also known that many splicing events are stress-responsive [2]. Once the network we have detailed above is complete, we can begin to expose cells to stresses such as heat shock, DNA damage and hypoxia and monitor the cells for differential alternative splicing in NMD-competent and -incompetent cells. This will reveal additional splicing data that we can combine with our network analysis to predict what splicing factors may have a role in response to a tested stress. Response assays to knockdown or over-expression of a splicing factor will confirm if our predictions were accurate. Together, this will identify previously missed interactions between splicing factors and transcripts as well as link splicing factors to important biological processes.