This week we profile a recent publication in Cancer Cell from Dr. Gordon Robertson
at the Michael Smith Genome Sciences Centre.
Can you provide a brief overview of the work you do for the TCGA?
The Cancer Genome Atlas (TCGA) was a large-scale cancer molecular profiling project that was jointly supported and managed by the National Institutes of Health’s National Cancer Institute and National Human Genome Research Institute. In this initiative, many organizations contributed tumor tissues, and many genome organizations and research groups collaborated within a research network. TCGA generated clinical and multiplatform genomic data (DNA, RNA, and protein) for 33 common and rarer cancers. The data sets, available at the Genomic Data Commons, enable a global cancer research community.
For each TCGA project, an analysis working group (AWG) generated at least one major ‘marker’ publication. For each marker paper, AWG members sought to use integrated analysis across the clinical and molecular profiling platform data to generate new insights into the biology of that cancer, e.g. histological vs. molecular subtypes, subtype-specific driver alterations, and subtype/alteration-specific potential treatments. The clinical and research contexts, and so the opportunities to contribute, varied widely between the cancer types.
The BC Cancer Agency’s Genome Sciences Centre (BCGSC) generated all ~11,500 microRNA sequence data sets for the 33 cancers that were characterized by the consortium. We described the miRNA-seq data generating system in 2016. We also contributed messenger RNA sequence data for four projects: acute myeloid leukemia (AML), ovarian, gastric and esophageal cancers.
Many research and production teams and individuals within the BCGSC were involved in generating the TCGA miRNA-seq and mRNA-seq data. And all TCGA data generators across the research network faced demanding challenges, for years, in scaling up to satisfy TCGA’s scope and schedule.
As a staff scientist and analyst at the BCGSC, I was responsible for developing analysis methods, and for generating major features for marker paper manuscripts (figures, text, data spreadsheets) for miRNAs in roughly 20 of the 33 projects. Reanne Bowlby was our miRNA lead for the other TCGA projects. We worked in parallel with and downstream of the BCGSC’s many library, sequencing, and informatics production teams.
Outlining highlights from the TCGA uveal melanoma (UVM) project will give a better sense of the analysis work, and of the diverse analysis teams that collaborated at the BCGSC and across the research network. Combined, the UVM AWG members had substantial experience in clinical practice and research, and in genomics. Across the network, DNA, RNA and protein teams generated data for 80 primary tumours. Data analysis was done by each platform’s teams, and by integrative analysis teams.
At the BCGSC, unsupervised consensus clustering with miRNA-seq data quickly identified four molecular subtypes, and we extended these results by identifying miRNA mature strands that were differentially abundant between the subtypes, and that were influenced by somatic copy number alterations. Given what the project’s clinical members had identified as opportunities to contribute, getting four subtypes was promising, though of course, at that early stage in the project, what we understood of the disease, and of what other data types were saying, was relatively undeveloped.
Shortly after this, Ewan Gibb (then at the BCGSC, now at GenomeDx in Vancouver) suggested that we test a similar subtype analysis with long noncoding RNAs (lncRNAs), whose expression he could calculate from the TCGA mRNA-seq data. To that point, lncRNAs had not been integrated into analysis in TCGA projects. We gradually worked out methods that let us report on ‘noncoding’ RNAs, i.e. on both lncRNAs and miRNAs. For the UVM project, identifying four subtypes from copy number data (Juliann Shih, Broad Institute), and from coding and noncoding RNAs, in addition to four major subtypes from PARADIGM pathway activity analysis (Cristina Yau, Buck Institute), gave us additional confidence in the results, and insights into subtype biology.
BAP1 (3p21.1) is an important tumour suppressor, and BAP1 inactivation is a risk factor in a number of cancers. In UVM, loss of one of two copies of chromosome 3 (monosomy 3) is a key event that distinguishes good- from poor-survival cases, and, in poor-survival cases, the BAP1 copy on the remaining chromosome 3 can be inactivated by mutations or other alterations. To assess BAP1 alterations, Karen Mungall’s de novo sequence assembly team used mRNA-seq data, and their work was complemented by a reassembly analysis of exome capture DNA sequencing data by Julian Hess (The Broad Institute). Combining results from the mRNA and DNA analyses showed that standard mutation/indel calling tools had failed to report larger structural alterations in the BAP1 gene in 18 of the 80 samples (23%). This result has implications for practical clinical tests to detect alterations in BAP1 and other tumour suppressors, in a range of cancers.
What is the significance of the findings of the current publication?
UVM arises from melanocytes (melanin pigment producing cells) that reside in the uveal tract of the eye. Primary UVM is treated with either surgery or radiation, and has a low recurrence rate in the eye. However, after treatment of the primary tumour, up to half of UVM patients develop distant metastatic disease within three years, often to the liver, and such patients have poor survival. The other half of patients develop metastatic disease less often, and more slowly. Higher- vs. lower-risk patients can be distinguished by cytogenetics for chromosomes 3 and 8q, and by a commercial 12-gene expression panel. Previous work has shown that genetic features allow further dividing the better-survival group into two subgroups.
For the TCGA cohort (n = 80), we showed that poor-prognosis UVM tumours initially develop monosomy 3, followed by BAP1 alterations that are associated with a unique global DNA methylation profile. Despite this shared methylation state, poor-prognosis monosomy 3 cases separated into two subsets by copy number alterations, RNA (mRNA, lncRNA and miRNA) expression, and cellular pathway activity profiles. Our analysis showed that the somatic copy number subtypes and associated gene expression subtypes correlate with differential time to metastasis.
This global, integrated analysis has great potential to influence the frequency of metastatic surveillance, prioritize high-risk patients for more aggressive/earlier adjuvant clinical trials, provide more precise UVM metastasis data for the design of clinical trials and use of historical controls, and offer information to patients that may assist them in medical and personal choices. As no effective adjuvant therapy has yet been developed for UVM, a prospective analysis that characterizes the two poor-survival molecular subtypes relative to UVM metastasis would be timely and important.
What other projects are you currently working on?
At the time of writing, I’m working with an experienced team on late-stage revisions of a TCGA manuscript on 412 cases of muscle-invasive bladder cancer. We made a first TCGA publication available in 2014 on a 131-case subset of the current cohort. Now, a cohort three times larger has allowed us to extend that work. Ewan and I again collaborated to add lncRNAs into the multiplatform data, and I worked with Mauro Castro of the Federal University of Paraná Polytechnic Center, Curitiba, Brazil to contribute regulon analysis and multivariate survival analysis.
TCGA manuscripts on mesothelioma and testicular germ cell cancer are in review, and both may require revisions.
Extending TCGA, many AWGs have worked for some time on ‘PanCancer’ manuscripts, doing analyses across all or many related TCGA cancer projects, and so working with tens of thousands of data sets per genomic platform. Reanne Bowlby was the BCGSC lead for miRNA and other analyses for many of these projects; for some of them, I’m involved in miRNA analysis and, through Ewan Gibb, in lncRNA analyses.
In 2016, the BCGSC and the Institute for Systems Biology (ISB) in Seattle were awarded NIH funding that allows a collaborative BCGSC/ISB team to participate in the ongoing Genome Data Analysis Network (GDAN) that is administered by the Center for Cancer Genomics (CCG). In this project, I’m collaborating with the ISB’s Theo Knijnenberg, Varsha Dhankani, Sheila Reynolds and Ilya Shmulevich to make miRNA analyses available in ISB’s cloud computing infrastructure.
This article was contributed by Dr. Gordon Robertson and Dr. Scott Woodman, with special thanks to Jenny Yang for editing.