- Data Note
- Open Access
- Open Peer Review
Multi-platform microRNA profiling of hepatoblastoma patients using formalin fixed paraffin embedded archival samples
GigaScience volume 4, Article number: 54 (2015)
Formalin fixed paraffin embedded (FFPE) samples are a valuable resource in cancer research and have the potential to be extensively used. However, they are often underused because of degradation and chemical modifications occurring in the RNA that can present obstacles in downstream analysis. In routine medical care, FFPE material is examined and archived, therefore clinical collections of many types of cancers exist. It is beneficial to assess and record the quality of data that can be obtained from this type of material. The current study investigated three independent platforms and their ability to profile microRNAs (miRNAs) within FFPE samples from hepatoblastoma (HB) patients.
Here we present three types of datasets consisting of miRNA profiles for 13 HB patients with different tumour types and molecular variations. The three platforms that were used to generate these data are: next-generation sequencing (Illumina MiSeq), microarray (Affymetrix® GeneChip® miRNA 3.0 array) and NanoString (nCounter, Human v2 miRNA Assay). The mature miRNAs identified are based on miRBase version 17 and 18.
These datasets provide a global landscape of miRNA expression for a rare childhood cancer that has not previously been well characterised. These data could serve as a resource for future studies aiming to make comparisons of HB miRNA profiles and to document aberrant miRNA expression in this type of cancer.
MicroRNAs (miRNAs) are a large group of small non-protein coding RNAs, which are important epigenetic regulators of gene expression [1, 2] and have a role in transcriptional control in a variety of cancers including hepatoblastoma (HB) [3, 4]. miRNA profiling research has identified unique signatures that can be used to classify cancers by determining specific miRNA markers predicting favourable or unfavourable prognoses. Cataloguing the miRNA expression profiles for a large number of different tumour classes may aid in both diagnosis and treatment of cancer [5, 6].
Formalin fixed paraffin embedded (FFPE) samples are a major source of material for HB research. Because of the rare nature of this disease and the limited availability of fresh-frozen tumour samples, it is essential to successfully utilise FFPE samples to obtain high-quality data from these tumours. However, analysis of FFPE material presents obstacles because the process of fixation and embedding, as well as storage time, can negatively impact the quality of RNA isolated from these samples. At a molecular level, modifications occurring from chemical reactions between the fixative and nucleic acids may cause nucleic acid fragmentation and degradation of the RNA [7]. Previous studies have carried out comparative analyses of multiple platforms using non-FFPE material [8, 9]. Other studies have compared platforms for their compatibility with FFPE material but have utilised only one or two platforms, such as microarray with validation using RT-qPCR. Another study used both fresh frozen and FFPE tissue on multiple platforms; however, the sample numbers were very small [10–15]. Studies that have examined HB and miRNAs often used a more targeted approach to investigate candidate miRNAs, rather than performing global profiling of tumour samples [16]. Therefore, the miRNA profile of HB tumours has not been extensively investigated, and a global assessment of the miRNA landscape in HB is lacking.
Here we describe miRNA profiles for 13 HB tumour samples, which we obtained using a combination of three platforms for miRNA detection. Three of the 13 samples were run using next-generation sequencing (NGS) and a microarray (MA), and a total of 12 of the 13 samples were assessed using NanoString (NS). A comprehensive analysis of the shared miRNA detection across platforms and a comparison of the most abundant miRNAs are described.
FFPE material poses challenges to analysis, the most notable being degradation of the sample material. This is an important consideration for data generation and platform selection. Different technologies require different amounts of starting material; for instance, the NGS platform used for analysis required 1 μg of RNA, the MA required 400 ng, and the NanoString required only 100 ng. Because the amount of sample available to a researcher is limited, these input requirements can greatly hinder or enhance the data that can be generated. When assessing miRNAs, RNA quality (as determined by the RNA integrity number or RIN, which gives an indication of how intact the total RNA is) may not play as important a role when considering which platform to choose. Our study indicates that samples with RIN values as low as 1.7 can produce good quality miRNA data on all the platforms assessed.
We generated these datasets to describe the miRNA landscape in hepatoblastoma using FFPE samples. In addition, we aimed to describe the strengths and weaknesses of the different platforms (NGS, MA, and NS) for the detection of miRNAs. The level and technical reproducibility for detecting miRNAs in each platform for each sample was investigated and compared between platforms. Further, results were collated to determine the level of shared detection and abundance of specific miRNAs between these three platforms. Hierarchical clustering was performed on the NanoString dataset, which revealed similarities in miRNA profiles in a number of samples, and a unique profiling pattern present in an aggressive HB phenotype.
A total of 13 HB tumour samples were evaluated in this study. Three samples (S4, S5 and S6) were analysed with technical replicates on the NGS and MA platforms. S5 and S6 were also analysed with the NS platform (S4 was excluded due to limited sample availability). Further, an additional 10 samples (S7–S16), making a total of 12 HB tumours, were investigated on the NS platform. The age of the patients ranged from 5 months to 10 years 6 months. The samples were a mix of tumour types, the most common being epithelial and fetal, followed by epithelial and mixed fetal embryonal, mixed epithelial mesenchymal and fetal, and finally mixed epithelial mesenchymal and mixed fetal embryonal. One sample (S5) was described histologically as cholangioblastic. Four of the samples contained a mutation in the CTNNB1 gene (beta-catenin) [18] (Table 1).
Platforms used for miRNA quantification
Three platforms were used for miRNA quantification in this study. For next-generation sequencing, 1 μg of input RNA was required to construct the small RNA libraries with the TruSeq® Small RNA sample preparation kit (Illumina, San Diego, CA) according to the manufacturer's instructions. The Illumina MiSeq platform was used to produce single-end, 50 bp sequenced reads (in FASTQ format). For the microarray platform, 400 ng of input RNA was required. The samples were labelled without amplification with the Affymetrix FlashTag™ Biotin HSR RNA Labeling Kit; the labelled samples were hybridised on an Affymetrix® GeneChip® miRNA 3.0 array according to the manufacturer's guidelines. This chip is able to detect 1733 mature miRNAs based on miRBase version 17. The NanoString platform utilises colour-coded molecular barcodes (probes) that hybridise directly to targets of interest. Single molecule imaging is used to collect highly accurate digital counts of the different nucleic acids corresponding to each barcode. We used the Human v2 miRNA Assay Kit for the NanoString platform, which requires 100 ng of input RNA and is capable of detecting 800 mature miRNAs based on miRBase version 18. Samples were prepared and analysed according to standard NanoString guidelines for miRNA analysis.
Quality assessment of miRNA detection platforms data and post processing
We briefly describe the processing steps used to generate these datasets (Fig. 1). A detailed description of the experimental and analysis steps can be found elsewhere [17]. Additionally, to ensure that data based on miRBase versions 17 and 18 were consistent, the miRNAs were manually checked for inconsistencies, and nomenclature was matched to miRBase version 17. This permitted appropriate comparison of the panel of 800 miRNAs with the microarray and NGS data for further analysis (Additional file 1: Table S1). The proportion of miRNAs mapped for each sample to the total number of identifiable miRNAs from each platform can be found in Additional file 1: Tables S2–S4.
Next-generation sequencing data
Assessment of the quality of the sequenced reads, quality trimming, and removal of adaptors from the 3’ end of the sequences were performed as previously described [19–22]. The median Phred score of the sequenced bases was > 34 through to the fiftieth sequencing cycle for all the analysed samples. The GC percentage of the samples ranged between 51 and 61. Adapter sequences were removed using Cutadapt [23]. Processed reads were mapped to known miRNAs from the miRBase 17 database using Bowtie1 and miRDeep2 [24, 25]. The number of reads that mapped to an individual miRNA was used to represent its level of expression.
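The per-cycle quality check described above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: it assumes Phred+33 quality encoding and reads of equal length, and the function names and toy quality strings are hypothetical.

```python
import statistics

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def median_phred_per_cycle(quality_strings):
    """Return the median Phred score at each sequencing cycle.

    quality_strings: FASTQ quality lines (the 4th line of each record),
    all of the same length (e.g. 50 characters for 50 bp reads).
    """
    per_read_scores = ([ord(ch) - PHRED_OFFSET for ch in q] for q in quality_strings)
    # zip(*...) transposes reads x cycles into cycles x reads
    return [statistics.median(cycle) for cycle in zip(*per_read_scores)]


def error_probability(phred):
    """Convert a Phred score Q into a base-call error probability 10^(-Q/10)."""
    return 10 ** (-phred / 10)


# Hypothetical 3-cycle reads, for illustration only
toy_qualities = ["II5", "I55", "III"]
medians = median_phred_per_cycle(toy_qualities)
```

Under this conversion, a median Phred score above 34 corresponds to a base-call error probability below roughly 4 in 10,000.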
Microarray data
Raw data in CEL files were normalised using a robust multi-array average (RMA) approach. Median values of each of the probe sets were used to summarise expression values for each microarray chip. Confidently identified miRNAs were determined using a threshold set by a spike-in control included on the chip (Biob_3). This spike-in has the lowest concentration and represents the limit of detection of the microarray. Probe intensity values below the limit are considered to be background noise and are removed from further analysis. Data quality was assessed using the miRNAv3 Array QC report. All parameters assessed indicated that high quality data were obtained (Fig. 1).
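The spike-in thresholding step can be illustrated with a short sketch. It assumes the normalised intensities for one chip are already available as a mapping from probe set identifier to value; the function name, identifiers and numbers below are hypothetical and are not taken from the datasets.

```python
def filter_by_spike_in(intensities, spike_in_id):
    """Keep probe sets whose intensity is at or above the spike-in control.

    intensities: dict mapping probe set id -> normalised intensity (one chip).
    spike_in_id: id of the lowest-concentration spike-in control, whose
                 intensity is taken as the limit of detection.
    """
    limit = intensities[spike_in_id]
    return {
        probe: value
        for probe, value in intensities.items()
        if probe != spike_in_id and value >= limit
    }


# Hypothetical values for illustration only
chip = {"spike_in_control": 2.0, "miR-a": 5.1, "miR-b": 1.2, "miR-c": 2.0}
detected = filter_by_spike_in(chip, "spike_in_control")
```

Values equal to the limit are retained here because the text only excludes intensities below it; a stricter cut-off would be an equally reasonable choice.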
NanoString data
Raw data in RCC (reporter code count) files were loaded into nSolver™ and used to perform quality assessment and normalisation. A normalisation factor was generated using the geometric mean of the top 100 miRNAs for each sample to offset technical noise. Raw counts were multiplied by the normalisation factor to produce a list of normalised miRNA counts (Fig. 1). Negative controls were included in the expression assays; the limit of detection was calculated by adding two standard deviations (SD) to the mean of the negative controls (threshold = mean + 2SD). NS data quality was assessed using nSolver® default instructions (version MAN-C0011-03) (NanoString Technologies Inc., nCounter Expression Data Analysis Guide). All samples passed all parameters, with the exception of S6, S7, S14, S15 and S16, which did not pass the positive control limit of detection. Degradation of RNA may have contributed to the low miRNA counts obtained globally in these samples.
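The normalisation and thresholding described above can be sketched in a few lines. This is an illustration under stated assumptions rather than a reimplementation of nSolver: each sample is scaled so that the geometric mean of its top miRNAs matches the average of those geometric means across samples (an assumed scaling target), zero counts are ignored when taking the geometric mean, and the sample standard deviation is used for the threshold. All function names and toy values are hypothetical.

```python
import math
import statistics


def geometric_mean_top(counts, top_n=100):
    """Geometric mean of the top_n highest positive counts in one sample."""
    top = [c for c in sorted(counts, reverse=True)[:top_n] if c > 0]
    if not top:
        raise ValueError("no positive counts available")
    return math.exp(sum(math.log(c) for c in top) / len(top))


def normalisation_factors(samples, top_n=100):
    """Per-sample factor = mean of all geometric means / sample geometric mean."""
    gms = {name: geometric_mean_top(counts.values(), top_n)
           for name, counts in samples.items()}
    target = statistics.mean(gms.values())  # assumed scaling target
    return {name: target / gm for name, gm in gms.items()}


def normalise_counts(samples, top_n=100):
    """Multiply every raw count by its sample's normalisation factor."""
    factors = normalisation_factors(samples, top_n)
    return {name: {mirna: count * factors[name] for mirna, count in counts.items()}
            for name, counts in samples.items()}


def detection_threshold(negative_controls):
    """Limit of detection: mean + 2 SD of the negative control counts."""
    return statistics.mean(negative_controls) + 2 * statistics.stdev(negative_controls)


# Hypothetical raw counts for two samples and three negative controls
raw = {"A": {"m1": 10, "m2": 40}, "B": {"m1": 20, "m2": 80}}
normalised = normalise_counts(raw)
threshold = detection_threshold([2, 4, 6])
```

In this toy example the two samples differ only by a twofold technical factor, so they become identical after normalisation, and the threshold evaluates to 4 + 2 × 2 = 8.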
Comparison of miRNA detection and abundance with previous studies
We compared our datasets to several other relevant published datasets. The first dataset comprised 33 miRNAs identified as differentially expressed between normal tissue and HB by Magrelli et al. [3]. We found that 75.8 % of these differentially expressed miRNAs were also present in our list of 98 miRNAs (miRNAs detected in at least one sample on all three platforms); this overlap was significant (P = 5.11e-18, hypergeometric test, with the panel of 800 miRNAs assessed by NanoString as the reference set). Commonly detected miRNAs between these studies are reported in Additional file 1: Table S5. When we performed a similar analysis with our dataset of 50 miRNAs (miRNAs detected in all 12 samples analysed on the NanoString platform), the overlap remained significant (30.3 % overlap, P = 1.01e-5, hypergeometric test, with the same reference set of 800 miRNAs).
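For readers who wish to run a similar overlap test, a hypergeometric upper-tail probability can be computed as in the sketch below. It is not the authors' script: the function name is hypothetical, and the example call assumes that all 33 published miRNAs are present on the 800-miRNA panel and that 75.8 % of 33 corresponds to 25 shared miRNAs, so the result may not match the reported P value exactly.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    population: size of the reference set (e.g. miRNAs on the panel)
    successes:  number of reference miRNAs in our detected list
    draws:      size of the published list being compared
    observed:   number of miRNAs shared between the two lists
    """
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return favourable / comb(population, draws)


# Assumed inputs: 800-miRNA panel, 98 detected miRNAs, 33 published miRNAs,
# 25 shared between the two lists
p_value = hypergeom_upper_tail(800, 98, 33, 25)
```

Exact integer arithmetic is used for the binomial coefficients, so very small tail probabilities are not lost to floating-point underflow before the final division.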
Finally, we compared our data with the GSE21085 hepatoblastoma dataset, which includes non-coding RNA data (analysed on the OSU-CCC MicroRNA Microarray Version 2.0 [condensed version]). We found an overlap of 79 % between our 50 commonly detected miRNAs and GSE21085 (Additional file 1: Table S6). However, this comparison is not completely valid, and should be interpreted with caution: the GSE21085 dataset contains only miRNAs that were known at the time of the particular array design (2005). Additionally, the probes in this array used a selection of pre-miRNAs, while our analysis solely detects the sequences of mature miRNAs.
Validation by qPCR of miRNAs detected by individual platforms on the remaining material
qPCR is often considered the gold standard method for quantifying RNA expression. We therefore aimed to further validate the miRNAs detected by other platforms. We were able to perform qPCR experiments on only six samples out of the 13 investigated tumours because of the limited availability of RNA from these samples. Validation was performed on five miRNAs (miR-191, miR-95, miR-17, miR-181a, miR-106b), using housekeeping small RNA (RNU6B) as a control. We chose these miRNAs because they have previously been implicated in cancer, and miR-17 and miR-181a are also in the Magrelli dataset [3, 26–29]. NGS and MA platforms provided limited data for these miRNAs (only one sample could be assessed), so we were unable to make a direct comparison of these two platforms with qPCR. However, we were able to compare qPCR quantification with NS, and we observed that NS was better at detecting these five investigated miRNAs in our samples (Additional file 1: Table S7). Our data suggest that in some cases, such as when analysing limited archival FFPE samples, NanoString may be more effective than qPCR for the detection of miRNAs.
Potential use and application of the data
FFPE material is an important resource: if FFPE samples could be used to their full potential, they would be highly beneficial to cancer research. These datasets provide a resource describing the strengths and limitations of three platforms used in miRNA detection. These data can serve as a guideline for future research aimed at miRNA profiling, particularly of FFPE samples. Furthermore, 13 HB tumours have been characterised for their miRNA profiles, and since HB is a rare cancer, these datasets can be used in other HB research as a comparison and to supplement further work. Additionally, these data could be used alongside those from other childhood cancers to explore potential relationships and identify related patterns of miRNA expression.
Altered patterns of miRNA expression have previously been identified in HB and several miRNAs have been investigated as predictors of prognosis in patients [3, 30]. miRNA expression levels may help to understand factors contributing to the progression of this disease. For instance, from the datasets described here we identified that S5 had a distinctly altered miRNA expression pattern and clustered differently from the other HB tumours. This particular sample displayed an aggressive phenotype, and had the shortest event-free survival period of the 13 patients. This sort of information will therefore be valuable to establish a better understanding of the relationship between miRNA expression profiles and the severity of HB development in different patients.
Availability and requirements
The sequencing, microarray and NanoString data have been submitted to the NCBI GEO repository under three different accession numbers (Table 1). Each dataset includes a metadata spreadsheet that provides a summary of the project and the associated files. Downstream analysis using the processed data can be performed on a standard computer (4 GB RAM and 2–4 CPU cores).
The sequencing dataset contains one processed file displaying the raw read counts for each miRNA. This file was used to generate the lists of confidently identified miRNAs using arbitrary thresholds of both ≥ 5 and ≥ 10 reads. Raw FASTQ files (total size: 1.07 GB) are included so that independent researchers can perform alignment and downstream analysis, if desired.
The microarray dataset contains two processed data files (in .xlsx format). The first contains all miRNAs from miRBase version 17 and the associated RMA normalised intensity signal for all samples. The second contains the miRNAs confidently identified using the internal control bioB-3 signal intensity as a threshold for the limit of detection of the array. Raw CEL files comprising the raw signal intensity values of the probes (total size: 17.7 MB) are also included to allow independent processing if desired.
For NanoString, a matrix table with the commercial probe names for the 800 miRNAs is provided in the metadata spreadsheet; the raw data file contains RCC files (total size: 76 KB). These files may be used if alternative normalisation techniques and different detection thresholds are required. Further, two processed files are provided: the first contains the tabulated counts for each miRNA (from miRBase version 18) normalised to the top 100 miRNAs in each sample, and the second file contains only the miRNAs confidently identified after applying a threshold calculated by adding 2SD to the mean of the internal negative controls (threshold = mean + 2SD).
Project name: Multi-platform microRNA profiling of hepatoblastoma patients using formalin fixed paraffin embedded archival samples
Operating system(s): Platform-independent, but UNIX/Linux preferred.
Availability of supporting data
Datasets supporting the results of this article are available in the NCBI Gene Expression Omnibus archive under accession numbers GSE62010 (sequencing), GSE62011 (microarray), and GSE62017 (NanoString). Data further supporting this paper can be found in the GigaScience Database [31].
Ethics approval and consent to participate
All clinical data and tumour tissue used in this study were collected with informed consent under ethics approvals from the institutional ethics committees of the participating centres and from the Health and Disability Ethics Committee (HDEC) of New Zealand (approval numbers: CTY/01/10/141 and CTY/01/10/142). The experiments were carried out in accordance with approved guidelines.
Abbreviations
FFPE: Formalin Fixed Paraffin Embedded
NGS: Next Generation Sequencing
RMA: Robust Multi-Array Average
References
1. Krek A, Grun D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, et al. Combinatorial microRNA target predictions. Nat Genet. 2005;37(5):495–500.
2. Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005;120(1):15–20.
3. Magrelli A, Azzalin G, Salvatore M, Viganotti M, Tosto F, Colombo T, et al. Altered microRNA Expression Patterns in Hepatoblastoma Patients. Translat Oncol. 2009;2(3):157–63.
4. Calin GA, Dumitru CD, Shimizu M, Bichi R, Zupo S, Noch E, et al. Frequent deletions and down-regulation of micro-RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc Natl Acad Sci U S A. 2002;99(24):15524–9.
5. Lu J, Getz G, Miska EA, Alvarez-Saavedra E, Lamb J, Peck D, et al. MicroRNA expression profiles classify human cancers. Nature. 2005;435(7043):834–8.
6. Chatterjee A, Eccles MR. DNA methylation and epigenomics: new technologies and emerging concepts. Genome Biol. 2015;16:103.
7. Von Ahlfen S, Missel A, Bendrat K, Schlumpberger M. Determinants of RNA quality from FFPE samples. PLoS One. 2007;2(12), e1261.
8. Pradervand S, Weber J, Lemoine F, Consales F, Paillusson A, Dupasquier M, et al. Concordance among digital gene expression, microarrays, and qPCR when measuring differential expression of microRNAs. Biotechniques. 2010;48(3):219–22.
9. Git A, Dvinge H, Salmon-Divon M, Osborne M, Kutter C, Hadfield J, et al. Systematic comparison of microarray profiling, real-time PCR, and next-generation sequencing technologies for measuring differential microRNA expression. RNA. 2010;16(5):991–1006.
10. Kolbert CP, Feddersen RM, Rakhshan F, Grill DE, Simon G, Middha S, et al. Multi-platform analysis of microRNA expression measurements in RNA from fresh frozen and FFPE tissues. PLoS One. 2013;8(1), e52517.
11. Glud M, Klausen M, Gniadecki R, Rossing M, Hastrup N, Nielsen FC, et al. MicroRNA expression in melanocytic nevi: the usefulness of formalin-fixed, paraffin-embedded material for miRNA microarray profiling. J Invest Dermatol. 2008;129(5):1219–24.
12. Li J, Smyth P, Flavin R, Cahill S, Denning K, Aherne S, et al. Comparison of miRNA expression patterns using total RNA extracted from matched samples of formalin-fixed paraffin-embedded (FFPE) cells and snap frozen cells. BMC Biotechnol. 2007;7(1):36.
13. Rosenfeld N, Aharonov R, Meiri E, Rosenwald S, Spector Y, Zepeniuk M, et al. MicroRNAs accurately identify cancer tissue origin. Nat Biotech. 2008;26(4):462–9.
14. Tetzlaff MT, Liu A, Xu X, Master SR, Baldwin DA, Tobias JW, et al. Differential expression of miRNAs in papillary thyroid carcinoma compared to multinodular goiter using formalin fixed paraffin embedded tissues. Endocr Pathol. 2007;18(3):163–73.
15. Zhang X, Chen J, Radcliffe T, Lebrun DP, Tron VA, Feilotter H. An array-based analysis of microRNA expression comparing matched frozen and formalin-fixed paraffin-embedded human tissue samples. J Mol Diagnos. 2008;10(6):513–9.
16. von Frowein J, Pagel P, Kappler R, von Schweinitz D, Roscher A, Schmid I. MicroRNA‐492 is processed from the keratin 19 gene and up‐regulated in metastatic hepatoblastoma. Hepatology. 2011;53(3):833–42.
17. Chatterjee A, Leichter AL, Fan V, Tsai P, Purcell RV, Sullivan MJ, et al. A cross comparison of technologies for the detection of microRNAs in clinical FFPE samples of hepatoblastoma patients. Sci Rep. 2015;5:10438.
18. Purcell R, Childs M, Maibach R, Miles C, Turner C, Zimmermann A, et al. HGF/c-Met related activation of b-catenin in hepatoblastoma. J Exp Clin Cancer Res. 2011;30:96.
19. Stockwell PA, Chatterjee A, Rodger EJ, Morison IM. DMAP: differential methylation analysis package for RRBS and WGBS data. Bioinformatics. 2014;30(13):1814–22.
20. Chatterjee A, Stockwell PA, Horsfield JA, Morison IM, Nakagawa S. Base-resolution DNA methylation landscape of zebrafish brain and liver. Genom Data. 2014;2:342–4.
21. Chatterjee A, Rodger EJ, Stockwell PA, Weeks RJ, Morison IM. Technical considerations for reduced representation bisulfite sequencing with multiplexed libraries. J Biomed Biotechnol. 2012;2012:741542.
22. Chatterjee A, Ozaki Y, Stockwell PA, Horsfield JA, Morison IM, Nakagawa S. Mapping the zebrafish brain methylome using reduced representation bisulfite sequencing. Epigenetics. 2013;8(9):979–89.
23. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. Bioinformatics in Action. 2011;17(1).
24. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25.
25. Friedlander MR, Mackowiak SD, Li N, Chen W, Rajewsky N. miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Res. 2012;40(1):37–52.
26. Torrezan GT, Ferreira EN, Nakahata AM, Barros BD, Castro MT, Correa BR, et al. Recurrent somatic mutation in DROSHA induces microRNA profile changes in Wilms tumour. Nat Commun. 2014;5:4039.
27. Zhang X-F, Li K-K, Gao L, Li S-Z, Chen K, Zhang J-B, et al. miR-191 promotes tumorigenesis of human colorectal cancer through targeting C/EBPβ. Oncotarget. 2015;6(6):4144–58.
28. Huang Z, Huang S, Wang Q, Liang L, Ni S, Wang L, et al. MicroRNA-95 promotes cell proliferation and targets sorting Nexin 1 in human colorectal carcinoma. Cancer Res. 2011;71(7):2582–9.
29. Yau WL, Lam CSC, Ng L, Chow AKM, Chan STC, Chan JYK, et al. Over-expression of miR-106b promotes cell migration and metastasis in hepatocellular carcinoma by activating epithelial-mesenchymal transition process. PLoS One. 2013;8(3), e57882.
30. Gyugos M, Lendvai G, Kenessey I, Schlachter K, Halasz J, Nagy P, et al. MicroRNA expression might predict prognosis of epithelial hepatoblastoma. Virchows Arch. 2014;464(4):419–27.
31. Leichter AL, Purcell RV, Sullivan MJ, Eccles MR, Chatterjee A. Supporting data for "Multi-platform microRNA profiling of hepatoblastoma patients using formalin fixed paraffin embedded archival samples". GigaScience Database. 2015. http://dx.doi.org/10.5524/100180
We are grateful to the Children’s Cancer Research Trust, New Zealand for providing funding to support this research. We also express our gratitude to Vicky Fan, Peter Tsai and Dr Aaron Jeffs from New Zealand Genomics Ltd., for their assistance in data analysis. MRE and AC were supported by the University of Otago Leading Thinkers Advancement Campaign, and the New Zealand Institute for Cancer Research Trust (NZICRT), respectively.
The authors declare that they have no competing interests.
AC, MRE, ALL conceptually designed the framework of the manuscript. AC analysed the data with help from ALL, and wrote the first draft of the paper. ALL was responsible for overseeing the experiments performed on the different platforms and helped AC to analyse the data and write the manuscript. RVP and MJS provided additional clinical information for the samples analysed. AC and MRE provided supervision to this work. All authors read and approved the final manuscript.
Additional file 1: Supplementary tables, further details of data analysis. (PDF 197 kb)