miRNA Temporal Analyzer (mirnaTA): a bioinformatics tool for identifying differentially expressed microRNAs in temporal studies using normal quantile transformation
© Cer et al.; licensee BioMed Central Ltd. 2014
Received: 8 April 2014
Accepted: 30 September 2014
Published: 13 October 2014
Understanding the biological roles of microRNAs (miRNAs) is a an active area of research that has produced a surge of publications in PubMed, particularly in cancer research. Along with this increasing interest, many open-source bioinformatics tools to identify existing and/or discover novel miRNAs in next-generation sequencing (NGS) reads become available. While miRNA identification and discovery tools are significantly improved, the development of miRNA differential expression analysis tools, especially in temporal studies, remains substantially challenging. Further, the installation of currently available software is non-trivial and steps of testing with example datasets, trying with one’s own dataset, and interpreting the results require notable expertise and time. Subsequently, there is a strong need for a tool that allows scientists to normalize raw data, perform statistical analyses, and provide intuitive results without having to invest significant efforts.
We have developed miRNA Temporal Analyzer (mirnaTA), a bioinformatics package to identify differentially expressed miRNAs in temporal studies. mirnaTA is written in Perl and R (Version 2.13.0 or later) and can be run across multiple platforms, such as Linux, Mac and Windows. In the current version, mirnaTA requires users to provide a simple, tab-delimited, matrix file containing miRNA name and count data from a minimum of two to a maximum of 20 time points and three replicates. To recalibrate data and remove technical variability, raw data is normalized using Normal Quantile Transformation (NQT), and linear regression model is used to locate any miRNAs which are differentially expressed in a linear pattern. Subsequently, remaining miRNAs which do not fit a linear model are further analyzed in two different non-linear methods 1) cumulative distribution function (CDF) or 2) analysis of variances (ANOVA). After both linear and non-linear analyses are completed, statistically significant miRNAs (P < 0.05) are plotted as heat maps using hierarchical cluster analysis and Euclidean distance matrix computation methods.
mirnaTA is an open-source, bioinformatics tool to aid scientists in identifying differentially expressed miRNAs which could be further mined for biological significance. It is expected to provide researchers with a means of interpreting raw data to statistical summaries in a fast and intuitive manner.
KeywordsmicroRNA miRNA Temporal Analyzer mirnaTA Time series Differential expression DE Quantile normalization Linear model Normal quantile transformation
MicroRNAs (miRNAs) are short single-stranded non-coding RNAs approximately 19–22 nucleotide long, which are critical regulators of gene expression and have been implicated in a wide range of physiological processes, such as apoptosis and growth, as well as pathological processes, including inflammatory responses, cancer, neurodegenerative and cardiovascular diseases[1–7]. This rapid growth is evident by the exponentially increasing number of miRNAs reported in the recent Release 21 (June 2014) of miRBase[8, 9] which contains 35,828 mature miRNA products in 223 different species. Accompanying this growth is the development of many miRNA discovery bioinformatics tools including, but not limited to miRscan, miRFinder, miRDeep and miRanalyzer, to help researchers identify miRNAs from existing miRNA databases and/or predict novel miRNAs from NGS data.
In recent years, the number of miRNA experiments involving multiple time-series has largely increased. This is not surprising since biological processes are often dynamic, and therefore, a cross-sectional study based on a single time point would not provide important information that time-course studies can provide. Particularly in miRNA studies, researchers often implement multiple-time points to capture how certain miRNAs may display transient expression changes over the course of treatment or infection over time. For instance, Jayaswal et al. carried out a drug study involving a multiple myeloma cell line, U266, and consisting of six time points—0, 2, 4, 8, 24, and 48 hours with two biological replicates per time point for both miRNA and mRNA. In another study by Li Z et al., miRNA expression profiles were assessed in the mouse livers in a time-course experiment at days 1, 3, 7, 15, 30 and 120 post treatment.
Despite the fact that there are several publications dealing with the statistical analysis of time-series expression data, there are significant computational challenges in making sense of the massive and complex datasets produced by time-series experiments. There are a number of differential expression (DE) tool packages including STEM, MaTSE, Linear Models for Microarray Data (LIMMA), Significance Analysis of Microarrays (SAM), Extraction of Differential Gene Expression (EDGE), and Bayesian Estimation of Temporal Regulation (BETR). In miRNA research in particular, the two most commonly used DE tools are edgeR and DEseq. However, learning to use these tools requires the expertise of bioinformaticians as they often come with several package dependencies and not every research laboratory employs these specialists nor has advanced computing infrastructure. There is a crucial need for a simple tool that would allow any bench scientist to analyze the differential expression of miRNAs or other subjects in temporal studies.
Development of mirnaTA
Input file requirement
Any datasets resulting from RNA-seq experiments, array-based miRNA profiling or RT-PCR experiments, can be used as input data for the mirnaTA as long as the tab-delimited input files contain two columns, miRNA name and sequence read count of the miRNA. The number of time points that can be submitted to mirnaTA is a minimum of 2 to a maximum of 20. mirnaTA is able to handle to three replicates as follows: if the correlation coefficient (r) value between two replicates is more than or equal to 0.70, the count values from replicates are averaged and used in the study. Incremental correlation is applied for each additional replicate (see User Guide available at for more details). If the r value is less than 0.7, the replicate data is not used in the calculations. The correlation coefficient, r value 0.7 was chosen arbitrarily as it was deemed reasonable for replicates to be considered sufficiently correlated. However, users are able to change this to any value although we do not recommend to use any value below 0.7.
Data normalization by NQT
Identification of differentially expressed miRNAs by a linear regression model
Identification of other differentially expressed miRNA species using a non-linear analysis
Syntax for running mirnaTA
The syntax to run mirnaTA is simple, as follows: perl nmrc_mirnata.pl -i < input_file > -n < x > -t < y>, where nmrc_mirnata.pl is the wrapper Perl script, < input_file > is a tab-delimited, matrix file, x = the number of replicates, y = the number of time points. For example, to run mirnaTA for a dataset with three time points, two replicates and correlation coefficient value of 0.85 (default is 0.7), the syntax would be “perl nmrc_mirnata.pl -i example_datasets/3tp2rep_dataset -n 2 –t 3 –c 0.85”. To demonstrate the functionality of mirnaTA, a comprehensive User Guide with examples using different datasets and expected results are provided at our SourceForge page and also archived in GigaDB.
Visualizations of mirnaTA outputs
mirnaTA is an open-source bioinformatics tool that can be run in Linux, Mac or Windows with Perl and R package dependencies. While there are many other time-series analysis tools available, the main advantages of mirnaTA are:
Simplicity: Even though it is a command line program and users need to be aware of corrections for multiple testing, they need to provide only one input file and enter one line of Perl syntax, and do not need to have a strong statistical or computational background.
Reliability: The statistically significant miRNA species identified by linear regression model have met very stringent criteria, and therefore could be further examined for confirmation or validation in wet laboratories.
Practicality: The mirnaTA only requires a simple input file which can be prepared by anyone, takes up to three replicates and 20 time points, and generates publication quality graphics in PNG format.
Wide applicability: The mirnaTA can be used to analyze any kind of data (gene, weather data, weight, etc.), and does not have to be restricted to miRNA data. All a user has to do is to replace the miRNA name in the first column with another subject of study.
In summary, although mirnaTA does not yet provide researchers with “a push-button” solution for differential expression analysis, it gives them the power to use a simple Perl syntax to not only identify differentially expressed miRNAs, but also automatically generate intuitive graphical outputs. This simple and automatic implementation of previously detached concepts in mirnaTA could be instrumental in saving a significant amount of time for many basic research scientists.
Availability and requirements
Project name: miRNA Temporal Analyzer (mirnaTA)
Project home page:http://sourceforge.net/projects/mirnata
Operating system (s): Linux, Mac and Windows
Programming languag e: Perl and R
Other Requirements: None
License: The GNU Lesser General Public License, version 3.0 (LGPL-3.0)
Any restrictions to use by non-academics: None
Availability of supporting data
Analysis of variance
Cumulative distribution function
Normal quantile transformation
Portable network graphics
Trimmed mean method.
mirnaTA was developed as a part of a study supported by the Defense Threat Reduction Agency (DTRA) project CBM.DIAGB.03.10.NM.028. JJA and VPM are military service members or employees of the U.S. Government and this work was prepared as part of their official duties. Title 17 U.S.C. §105 provides that ‘Copyright protection under this title is not available for any work of the United States Government’. Title 17 U.S.C. §101 defines a U.S. Government work as a work prepared by a military service member or employee of the U.S. Government as part of that person’s official duties. The opinions or assertions contained herein are the private ones of the author (s) and are not to be construed as official or reflecting the views of either the Department of the Navy or the Department of Defense, nor the U.S. Government.
- Kim VN, Han J, Siomi MC: Biogenesis of small RNAs in animals. Nat Rev Mol Cell Biol. 2009, 10 (2): 126-139. 10.1038/nrm2632.View ArticlePubMedGoogle Scholar
- Xiao C, Rajewsky K: MicroRNA control in the immune system: basic principles. Cell. 2009, 136 (1): 26-36. 10.1016/j.cell.2008.12.027.View ArticlePubMedGoogle Scholar
- Tili E, Michaille JJ, Calin GA: Expression and function of micro-RNAs in immune cells during normal or disease state. Int J Med Sci. 2008, 5 (2): 73-79.View ArticlePubMedPubMed CentralGoogle Scholar
- Matsushima K, Isomoto H, Inoue N, Nakayama T, Hayashi T, Nakayama M, Nakao K, Hirayama T, Kohno S: MicroRNA signatures in Helicobacter pylori-infected gastric mucosa. Int J Cancer. 2011, 128 (2): 361-370. 10.1002/ijc.25348.View ArticlePubMedGoogle Scholar
- Wang WX, Wilfred BR, Madathil SK, Tang G, Hu Y, Dimayuga J, Stromberg AJ, Huang Q, Saatman KE, Nelson PT: miR-107 regulates granulin/progranulin with implications for traumatic brain injury and neurodegenerative disease. Am J Pathol. 2010, 177 (1): 334-345. 10.2353/ajpath.2010.091202.View ArticlePubMedPubMed CentralGoogle Scholar
- Garza-Manero S, Pichardo-Casas I, Arias C, Vaca L, Zepeda A: Selective distribution and dynamic modulation of miRNAs in the synapse and its possible role in Alzheimer’s disease. Brain Res. 2013, 1584: 80-93.View ArticlePubMedGoogle Scholar
- Ono K, Kuwabara Y, Han J: MicroRNAs and cardiovascular diseases. FEBS J. 2011, 278 (10): 1619-1633. 10.1111/j.1742-4658.2011.08090.x.View ArticlePubMedPubMed CentralGoogle Scholar
- Ambros V, Bartel B, Bartel DP, Burge CB, Carrington JC, Chen X, Dreyfuss G, Eddy SR, Griffiths-Jones S, Marshall M, Matzke M, Ruvkun G, Tuschl T: A uniform system for microRNA annotation. RNA. 2003, 9 (3): 277-279. 10.1261/rna.2183803.View ArticlePubMedPubMed CentralGoogle Scholar
- Kozomara A, Griffiths-Jones S: miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 2011, 39 (Database issue): D152-D157.View ArticlePubMedGoogle Scholar
- Lim LP, Lau NC, Weinstein EG, Abdelhakim A, Yekta S, Rhoades MW, Burge CB, Bartel DP: The microRNAs of Caenorhabditis elegans. Genes Dev. 2003, 17 (8): 991-1008. 10.1101/gad.1074403.View ArticlePubMedPubMed CentralGoogle Scholar
- Huang TH, Fan B, Rothschild MF, Hu ZL, Li K, Zhao SH: MiRFinder: an improved approach and software implementation for genome-wide fast microRNA precursor scans. BMC Bioinformatics. 2007, 8: 341-10.1186/1471-2105-8-341.View ArticlePubMedPubMed CentralGoogle Scholar
- An J, Lai J, Lehman ML, Nelson CC: miRDeep*: an integrated application tool for miRNA identification from RNA sequencing data. Nucleic Acids Res. 2013, 41 (2): 727-737. 10.1093/nar/gks1187.View ArticlePubMedGoogle Scholar
- Hackenberg M, Rodriguez-Ezpeleta N, Aransay AM: miRanalyzer: an update on the detection and analysis of microRNAs in high-throughput sequencing experiments. Nucleic Acids Res. 2011, 39 (Web Server issue): W132-W138.View ArticlePubMedPubMed CentralGoogle Scholar
- Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, Yefanov A, Lee H, Zhang N, Robertson CL, Serova N, Davis S, Soboleva A: NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res. 2013, 41 (Database issue): D991-D995.View ArticlePubMedGoogle Scholar
- Jayaswal V, Lutherborrow M, Ma DD, Hwa Yang Y: Identification of microRNAs with regulatory potential using a matched microRNA-mRNA time-course data. Nucleic Acids Res. 2009, 37 (8): e60-10.1093/nar/gkp153.View ArticlePubMedPubMed CentralGoogle Scholar
- Li Z, Branham WS, Dial SL, Wang Y, Guo L, Shi L, Chen T: Genomic analysis of microRNA time-course expression in liver of mice treated with genotoxic carcinogen N-ethyl-N-nitrosourea. BMC Genomics. 2010, 11: 609-10.1186/1471-2164-11-609.View ArticlePubMedPubMed CentralGoogle Scholar
- Ernst J, Bar-Joseph Z: STEM: a tool for the analysis of short time series gene expression data. BMC Bioinformatics. 2006, 7: 191-10.1186/1471-2105-7-191.View ArticlePubMedPubMed CentralGoogle Scholar
- Craig P, Cannon A, Kukla R, Kennedy J: MaTSE: the gene expression time-series explorer. BMC Bioinformatics. 2013, 14: S1.View ArticlePubMedPubMed CentralGoogle Scholar
- Smyth GK: Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004, 3: Article 3-Epub 2004 Feb 12Google Scholar
- Grace C, Nacheva EP: Significance Analysis of Microarrays (SAM) offers clues to differences between the genomes of adult philadelphia positive ALL and the lymphoid blast transformation of CML. Cancer Inform. 2012, 11: 173-183.View ArticlePubMedPubMed CentralGoogle Scholar
- Leek JT, Monsen E, Dabney AR, Storey JD: EDGE: extraction and analysis of differential gene expression. Bioinformatics. 2006, 22 (4): 507-508. 10.1093/bioinformatics/btk005.View ArticlePubMedGoogle Scholar
- Aryee MJ, Gutierrez-Pabello JA, Kramnik I, Maiti T, Quackenbush J: An improved empirical bayes approach to estimating differential gene expression in microarray time-course data: BETR (Bayesian Estimation of Temporal Regulation). BMC Bioinformatics. 2009, 10: 409-10.1186/1471-2105-10-409.View ArticlePubMedPubMed CentralGoogle Scholar
- Robinson MD, McCarthy DJ, Smyth GK: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010, 26 (1): 139-140. 10.1093/bioinformatics/btp616.View ArticlePubMedGoogle Scholar
- Anders S, Huber W: Differential expression analysis for sequence count data. Genome Biol. 2010, 11 (10): R106-10.1186/gb-2010-11-10-r106.View ArticlePubMedPubMed CentralGoogle Scholar
- Garmire LX, Subramaniam S: Evaluation of normalization methods in mammalian microRNA-Seq data. RNA. 2012, 18 (6): 1279-1288. 10.1261/rna.030916.111.View ArticlePubMedPubMed CentralGoogle Scholar
- Bullard JH, Purdom E, Hansen KD, Dudoit S: Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics. 2010, 11: 94-10.1186/1471-2105-11-94.View ArticlePubMedPubMed CentralGoogle Scholar
- McCormick KP, Willmann MR, Meyers BC: Experimental design, preprocessing, normalization and differential expression analysis of small RNA sequencing experiments. Silence. 2011, 2 (1): 2-10.1186/1758-907X-2-2.View ArticlePubMedPubMed CentralGoogle Scholar
- Dalgaard P: Statistics and Computing: Introductory Statistics with R. 2002, New York: Springer ScienceGoogle Scholar
- Cer RZ, Herrera-Galeano JE, Anderson JJ, Bishop-Lilly KA, Mokashi VP: Software and Supporting Material for: “mirnaTA: a Bioinformatics Tool for Identifying Differentially Expressed microRNAs in Temporal Studies using Normal Quantile Transformation (NQT)”. 2014, GigaScience Database,http://dx.doi.org/10.5524/100107,Google Scholar
- Caraux G, Pinloche S: PermutMatrix: a graphical environment to arrange gene expression profiles in optimal linear order. Bioinformatics. 2005, 21 (7): 1280-1281. 10.1093/bioinformatics/bti141.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.