Large-scale analysis of the evolutionary histories of phosphorylation motifs in the human genome
© Yoshizaki and Okuda; licensee BioMed Central. 2015
Received: 1 October 2014
Accepted: 1 April 2015
Published: 6 May 2015
Protein phosphorylation is a post-translational modification that is essential for a wide range of eukaryotic physiological processes, such as transcription, cytoskeletal regulation, cell metabolism, and signal transduction. Although more than 200,000 phosphorylation sites have been reported in the human genome, the physiological roles of most remain unknown. In this study, we provide some useful datasets for the assessment of functional phosphorylation signaling using a comparative genome analysis of phosphorylation motifs.
We describe the evolutionary conservation patterns and comparative genomic data for 93,101 phosphosites and 1,003,756 potential phosphosites in human phosphomotifs, using 178 phosphomotifs identified in a previous study that cover 69% of the known phosphosites in public databases. Comparative genomic analyses were performed using genomes from nine species from yeast to humans. Here we provide an overview of the evolutionary patterns of phosphomotif acquisition and indicate their dependence on motif structures. Using the data from our previous study, we describe the interaction networks of phosphoproteins, identify the kinase substrates associated with phosphoproteins, and perform gene ontology enrichment analyses. In addition, we show how this dataset can help to elucidate the function of phosphomotifs.
Our characterizations of motif structures and assessments of evolutionary conservation of phosphosites reveal physiological roles of unreported phosphosites. Thus, interactions between protein groups that share motifs are likely to be helpful for inferring kinase-substrate interaction networks. Our computational methods can be used to elucidate the relationships between phosphorylation signaling and cellular functions.
Keywords: Phosphorylation motif; Comparative evolutionary analysis; Kinase
Utility of the dataset
Protein phosphorylation has an important role in a wide variety of cellular functions, and previous large-scale mass spectrometry studies have identified >100,000 phosphosites [2,3]. Most of these phosphosites have unknown physiological functions, which makes it difficult to identify those that are physiologically important. Nonetheless, 518 protein kinases have been reported in the human genome and, because many kinases target specific sequence motifs in the regions surrounding phosphosites, such phosphorylation motifs have been extensively characterized. Here, we have determined the functions of phosphorylation signaling pathways in cellular processes. We have also investigated the relationships between 178 phosphomotifs and cellular functions, and their evolutionary conservation. Our analyses indicate that highly conserved phosphomotifs are likely to be involved in similar signaling networks with functionally important roles. We describe the sequences and evolutionary conservation of 93,101 known phosphosites and 1,003,756 potential phosphosites from the human genome (Additional file 1 and Raw_Data_All_Motif_Seq.txt in GigaDB). This information is expected to be helpful for linking phosphorylation signaling networks to physiological functions and for assessing functional importance. Therefore, we provide information about the kinases that phosphorylate these sites and the interaction networks of proteins with the same motif, and we identify associations between the motifs and biological functions. We show that this dataset can help to elucidate the function of phosphomotifs and their role in cellular signaling by showing how they evolved. Furthermore, we provide information about the evolutionary conservation of phosphosites with known kinase-substrate relationships, and the ortholog conservation of each kinase.
Finally, we show that the evolutionary conservation of phosphomotifs is not likely to be correlated with the ortholog conservation of the kinases.
Definition of the phosphomotif conservation index
In the definition of the conservation index (CI), G denotes the set of genomes used in the study, q denotes the index of a genome selected from G, and C_q and R_q are the conservation and reference conservation rates in q, respectively.
Evolutionary conservation and expansion of kinases
Relationships between phosphorylation motifs and protein kinases
Number of substrates
A previous report has shown that the number of CMGC kinases expanded during the early evolution of vertebrates. Thus, to investigate whether the conservation of CMGC kinase substrates correlates with the evolutionary expansion of kinases, changes in the numbers of kinases in each kinase group were calculated using the kinase orthologs defined in KEGG. The proportion of AGC and CMGC kinases among all of the kinases did not differ between the human and worm genomes (Figure 3B and Additional file 6), suggesting that the numbers of AGC and CMGC kinases increased in vertebrates together with those of the other kinase groups. Hence, conservation of phosphosites may reflect the types of kinases rather than the evolutionary changes in the numbers of expressed kinases. Thus, to facilitate the development of phosphomotif prediction tools, such as Scansite and NetPhorest [19,20], we determined the evolutionary conservation of these phosphosites and defined the kinase-substrate relationships, CIs for each kinase family, and kinase orthologs.
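The comparison of kinase group proportions between genomes can be sketched as follows. This is only an illustration: the kinase identifiers and group assignments are invented placeholders, not values from KEGG or from Additional file 6, and `group_proportions` is a hypothetical helper name.

```python
from collections import Counter


def group_proportions(kinase_to_group):
    """Return the fraction of kinases that belong to each kinase group.

    `kinase_to_group` maps a kinase identifier to its group label
    (for example "AGC" or "CMGC").
    """
    counts = Counter(kinase_to_group.values())
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


# Placeholder assignments, invented for illustration only.
genome_a = {"kinA1": "AGC", "kinA2": "AGC", "kinA3": "CMGC", "kinA4": "TK"}
genome_b = {"kinB1": "AGC", "kinB2": "CMGC"}

for name, genome in (("genome_a", genome_a), ("genome_b", genome_b)):
    print(name, group_proportions(genome))
```

Working with proportions rather than raw counts separates a change in the composition of the kinome from a change in its overall size.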
Interaction networks of proteins with assigned phosphorylation motifs
Interactions between proteins with the same motif were more frequent than interactions reconstructed between randomly selected proteins, allowing proteins with similar physiological functions to be enriched. Hence, identification of protein networks with the same motifs may facilitate characterization of phosphorylation interaction networks based on kinase-substrate relationships and may be used to determine the ensuing physiological functions. To identify the associations between motif-associated proteins, data describing intermolecular interactions were downloaded from BioGRID (2.0.58) and STRING (v8.2), and the interactions of proteins with known motifs were extracted. The free open-source software application Cytoscape was then used to visualize and analyze networks and to construct network visualizations of our data (phospho-signal_network.cys in GigaDB). Viewing the network file requires Cytoscape version 3.0 or above.
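One common way to compare same-motif interactions against randomly selected proteins is a permutation test, sketched below. This is not necessarily the procedure used to build the dataset; the edge list, protein names, and function names are hypothetical, and a real analysis would load the edges from a BioGRID or STRING download.

```python
import random


def count_internal_edges(edges, proteins):
    """Count interactions whose two partners both belong to `proteins`."""
    members = set(proteins)
    return sum(1 for a, b in edges if a in members and b in members)


def permutation_test(edges, motif_proteins, background, n_iter=1000, seed=0):
    """Compare the motif group with random groups of the same size.

    Returns (observed, p), where `observed` is the number of interactions
    inside the motif group and `p` is the empirical probability that a random
    group drawn from `background` has at least as many internal interactions.
    """
    rng = random.Random(seed)
    pool = sorted(set(background))  # random.sample needs a sequence
    size = len(set(motif_proteins))
    observed = count_internal_edges(edges, motif_proteins)
    hits = 0
    for _ in range(n_iter):
        if count_internal_edges(edges, rng.sample(pool, size)) >= observed:
            hits += 1
    # Add-one correction keeps the empirical p-value above zero.
    return observed, (hits + 1) / (n_iter + 1)


# Toy network: three motif-bearing proteins that all interact with each other.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E")]
toy_background = ["A", "B", "C", "D", "E"] + [f"P{i}" for i in range(15)]
print(permutation_test(toy_edges, ["A", "B", "C"], toy_background))
```

Sampling random groups of the same size as the motif group controls for group size, which otherwise strongly affects the number of internal interactions.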
Gene ontology enrichment analysis
We have previously identified the likely functional correlations between a wide variety of phosphomotifs, warranting the characterization of phosphomotifs with functional categories, such as gene ontology (GO), to confirm the physiological functions of the ensuing phosphorylation signaling. Thus, correlations between extracted phosphorylation motifs and specific physiological protein functions were identified here using functional enrichment analysis based on GO (Additional file 7). In these analyses, GO annotations were extracted for human proteins with known phosphorylation sites. Subsequently, annotations at the known motif level were assigned on the basis of the GO biological processes for proteins with the motif, and motif functions were identified using enrichment analysis with GoMiner. Significant GO annotations were extracted with cutoffs of FDR = 0.01 and p < 0.01.
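The general idea behind this kind of enrichment analysis can be sketched with a one-sided hypergeometric test followed by a Benjamini-Hochberg adjustment. This is a generic stand-in written with the Python standard library, not GoMiner's implementation or interface, and the function names and example numbers are assumptions for illustration.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for a hypergeometric draw.

    N: proteins in the background; K: background proteins with the GO term;
    n: proteins carrying the motif; k: motif proteins with the GO term.
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Illustrative numbers only: 5 of 5 motif proteins carry a term that
# annotates 5 of 10 background proteins.
p = hypergeom_sf(5, 10, 5, 5)
print(p, benjamini_hochberg([p, 0.5, 0.8]))
```

A term would then be reported for a motif only if both its raw p-value and its adjusted value fall below the chosen cutoffs.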
Availability of supporting data
Datasets supporting the results of this study are available in the GigaScience repository, GigaDB. The data derived from PhosphoSitePlus are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
KEGG: Kyoto Encyclopedia of Genes and Genomes
STRING: Search Tool for Recurring Instances of Neighboring Genes
We would like to thank Toshiya Hayano and Etsuko Kiyokawa for their help and discussion. This work was supported by a Grant-in-Aid for Young Scientists (A) and (B) from the Ministry of Education, Culture, Sports, Science and Technology of Japan (MEXT) (Research Project Numbers: 26700029 and 24770190), Uehara Memorial Foundation Fellowship, and the Program for Research of Young Scientists from Ritsumeikan University.
- Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S. The protein kinase complement of the human genome. Science. 2002;298(5600):1912–34.
- Linding R, Jensen LJ, Ostheimer GJ, van Vugt MA, Jorgensen C, Miron IM, et al. Systematic discovery of in vivo phosphorylation networks. Cell. 2007;129(7):1415–26.
- Gnad F, Ren S, Cox J, Olsen JV, Macek B, Oroshi M, et al. PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites. Genome Biol. 2007;8(11):R250.
- Ubersax JA, Ferrell Jr JE. Mechanisms of specificity in protein phosphorylation. Nat Rev Mol Cell Biol. 2007;8(7):530–41.
- Yoshizaki H, Okuda S. Elucidation of the evolutionary expansion of phosphorylation signaling networks using comparative phosphomotif analysis. BMC Genomics. 2014;15(1):546.
- Yoshizaki H, Okuda S. Supporting data and materials for “Large-scale analysis of evolutionary histories of phosphorylation motifs in the human genome”. GigaScience Database. 2015. http://doi.org/10.5524/100136.
- Beltrao P, Albanese V, Kenner LR, Swaney DL, Burlingame A, Villen J, et al. Systematic functional prioritization of protein posttranslational modifications. Cell. 2012;150(2):413–25.
- Minguez P, Parca L, Diella F, Mende DR, Kumar R, Helmer-Citterich M, et al. Deciphering a global network of functionally associated post-translational modifications. Mol Syst Biol. 2012;8:599.
- Gnad F, Gunawardena J, Mann M. PHOSIDA 2011: the posttranslational modification database. Nucleic Acids Res. 2011;39(Database issue):D253–60.
- Lee TY, Huang HD, Hung JH, Huang HY, Yang YS, Wang TH. dbPTM: an information repository of protein post-translational modification. Nucleic Acids Res. 2006;34(Database issue):D622–7.
- Dinkel H, Chica C, Via A, Gould CM, Jensen LJ, Gibson TJ, et al. Phospho.ELM: a database of phosphorylation sites–update. Nucleic Acids Res. 2011;39(Database issue):D261–7.
- Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, et al. Human Protein Reference Database–2009 update. Nucleic Acids Res. 2009;37(Database issue):D767–72.
- PhosphoSitePlus. http://www.phosphosite.org/homeAction.do.
- Hornbeck PV, Kornhauser JM, Tkachev S, Zhang B, Skrzypek E, Murray B, et al. PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res. 2012;40(Database issue):D261–70.
- Nakaya A, Katayama T, Itoh M, Hiranuka K, Kawashima S, Moriya Y, et al. KEGG OC: a large-scale automatic construction of taxonomy-based ortholog clusters. Nucleic Acids Res. 2013;41(Database issue):D353–7.
- Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–80.
- Landry CR, Levy ED, Michnick SW. Weak functional constraints on phosphoproteomes. Trends Genet. 2009;25(5):193–7.
- Li M, Liu J, Zhang C. Evolutionary history of the vertebrate mitogen activated protein kinases family. PLoS One. 2011;6(10):e26999.
- Obenauer JC, Cantley LC, Yaffe MB. Scansite 2.0: Proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res. 2003;31(13):3635–41.
- Miller ML, Jensen LJ, Diella F, Jorgensen C, Tinti M, Li L, et al. Linear motif atlas for phosphorylation-dependent signaling. Sci Signal. 2008;1(35):ra2.
- Chatr-Aryamontri A, Breitkreutz BJ, Heinicke S, Boucher L, Winter A, Stark C, et al. The BioGRID interaction database: 2013 update. Nucleic Acids Res. 2013;41(Database issue):D816–23.
- Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, et al. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 2011;39(Database issue):D561–8.
- Cytoscape. http://www.cytoscape.org/.
- Saito R, Smoot ME, Ono K, Ruscheinski J, Wang PL, Lotia S, et al. A travel guide to Cytoscape plugins. Nat Methods. 2012;9(11):1069–76.
- Zeeberg BR, Feng W, Wang G, Wang MD, Fojo AT, Sunshine M, et al. GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol. 2003;4(4):R28.
- KEGG BRITE Database. http://www.genome.jp/kegg/brite_ja.html.
- KEGG SSDB Database. http://www.kegg.jp/kegg/ssdb/.
- Lehmann S, Bass JJ, Szewczyk NJ. Knockdown of the C. elegans kinome identifies kinases required for normal protein homeostasis, mitochondrial network structure, and sarcomere structure in muscle. Cell Commun Signal. 2013;11:71.
- Harris TW, Baran J, Bieri T, Cabunoc A, Chan J, Chen WJ, et al. WormBase 2014: new views of curated biology. Nucleic Acids Res. 2014;42(Database issue):D789–93.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.