This article has Open Peer Review reports available.
Genome sequence of the olive tree, Olea europaea
- Fernando Cruz†1, 2,
- Irene Julca†2, 3, 4,
- Jèssica Gómez-Garrido1, 2,
- Damian Loska2, 3,
- Marina Marcet-Houben2, 3,
- Emilio Cano5,
- Beatriz Galán6,
- Leonor Frias1, 2,
- Paolo Ribeca1, 2,
- Sophia Derdak1, 2,
- Marta Gut1, 2,
- Manuel Sánchez-Fernández7,
- Jose Luis García6,
- Ivo G. Gut1, 2,
- Pablo Vargas5, 11Email author,
- Tyler S. Alioto1, 2, 10Email author and
- Toni Gabaldón2, 3, 8, 9Email author
© The Author(s). 2016
Received: 6 April 2016
Accepted: 6 June 2016
Published: 27 June 2016
The Mediterranean olive tree (Olea europaea subsp. europaea) was one of the first trees to be domesticated and is currently of major agricultural importance in the Mediterranean region as the source of olive oil. The molecular bases underlying the phenotypic differences among domesticated cultivars, or between domesticated olive trees and their wild relatives, remain poorly understood. Both wild and cultivated olive trees have 46 chromosomes (2n).
A total of 543 Gb of raw DNA sequence from whole genome shotgun sequencing, and a fosmid library containing 155,000 clones from a 1,000+ year-old olive tree (cv. Farga) were generated by Illumina sequencing using different combinations of mate-pair and pair-end libraries. Assembly gave a final genome with a scaffold N50 of 443 kb, and a total length of 1.31 Gb, which represents 95 % of the estimated genome length (1.38 Gb). In addition, the associated fungus Aureobasidium pullulans was partially sequenced. Genome annotation, assisted by RNA sequencing from leaf, root, and fruit tissues at various stages, resulted in 56,349 unique protein coding genes, suggesting recent genomic expansion. Genome completeness, as estimated using the CEGMA pipeline, reached 98.79 %.
The assembled draft genome of O. europaea will provide a valuable resource for the study of the evolution and domestication processes of this important tree, and allow determination of the genetic bases of key phenotypic traits. Moreover, it will enhance breeding programs and the formation of new varieties.
Genomic DNA was extracted from leaf tissue of a single Mediterranean olive tree (Olea europaea L. subsp. europaea var. europaea cv. 'Farga'; NCBI Taxonomy ID: 158383). This tree, named ‘Santander’, was translocated from the Maestrazgo region (Eastern Spain) to Boadilla del Monte (Madrid, Spain) in 2005. O. europaea is a common tree in Spain and there are no legal restrictions on its use for research, including cv. Farga.
The tree age was estimated to be 1,200 years old based on dendrometric analyses (Antonio Prieto-Rodríguez personal communication). A combination of fosmid and whole genome shotgun (WGS) libraries were sequenced using Illumina sequencing equipment.
The standard Illumina protocol was followed, with minor modifications to create short-insert paired-end (PE) libraries (Illumina Inc., Cat. # PE-930–1001), which were run on different types of Illumina sequencers (MiSeq 2×250, 2×300, 2×500, 1×600 and HiSeq2500 2×150) according to standard procedures. The MiSeq XL modes (2×500 and 1×600) were carried out according to the MiSeq modifications reported in  and with the technical support of Illumina.
Sequencing libraries and respective yields used for whole genome shotgun sequencing and fosmid pools
1FP to 96FP
Summary statistics of the Oe6 assembly
RNA was prepared from seven different tissues or developmental stages (root, young leaf, mature leaf, flower, flower bud, immature fruit, and green olives), using the Zymo ZR Plant RNA extraction kit (Zymo Research, Irvine, CA). Then, RNA-Seq libraries were prepared using the TruSeq™ RNA Sample Prep Kit v2 (Illumina Inc.) with minor modifications, and libraries were sequenced using the TruSeq SBS Kit v3-HS in PE mode with a read length of 2×75 bp. Over 50 million PE reads per sample were generated in a fraction of a sequencing lane on a HiSeq2000 (Illumina Inc.), following the manufacturer’s protocol. Image analysis, base calling and run quality scoring were processed using the manufacturer’s software Real Time Analysis (RTA 1.13.48), followed by generation of FASTQ sequence files using CASAVA software (Illumina Inc.).
A kmer analysis was performed to estimate the genome size, level of heterozygosity and repeat content of the sequenced genome. Using the software Jellyfish v1.1.10 , 17-mers were extracted from the WGS PE reads (PE400), and unique kmers were counted and plotted according to kmer depth (Fig. 1). The homozygous or main peak is found at a depth of ~52x. The estimated genome size (found by dividing the total number of kmers by the kmer depth of the main peak) is 1.38 Gb, which is at the low end of the range of empirical estimates. The C-value ranges from 1.45–2.33 pg (1.42 Gb–2.28 Gb), with the median at 1.59 pg (1.56 Gb) (data from , see [5–9]), suggesting the existence of variation in the repetitive fraction of the genome for the species. The left peak at 26x kmer depth indicates many polymorphic sites in the genome. In fact, using the Genomic Character Estimator program, gce v 1.0.0 , the heterozygous ratio based in kmer individuals is 0.054, and the corrected estimate of genome size is 1.32 Gb. Hereon the gce estimate is referred to as the ‘assemblable’ portion of the genome.
A pilot WGS assembly using only PE data was performed in order to generate enough contiguous sequences to gather library insert size statistics. PE reads were first filtered for contaminating sequences (phiX, Escherischia coli and other vector sequences, as well as O. europaea plastids) using GEM  with –m 0.02 (2 % mismatches). Then, the reads were assembled into scaffolds using AbySS v1.3.6  with parameters: −s 600 − S 600–3000 − n 6 − N 10 − k 127 − l 75 − aligner map − q 10. This resulted in an assembly with a total length of 1.94 Gb, and contig and scaffold N50s of 3.7 kb and 3.8 kb, respectively. Library insert sizes were estimated by mapping against this draft assembly. For the WGS PE libraries sequenced on Illumina HiSeq2000 using 2x151 bp reads, the insert size distribution followed a bimodal distribution with a main peak at 725 bp and a smaller peak at 300 bp. Before continuing with the assembly, read pairs belonging to the smaller peak were filtered out, if connecting reads were found overlapping both mates of the pair.
To increase the overall completeness of the assembly, all WGS reads that did not map to the FP assembly were selected and used to obtain a complementary assembly using ABySSv1.5.2 with parameters: −s 300 − S 300–5000 − n 10 − N 10 − k 95 − l 75 − aligner map − q 10. This assembly accounts for 60.7 Mbp of sequence, and has an N50 of 1,506 bp for contigs and 2,351 bp for scaffolds. This assembly was then broken into contigs, 50 bp was eroded from the ends of each contig, then contigs smaller than 200 bp were filtered out. Both assemblies were subsequently gathered by joining the WGS contigs with the merged fosmid pool assembly, and scaffolding them with SSPACE 2.0 . To account for read pairs coming from two different alleles in the same genomic region, reads were mapped to the SSPACE input assembly with gem-mapper (settings: m = 0.05 and e = 0.1) and filters were applied to detect unique mappings with no subdominant match. The resulting comprehensive assembly had a scaffold N50 of 303.7 kb and a total length of 1.51 Gb, ~190 Mb above the expected genome length (1.32 Gb). The excess of assembled sequence is likely to be caused by the presence of artificial duplications during the assembly process (i.e., uncollapsed haplotypes that have been resolved in two different contigs). Several strategies were used to refine the assembly and obtain a haploid reference. First, consistency check was applied to remove local misassemblies by mapping short and intermediate libraries (PE720, MP3k and MP5k) to the input assembly: a positive score is assigned to the assembly regions supported by read pairs separated by distances falling within the limits (mean ± 3σ) of the empirical distribution, while a negative score is assigned to regions where read pairs map i) outside of these bounds, ii) in inconsistent orientation, or iii) to different scaffolds. Regions where the sum of these two vectors is negative are removed from the assembly. After applying this consistency check, the resulting assembly had 46,893 consistent contig blocks (compared to 25,042 contigs before the consistency check), giving a total of 1.46 Gb with an N50 of 101 kb. Second, this assembly was collapsed using a minimum overlap of 4 kb and the gem-mapper parameters − e 0.03 and − m 0.02, so only close matches were merged (similar uncollapsed haplotypes, identical assembly artifacts, and near identical repeats). Additionally, in order to avoid spurious joins, tip merging was applied to the alignment graph down to overlaps of 250 bp. Finally, no repeat resolution was applied, but coherent links from input scaffolds were reinserted. Consequently, the assembly length shrunk to ~1.30 Gb, almost matching the assemblable fraction of the genome (1.32 Gb). An additional consistency check was run on the collapsed assembly using the short and intermediate libraries (PE720, MP3k and MP5k), which resulted in breaking the assembly from 64,814 into 72,593 scaffolds, giving a total length of 1.30 Gb with a scaffold N50 of 50 kb. This assembly length is what was expected based on the gce estimate. As a final assembly step, PE reads with high divergence (gem-mapper parameters m = 0.05 and e = 0.08) were mapped to the assembly and rescaffolded with SSPACE 2.0 using parameters k = 3 and a = 0.6. Then, scaffolds shorter than 500 bp were discarded, and the GapFiller program  was used to close about 40 % of the assembly gaps. This assembly was labeled ‘Oe3’.
The Oe3 assembly was polished using a mapping-based strategy designed to correct single nucleotide substitution and short insertion–deletion errors. First, one library of paired-end reads (PE725) was aligned using BWA mem (v0.7.7)  and variant calling was performed. Selecting only homozygous alternative variants, an alternative FASTA sequence was obtained using GATK (v3.5) FastaAlternateReferenceMaker . After discarding scaffolds shorter than 500 bp, the resulting assembly (Oe5) had a scaffold N50 of 444 kb and a contig N50 of 51 kb. After detecting putative contamination in some scaffolds of the Oe5 assembly, a final decontamination step was performed against yeast, bacteria, arthropod and mitochondrial sequences, combining homology search results obtained by BLAST and, in the case of mitochondrial sequences, regions of high depth (~6000x). In total, 509 scaffolds were deleted from Oe5 and some parts of another 27 scaffolds were removed. The assembly resulting from this step, Oe6, has a scaffold N50 of 443 kb and a contig N50 of 52 kb (Table 2). Oe6 contains 48,419 gaps comprising 53,969,601 sites. The gene completeness of this assembly was estimated using CEGMA  and BUSCO (Benchmarking Universal Single-Copy Orthologs) . CEGMA analysis resulted in a gene completeness of 98.79 %, while BUSCO, using a plant-specific database of 956 genes, determined a completeness of 95.6 % of plant genes. A summary of the complete assembly strategy is shown in Fig. 4.
Partial assembly of an olive tree associated fungus: Aureobasisium pullulans
One of the putative sources of non-plant sequence present in the olive samples was considered of interest; it was represented among the fosmid pools and seemed to belong to the fungal genus Aureobasidium, which has been previously associated with olive trees . To assemble a partial sequence of this genome, four fully sequenced Aureobasidium genomes were downloaded from JGI . Then, BWA v0.7.3a  was used to map all the reads from the fosmid libraries to the four genomes. Once mapped, the reads were filtered allowing only soft clipping for a maximum of one-third of the read, and deleting read pairs when only one of the pairs passed the filters. This resulted in a collection of 18,549,090 reads, which were assembled with SPAdes v.3.1.1 . Scaffolding was done using the assembled fosmids using SSPACE-LongRead , and gaps were filled with gapcloser . These two steps were repeated twice. The final alignment was then compared to the Aureobasidium genomes using BLAST. Contigs longer than 200 nt, for which less than 20 % of their sequence mapped against any of the Aureobasidium genomes, were separated and compared against the NCBI non-redundant nucleotide database . Only those contigs with first hits to fungal species were kept. The final assembly comprised 18 Mb, roughly two-thirds of the typical size of Aureobasidium genomes (25–29 Mb). To identify the species and strain, the most common fungal markers used for fungal barcoding were identified (ITS, SSU, LSU, RPB1, RPB2 and EF1). Most of the markers were missing in the assembly or were too short; based on a 769 nt fragment of the RPB1 gene, the most similar sequence was that of Aureobasidium pullulans isolate AFTOL-ID 912 (DQ471148.1); a strain that was isolated from the grape plant Vitis vinifera. The identity of this fragment was 99.95 % indicating that this was likely a different strain of the same species. Augustus  was used to perform gene annotation. The training parameters were obtained using scaffold 1 of the published A. pullulans genome, and then used to predict proteins in our strain of A. pullulans. This resulted in 6,411 proteins.
Olive tree genome annotation
RNA-Seq samples used for annotation
Fruits, leaves, stems and seeds
Picula x Arbequina
For ab initio gene prediction, transposable element repeats in the Oe6 assembly were first masked with RepeatMasker v4-0-5  using a custom repeat library constructed by running RepeatModeler v1-0-7 and adding some olive-specific repeats . A search was also carried out for masked proteins encoded by transposable elements (TEs) provided in the RepeatMasker Library of TE proteins. Low complexity repeats were left unmasked for this purpose. In total, 63 % of the assembly was masked.
On this masked assembly four different ab initio gene predictors were run, since combiners like EvidenceModeler work better when finding consensus among the output of a diverse set of gene prediction algorithms, and orthogonal evidence such as transcript and protein mapping. O. europaea protein-coding gene predictions were obtained with GeneID v1.4.4  trained specifically for O. europaea with GeneidTrainer using the training set of 589 genes; with Augustus v3.0.2  trained with the etraining script that comes with Augustus using the same training set; and with GlimmerHMM v3.0.1  trained with the trainGlimmerHMM script that comes with the program using the same training set. Finally, GeneMark-ES v2.3  gene predictions were obtained by running it in its self-trained mode. The number of predicted gene models ranged from 48,237 with GeneMark-ES to 97,542 with GlimmerHMM. Geneid, Augustus and Genemark-ET v4.21 were also used to generate predictions incorporating intron evidence, which was extracted from the RNA-Seq data, by obtaining the junctions after mapping it with GEM (see below). Junctions overlapping with ab initio GeneID predictions, Augustus predictions, or with protein mappings were taken as intron evidence. Running GeneID with hints resulted in a total set of 74,231 gene models; Augustus with hints resulted in 70,906; and Genemark-ET with 64,329 gene models.
Weights given to each source of evidence when running Evidence Modeler r2012-06-25
Type of evidence
Comparison of O. europaea with other plant species
Number of proteins
Average transcript length (bp)
Average coding sequence length (bp)
Average exons per transcript
Average exon length (bp)
Proteins with homologs in O. europaea
O. europaea proteins with homologs in the other species
56,349 (100 %)
56,349 (100 %)
23,106 (65.3 %)
32,796 (58.2 %)
24,373 (76.5 %)
42,458 (75.3 %)
27,778 (76.8 %)
38,448 (68.2 %)
21,990 (78.5 %)
37,264 (66.1 %)
The increased number of coding genes in O. europaea suggests the existence of a large-scale genome duplication with respect to the other species. Although this possibility deserves more detailed analysis, preliminary analyses of gene comparisons identified 34,195 O. europaea genes with O. europaea paralogs that are more similar to each other than to the corresponding best hit in E. guttata (80.5 % of the total proteins with hits in E. guttata), the closest species in this analyses. Also, from the 14,437 paralogous pairs found in O. europaea that represent each other’s reciprocal best hit, 10,711 pairs had the same best hit in E. guttata (which represents 74.2 % of the pairs). These results suggest that a high proportion of the O. europaea gene repertoire has been duplicated since the separation of these two lamiales species. To discard the possibility that these duplicates resulted from uncollapsed heterozygous alleles, heterozygous single nucleotide variants (SNVs) identified by variant calling using samtools mpileup in pairs of putatively recent duplicates were counted and compared with those in singletons (genes without recent paralogs). The mean is significantly higher in genes within recent duplicate pairs (Welch’s Two Sample t-test p-value < 2.2e-16). Finally, the 70 % quantile of two-copy SNV counts is 42 and 8 for the one-copy genes. In the case where uncollapsed (duplicated) alleles are frequent, one would expect to obtain the opposite pattern, as reads coming from the same locus would independently map to one of the two uncollapsed haplotypes in the assembly, thus dramatically reducing the number of heterozygous SNVs called. Although further and more detailed analyses are required, these results suggest extensive gene duplication in the lineage leading to the olive tree. The possibility of a whole genome duplication is consistent with the increased chromosomal number in O. europaea (2n = 46), as compared to closely related lamiales such as Erythranthe guttata (2n = 28)  and Sesamum indicum (2n = 26) .
Non-coding RNAs (ncRNAs) were annotated by running the following steps. First, the program cmsearch (v1.1) that comes with Infernal  was run with the Rfam database of RNA families (v12.0) . Also, tRNAscan-SE (v1.23)  was run in order to detect the transfer RNA genes present in the genome assembly. To detect long non-coding RNAs (lncRNAs), PASA assemblies that had not been included in the annotation of protein-coding genes (i.e., expressed genes that were not translated to protein) were first selected. Those longer than 200 bp and with a length not covered by a small ncRNA at least 80 % were incorporated into the ncRNA annotation as lncRNAs. The resulting transcripts were clustered into genes using shared splice sites or significant sequence overlap as criteria for designation as the same gene. Systematic identifiers with the prefix ‘OE6ncA’ were assigned to the genes and their derived transcripts. In total, 25,199 non-coding genes have been annotated, among which 20,082 are lncRNAs.
In summary, we report the first genome sequencing, assembly, and annotation of the Mediterranean olive tree. This genome assembly will provide a valuable resource for studying developmental and physiological processes, investigating the past history of domestication, and improving the molecular breeding of this economically important tree.
CDS, coding sequence(s); ENA, European Nucleotide Archive; EST, expressed sequence tag; EVM, Evidence Modeler r2012-06-25; FP, fosmid pools; Gb, gigabase; GO, Gene Ontology; lncRNA, long non-coding RNA; MP, mate-pairs; ncRNA, non-coding RNA; PASA, Program to Assemble Spliced Alignment; PE, paired-end; pg, picograms; SNV, single nucleotide variant; SRA, Sequence Read Archive; TE, transposable element; UTR, untranslated region; WGS, Whole Genome Shotgun
This project was funded by Banco Santander, which provided plant material and financed whole genome sequencing of the olive tree. The authors especially want to thank the late Mr. Emilio Botín for his support in driving this project. IJ was supported by a grant from the Peruvian Ministry of Education: ‘Beca Presidente de la República’ (2013-III).
Availability of supporting data
Supporting data are available in the GigaDB database , and the raw data were deposited in the European Nucleotide Archive (ENA) with the project accession PRJEB4992 (ERP004335) for the Olive genome and PRJNA315541 (LVWM00000000) for the A. pullulans partial genome. In addition, the genome and annotation can be accessed and browsed at .
PV, TSA, IGG and TG conceived the project. EC and PV collected the samples and extracted the genomic DNA. BG and JLC constructed the fosmid pools. MSF provided materials and advice. TG, TSA, FC, IJ, MMH, DL, JGG, LF, PR, MG and IGG performed the genome analyses. PV, TSA and TG wrote the article. All authors discussed the project and data. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Birol I, Raymond A, Jackman SD, Pleasance S, Coope R, Taylor GA, Yuen MM, Keeling CI, Brand D, Vandervalk BP, Kirk H, Pandoh P, Moore RA, Zhao Y, Mungall AJ, Jaquish B, Yanchuk A, Ritland C, Boyle B, Bousquet J, Ritland K, Mackay J, Bohlmann J, Jones SJM. Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data. Bioinformatics. 2013;29:1492–7.View ArticlePubMedPubMed CentralGoogle Scholar
- Zhang G, Fang X, Guo X, Li L, Luo R, Xu F, Yang P, Zhang L, Wang X, Qi H, Xiong Z, Que H, Xie Y, Holland PWH, Paps J, Zhu Y, Wu F, Chen Y, Wang J, Peng C, Meng J, Yang L, Liu J, Wen B, Zhang N, Huang Z, Zhu Q, Feng Y, Mount A, Hedgecock D, et al. The oyster genome reveals stress adaptation and complexity of shell formation. Nature. 2012;490:49–54.View ArticlePubMedGoogle Scholar
- Marçais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27:764–70.View ArticlePubMedPubMed CentralGoogle Scholar
- Plant DNA C-values Database. http://data.kew.org/cvalues/. Accessed 15 January 2016.
- Bitonti MB, Cozza R, Chiappetta A, Contento A, Minelli S, Ceccarelli M, Gelati MT, Maggini F, Baldoni L, Cionini PG. Amount and organization of the heterochromatin in Olea europaea and related species. Heredity (Edinb). 1999;83(Pt 2):188–95.View ArticleGoogle Scholar
- Brito G, Loureiro J, Lopes T, Rodriguez E, Santos C. Genetic characterisation of olive trees from Madeira Archipelago using flow cytometry and microsatellite markers. Genet Resour Crop Evol. 2007;55:657–64.View ArticleGoogle Scholar
- Loureiro J, Rodriguez E, Costa A, Santos C. Nuclear DNA content estimations in wild olive (Olea europaea L. ssp. europaea var. sylvestris Brot.) and Portuguese cultivars of O. europaea using flow cytometry. Genet Resour Crop Evol. 2006;54:21–5.View ArticleGoogle Scholar
- Loureiro J, Rodriguez E, Dolezel J, Santos C. Two new nuclear isolation buffers for plant DNA flow cytometry: a test with 37 species. Ann Bot. 2007;100:875–88.View ArticlePubMedPubMed CentralGoogle Scholar
- Ohri D, Bhargava A, Chatterjee A. Nuclear DNA amounts in 112 species of tropical hardwoods -- new estimates. Plant Biol (Stuttg). 2004;6:555–61.View ArticleGoogle Scholar
- gce v 1.0.0. ftp://ftp.genomics.org.cn/pub/gce.
- Marco-Sola S, Sammeth M, Guigó R, Ribeca P. The GEM mapper: fast, accurate and versatile alignment by filtration. Nat Methods. 2012;9:1185–8.View ArticlePubMedGoogle Scholar
- Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM, Birol I. ABySS: a parallel assembler for short read sequence data. Genome Res. 2009;19:1117–23.View ArticlePubMedPubMed CentralGoogle Scholar
- Boetzer M, Pirovano W. Toward almost closed genomes with GapFiller. Genome Biol. 2012;13:R56.View ArticlePubMedPubMed CentralGoogle Scholar
- Frias L, Ribeca P: ASM scripts. https://github.com/lfrias81/anchor-asm/tree/master/wrapper.
- Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics. 2011;27:578–9.View ArticlePubMedGoogle Scholar
- Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–60.View ArticlePubMedPubMed CentralGoogle Scholar
- McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–303.View ArticlePubMedPubMed CentralGoogle Scholar
- Parra G, Bradnam K, Korf I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 2007;23:1061–7.View ArticlePubMedGoogle Scholar
- Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–2.View ArticlePubMedGoogle Scholar
- Abdelfattah A, Li Destri Nicosia MG, Cacciola SO, Droby S, Schena L. Metabarcoding Analysis of Fungal Diversity in the Phyllosphere and Carposphere of Olive (Olea europaea). PLoS One. 2015;10:e0131069.View ArticlePubMedPubMed CentralGoogle Scholar
- Gostinčar C, Ohm RA, Kogej T, Sonjak S, Turk M, Zajc J, Zalar P, Grube M, Sun H, Han J, Sharma A, Chiniquy J, Ngan CY, Lipzen A, Barry K, Grigoriev IV, Gunde-Cimerman N. Genome sequencing of four Aureobasidium pullulans varieties: biotechnological potential, stress tolerance, and description of new species. BMC Genomics. 2014;15:549.View ArticlePubMedPubMed CentralGoogle Scholar
- Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19:455–77.View ArticlePubMedPubMed CentralGoogle Scholar
- Boetzer M, Pirovano W. SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information. BMC Bioinformatics. 2014;15:211.View ArticlePubMedPubMed CentralGoogle Scholar
- Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, He G, Chen Y, Pan Q, Liu Y, Tang J, Wu G, Zhang H, Shi Y, Liu Y, Yu C, Wang B, Lu Y, Han C, Cheung DW, Yiu S-M, Peng S, Xiaoqian Z, Liu G, Liao X, Li Y, Yang H, Wang J, Lam T-W, Wang J. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience. 2012;1:18.View ArticlePubMedPubMed CentralGoogle Scholar
- Blast-NCBI. https://blast.ncbi.nlm.nih.gov/Blast.cgi. Accessed March 2016.
- Stanke M, Keller O, Gunduz I, Hayes A, Waack S, Morgenstern B: AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res 2006, 34(Web Server issue):W435–9.Google Scholar
- Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD, Salzberg SL, White O. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003;31:5654–66.View ArticlePubMedPubMed CentralGoogle Scholar
- Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010;28:511–5.View ArticlePubMedPubMed CentralGoogle Scholar
- Galla G, Barcaccia G, Ramina A, Collani S, Alagna F, Baldoni L, Cultrera NGM, Martinelli F, Sebastiani L, Tonutti P. Computational annotation of genes differentially expressed along olive fruit development. BMC Plant Biol. 2009;9:128.View ArticlePubMedPubMed CentralGoogle Scholar
- Schilirò E, Ferrara M, Nigro F, Mercado-Blanco J. Genetic responses induced in olive roots upon colonization by the biocontrol endophytic bacterium Pseudomonas fluorescens PICF7. PLoS One. 2012;7, e48646.View ArticlePubMedPubMed CentralGoogle Scholar
- Bazakos C, Manioudaki ME, Therios I, Voyiatzis D, Kafetzopoulos D, Awada T, Kalaitzis P. Comparative transcriptome analysis of two olive cultivars in response to NaCl-stress. PLoS One. 2012;7, e42931.View ArticlePubMedPubMed CentralGoogle Scholar
- Wu TD, Watanabe CK. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics. 2005;21:1859–75.View ArticlePubMedGoogle Scholar
- Iwata H, Gotoh O. Benchmarking spliced alignment programs including Spaln2, an extended version of Spaln that incorporates additional species-specific features. Nucleic Acids Res. 2012;40, e161.View ArticlePubMedPubMed CentralGoogle Scholar
- Smit AFA HR& GP: RepeatMasker Open-4.0. <http://www.repeatmasker.org> 2013.
- Barghini E, Natali L, Cossu RM, Giordani T, Pindo M, Cattonaro F, Scalabrin S, Velasco R, Morgante M, Cavallini A. The Peculiar Landscape of Repetitive Sequences in the Olive (Olea europaea L.) Genome. Genome Biol Evol. 2014;6:776–91.View ArticlePubMedPubMed CentralGoogle Scholar
- Parra G, Blanco E, Guigó R. GeneID in Drosophila. Genome Res. 2000;10:511–5.View ArticlePubMedPubMed CentralGoogle Scholar
- Majoros WH, Pertea M, Salzberg SL. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004;20:2878–9.View ArticlePubMedGoogle Scholar
- Borodovsky M, Lomsadze A: Eukaryotic gene prediction using GeneMark.hmm-E and GeneMark-ES. Curr Protoc Bioinformatics 2011, Chapter 4:Unit 4.6.1–10.Google Scholar
- Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, White O, Buell CR, Wortman JR. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 2008;9:R7.View ArticlePubMedPubMed CentralGoogle Scholar
- Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–2.View ArticlePubMedPubMed CentralGoogle Scholar
- Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong S-Y, Lopez R, Hunter S. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30:1236–40.View ArticlePubMedPubMed CentralGoogle Scholar
- Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421.View ArticlePubMedPubMed CentralGoogle Scholar
- Fishman L, Willis JH, Wu CA, Lee Y-W. Comparative linkage maps suggest that fission, not polyploidy, underlies near-doubling of chromosome number within monkeyflowers (Mimulus; Phrymaceae). Heredity (Edinb). 2014;112:562–8.View ArticleGoogle Scholar
- Zhang H, Miao H, Wang L, Qu L, Liu H, Wang Q, Yue M. Genome sequencing of the important oilseed crop Sesamum indicum L. Genome Biol. 2013;14:401.PubMedPubMed CentralGoogle Scholar
- Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013;29:2933–5.View ArticlePubMedPubMed CentralGoogle Scholar
- Nawrocki EP, Burge SW, Bateman A, Daub J, Eberhardt RY, Eddy SR, Floden EW, Gardner PP, Jones TA, Tate J, Finn RD. Rfam 12.0: updates to the RNA families database. Nucleic Acids Res. 2015;43(Database issue):D130–7.View ArticlePubMedGoogle Scholar
- Schattner P, Brooks AN, Lowe TM: The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res 2005, 33(Web Server issue):W686–9.Google Scholar
- Cruz F, Julca I, Gómez-Garrido J, Loska D, Marcet-Houben M, Cano E, Galán B, Frias L, Ribeca P, Gut M, Sánchez-Fernández M, García JL, Gut IG, Vargas P, Alioto TS, Gabaldón T. Genomics data from the Mediterranean olive tree. GigaScience Database: Olea europaea var. europaea; 2016. http://dx.doi.org/10.5524/100201.Google Scholar
- Olive genome and annotation files. http://denovo.cnag.cat/genomes/olive/.