Draft genome of the Chinese mitten crab, Eriocheir sinensis
- Linsheng Song†1, 2,
- Chao Bian†3,
- Yongju Luo†4,
- Lingling Wang†5,
- Xinxin You3,
- Jia Li3,
- Ying Qiu3,
- Xingyu Ma6,
- Zhifei Zhu6,
- Liang Ma7,
- Zhaogen Wang7,
- Ying Lei7,
- Jun Qiang1,
- Hongxia Li1,
- Juhua Yu1,
- Alex Wong8,
- Junmin Xu3, 6 (corresponding author),
- Qiong Shi3, 6 (corresponding author) and
- Pao Xu1 (corresponding author)
© Song et al. 2016
Received: 10 December 2015
Accepted: 12 January 2016
Published: 28 January 2016
The Chinese mitten crab, Eriocheir sinensis, is one of the most studied and economically important crustaceans in China. Its transition from a swimming to a crawling method of movement during early development, anadromous migration during growth, and catadromous migration during breeding have been attractive features for research. However, knowledge of the underlying molecular mechanisms that regulate these processes is still very limited.
A total of 258.8 gigabases (Gb) of raw reads from whole-genome sequencing of the crab were generated on the Illumina HiSeq 2000 platform. The final genome assembly (1.12 Gb), about 67.5 % of the estimated genome size (1.66 Gb), is composed of 17,553 scaffolds (>2 kb) with an N50 of 224 kb. We identified 14,436 genes using AUGUSTUS, of which 7,549 were shown to have significant supporting evidence using the GLEAN pipeline. This gene number is much greater than that of the horseshoe crab, and the annotation completeness, as evaluated by CEGMA, reached 66.9 %.
We report the first genome sequencing, assembly, and annotation of the Chinese mitten crab. The assembled draft genome will provide a valuable resource for the study of essential developmental processes and genetic determination of important traits of the Chinese mitten crab, and also for investigating crustacean evolution.
Keywords: Crab genome, Genomics, Assembly, Annotation
Genomic DNA was extracted from the muscle tissue of a single female crab (Eriocheir sinensis; NCBI Taxonomy ID: 95602) that was obtained, after three generations of inbreeding, from a local farm in Panjin, Liaoning Province, China. We used a whole-genome shotgun sequencing strategy and constructed short-insert libraries (170, 250, 500 and 800 bp) and long-insert libraries (2, 5, and 10 kb) following the standard protocol provided by Illumina (San Diego, USA). Paired-end sequencing was performed on the Illumina HiSeq 2000 system. In total, we generated 258.8 Gb of raw reads from all constructed libraries.
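The headline figures can be cross-checked with simple arithmetic. The sketch below is only an illustration: it takes the raw read yield (258.8 Gb), the estimated genome size (1.66 Gb) and the assembly length (1.12 Gb) as reported in this note, and the function names are hypothetical. The depth it prints is a raw figure, before any read filtering that may have been applied.

```python
# Figures reported in this note (Gb = gigabases).
RAW_READS_GB = 258.8         # total raw reads from all libraries
ESTIMATED_GENOME_GB = 1.66   # estimated genome size
ASSEMBLY_GB = 1.12           # final assembly length


def fold_depth(sequenced_gb, genome_gb):
    """Raw sequencing depth: sequenced bases divided by genome size."""
    return sequenced_gb / genome_gb


def assembled_fraction(assembly_gb, genome_gb):
    """Share of the estimated genome covered by the assembly."""
    return assembly_gb / genome_gb


depth = fold_depth(RAW_READS_GB, ESTIMATED_GENOME_GB)
fraction = assembled_fraction(ASSEMBLY_GB, ESTIMATED_GENOME_GB)

print(f"raw depth: {depth:.1f}x")              # about 155.9x
print(f"assembled fraction: {fraction:.1%}")   # about 67.5%
```

The second value agrees with the 67.5 % stated for the assembly.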
For whole-genome assembly, we employed Platanus with optimized parameters (-k 27, -m 200) to construct contigs and initial scaffolds. All reads were mapped onto the contigs, and the paired-end information was then used to link the contigs into scaffolds in a stepwise manner. Some intra-scaffold gaps were filled by local software using read pairs in which one end mapped uniquely to a contig and the other end was located within a gap. The final genome assembly of the Chinese mitten crab is 1.12 Gb in total length, which is about 67.5 % of the estimated genome size. The contig N50 (i.e., 50 % of the assembly is in fragments of this length or longer) is 6.02 kb, and the scaffold (>2 kb) N50 is 224 kb.
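The N50 statistic defined above can be illustrated with a short sketch. This is a generic implementation of the usual definition, computed against the total assembly length, and not the code used for this assembly. The function name and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (hypothetical lengths in kb): total 30, half 15.
# Running totals from the longest: 10, 16 -> the N50 is 6.
toy_scaffolds = [2, 3, 4, 5, 6, 10]
print(n50(toy_scaffolds))  # 6
```

Because the calculation depends on which sequences are included, a scaffold N50 restricted to scaffolds longer than 2 kb, as reported here, is not directly comparable with an N50 computed over all scaffolds.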
We constructed a de novo repeat library using RepeatModeler (Version 1.04, default parameters) and LTR_FINDER. To identify known and de novo transposable elements (TEs), we ran RepeatMasker (Version 3.2.9) against the Repbase TE library (Version 14.04) and the de novo repeat library. In addition, we used RepeatProteinMask (Version 3.2.2), implemented in RepeatMasker, to detect TE-relevant proteins. We also predicted tandem repeats using Tandem Repeats Finder [6, 7] (Version 4.04) with parameters set to “Match = 2, Mismatch = 7, Delta = 7, PM = 80, PI = 10, Minscore = 50, and MaxPeriod = 2000”. In total, repeat sequences occupy approximately 50.4 % of the crab genome. Among them, long interspersed elements, which occupy 19.0 % of the crab genome, are the predominant type.
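As an illustration of how the tandem-repeat parameters listed above map onto a command line, the sketch below assembles an argument list in the positional order that Tandem Repeats Finder expects (file, Match, Mismatch, Delta, PM, PI, Minscore, MaxPeriod). The executable name `trf`, the file name `crab_assembly.fa` and the helper function are assumptions made for this example. The code only builds the command and does not run it.

```python
def trf_command(fasta_path, match=2, mismatch=7, delta=7, pm=80, pi=10,
                minscore=50, maxperiod=2000, executable="trf"):
    """Build a Tandem Repeats Finder argument list.

    Defaults mirror the parameter values reported in this note. The
    executable name is an assumption; installed binaries are often
    named differently.
    """
    params = (match, mismatch, delta, pm, pi, minscore, maxperiod)
    return [executable, fasta_path, *(str(p) for p in params)]


# Hypothetical input file name, used only for illustration.
cmd = trf_command("crab_assembly.fa")
print(" ".join(cmd))  # trf crab_assembly.fa 2 7 7 80 10 50 2000
```

The resulting list could be passed to `subprocess.run` on a system where the program is installed.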
Summary of genome annotations (table; values not recoverable). Rows: average transcript length (bp); average coding sequence length (bp); average exons per gene; average exon length (bp); average intron length (bp).
In summary, we report the first genome sequencing, assembly, and annotation of the Chinese mitten crab. The draft genome will provide a valuable resource for studying essential developmental processes in the Chinese mitten crab, investigating crustacean evolution, and improving the molecular breeding of this economically important species.
Availability of supporting data
Supporting data are available in the GigaDB database, and the raw data were deposited under BioProject accession PRJNA305216.
This work was supported by the China 863 Project (No. 2014AA093501), the Special Project on the Integration of Industry, Education and Research of Guangdong Province (No. 2013B090800017), and the Shenzhen Scientific R & D Grant (No. CXB201108250095A).
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Li R, Fan W, Tian G, Zhu H, He L, Cai J, et al. The sequence and de novo assembly of the giant panda genome. Nature. 2010;463:311–7.
- Kajitani R, Toshimoto K, Noguchi H, Toyoda A, Ogura Y, Okuno M, et al. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Res. 2014;24:1384–95.
- Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 2007;35:W265–8.
- Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics. 2004;Chapter 4:Unit 4.10.
- Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110:462–7.
- Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27:573–80.
- Birney E, Clamp M, Durbin R. GeneWise and Genomewise. Genome Res. 2004;14:988–95.
- Stanke M, Keller O, Gunduz I, Hayes A, Waack S, Morgenstern B. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 2006;34:W435–9.
- Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J Mol Biol. 1997;268:78–94.
- Huang S, Wang J, Yue W, Chen J, Gaughan S, Lu W. Transcriptomic variation of hepatopancreas reveals the energy metabolism and biological processes associated with molting in Chinese mitten crab, Eriocheir sinensis.
- Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25:1105–11.
- Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010;28:511–5.
- Elsik CG, Mackey AJ, Reese JT, Milshina NV, Roos DS, Weinstock GM. Creating a honey bee consensus gene set. Genome Biol. 2007;8:R13.
- Nossa CW, Havlak P, Yue JX, Lv J, Vincent KY, Brockmann HJ, et al. Joint assembly and genetic mapping of the Atlantic horseshoe crab genome reveals ancient whole genome duplication. GigaScience. 2014;3:9.
- Parra G, Bradnam K, Korf I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 2007;23:1061–7.
- Song L, Bian C, Luo Y, Wang L, You X, Li J, Qiu Y, Ma X, Zhu Z, Ma L, Wang Z, Lei Y, Qiang J, Li H, Yu J, Wong A, Xu J, Shi Q, Xu P. Supporting data for the “Draft genome of the Chinese mitten crab, Eriocheir sinensis”. GigaScience Database. 2016. http://dx.doi.org/10.5524/100186.