Single-cell sequencing analysis characterizes common and cell-lineage-specific mutations in a muscle-invasive bladder cancer
- Yingrui Li†1,
- Xun Xu†1,
- Luting Song†1, 2, 3, 4,
- Yong Hou†1, 5, 6,
- Zesong Li†7, 8, 9,
- Shirley Tsang10,
- Fuqiang Li1,
- Kate McGee Im11,
- Kui Wu1,
- Hanjie Wu1, 12,
- Xiaofei Ye1,
- Guibo Li1,
- Linlin Wang1,
- Bo Zhang1,
- Jie Liang1,
- Wei Xie1, 5, 6,
- Renhua Wu1,
- Hui Jiang1,
- Xiao Liu1,
- Chang Yu1,
- Hancheng Zheng1,
- Min Jian1,
- Liping Nie13,
- Lei Wan14,
- Min Shi13,
- Xiaojuan Sun7, 8, 9,
- Aifa Tang7, 8, 9,
- Guangwu Guo1,
- Yaoting Gui13,
- Zhiming Cai8, 9, 13,
- Jingxiang Li1,
- Wen Wang2,
- Zuhong Lu5, 6,
- Xiuqing Zhang1,
- Lars Bolund1, 15,
- Karsten Kristiansen1, 16,
- Jian Wang1,
- Huanming Yang1Email author,
- Michael Dean11Email author and
- Jun Wang1, 16, 17Email author
© Li et al.; licensee BioMed Central Ltd. 2012
Received: 9 April 2012
Accepted: 2 August 2012
Published: 14 August 2012
Cancers arise through an evolutionary process in which cell populations are subjected to selection; however, to date, the process of bladder cancer, which is one of the most common cancers in the world, remains unknown at a single-cell level.
We carried out single-cell exome sequencing of 66 individual tumor cells from a muscle-invasive bladder transitional cell carcinoma (TCC). Analyses of the somatic mutant allele frequency spectrum and clonal structure revealed that the tumor cells were derived from a single ancestral cell, but that subsequent evolution occurred, leading to two distinct tumor cell subpopulations. By analyzing recurrently mutant genes in an additional cohort of 99 TCC tumors, we identified genes that might play roles in the maintenance of the ancestral clone and in the muscle-invasive capability of subclones of this bladder cancer, respectively.
This work provides a new approach of investigating the genetic details of bladder tumoral changes at the single-cell level and a new method for assessing bladder cancer evolution at a cell-population level.
KeywordsSingle-cell exome sequencing Bladder cancer Tumor evolution Population genetics
Bladder cancer (BC) is among the top ten most common cancers in the world; and transitional cell carcinoma (TCC) is the most common form, presenting in 90% of diagnosed BC . Previous studies have indicated that the development of TCC involves multiple steps [2–4], but the key mutations and how this process occurs remain largely unknown. Clinical and genetic studies have classified TCC patients into two main categories: non-muscle-invasive TCCs (NMI-TCCs), which occur in approximately 70% of the patients and often carry mutations in the FGFR3 and RAS genes; and muscle-invasive TCCs (MI-TCCs), which occur in approximately 30% of the patients and often carry mutations in the TP53 and RB1 genes . MI-TCC, however, is the form that is associated with a higher mortality rate , which makes this form of BC, though less common, of greater concern for developing the means to assess and ultimately devising viable treatments.
Current information has indicated that there is a shared genetic pattern in TCCs among patient populations , but it has not yet been possible to apply this information to understand tumor formation within a patient. Moreover, the heterogeneous nature of the tumor and its contamination by infiltrating ‘normal’ cells further complicate cancer studies, since the functionally important mutations may only reside in a portion of the cells within a tumor sample and would be undetectable in heterogeneous tumor tissues. Given the heterogeneous nature of tumors both among patients and within tumors, understanding tumors at a cell-specific level may be a direct way for developing targeted ‘personalized’ therapies for bladder cancer.
It is now feasible to gain greater insight into cellular selection within the tumors given the technical development of large-scale data acquisition and genome analysis, including the emergence of new methods of genome sequencing for copy-number genetic analyses  and single nucleotide analyses at the single-cell level [8, 9]. However, there is currently no study that attempts to place the timing of key mutations within the development history of the tumor to infer their potential roles in tumorigenesis at the single-cell level, which is of great importance in developing effective cellular targeted therapies in personalized medicine.
Here we present results from single-cell exome sequencing (SCS) and analyses of a MI-TCC. The sequence data revealed the complexity of the genetic patterns within this tumor and identified the presence of genetically different tumor cell types within the tumor tissue. In addition, by placing the timing of key mutations within the development history of the tumor, we discovered candidate cancer-associated genes that might serve to drive not only the initiation of carcinogenesis, but also subsequent cell lineage development that may be involved in cancer progression.
We obtained samples of fresh tumor (standard surgery of bladder cancer: >80% tumor cells) and para-carcinoma tissue from a 57-year-old male with MI-TCC of the bladder classified as stage II (T2-N0M0) ( Additional file 1: Figure S1, see Methods for details).
We carried out single-cell exome sequencing on individual cells from these samples as described in . Briefly, we gently disrupted the tissue by collegenase I and IV, and randomly selected single cells from the tumor tissue and normal adjacent tissue. Exome capture was performed on the whole-genome amplification (WGA) products of each cell. The resulting libraries were then subjected to second-generation sequencing (see Methods for exome capture and sequencing details). To drastically reduce errors in the subsequent analyses, cells were discarded if they had <70% coverage of the exome targets or a significant false heterozygous rate across the X chromosome due to amplification and/or hybridization failures. With a total of 66 cells sequenced, 44 single cells from the tumor tissue (hereafter referred to as BC cells) and 11 from the normal adjacent tissue (hereafter referred to as BN cells) were qualified and selected for subsequent analyses ( Additional file 2: Table S1). The average sequencing depth in exome regions of the qualified single cells was 40-fold, compiling a comprehensive dataset of approximately 2,200-fold coverage from all cells, which enabled the genotype calling for the majority of sites in the exome regions . We achieved an average of 88.6% whole-exome coverage of all qualified single cells ( Additional file 2: Table S1) and covered more than 60% of the target region greater than 5× sequencing depth in all cells ( Additional file 3: Figure S2C-D).
In addition to single-cell exome sequencing, we also sequenced the whole exome of bulk DNA from the same bladder cancer tissue with 137× coverage and the normal bladder tissue with 28× coverage to use as a control for evaluating the data quality of our SCS ( Additional file 3: Figure S2A-BAdditional file 2: Table S1). Data described here is available in the NCBI Short Read Archive [SRA051489] and GigaScience.
Variation calling and quality assessment
We carried out sequencing data evaluation, somatic mutation calling, and additional bioinformatics analysis as described by a pipeline described in [8, 9], with minor modifications as detailed in Additional file 4: Figure S3.
Using these data, we first determined the percentage of alleles that were missing due to errors introduced by WGA and exome capture in each single BN cell, and found that approximately 40% of the heterozygotes had one allele dropout (ADO) ( Additional file 5: Table S2). This performance was comparable to previous work [8, 9, 12]. Additionally, we took the ADO rate into account for all subsequent analyses to ensure the accuracy of our findings.
We also examined the percentage of alleles in the homozygous samples that were false-discovery events due to errors or artifacts during SCS. The false discovery rate (FDR) was very low, at 6.7 × 10-5, which was comparable to that of conventional tissue sequencing using the same sequencing platform in a previous report . We further assessed FDR by examining the sequence generated from the mitochondrial DNA. While not specifically targeted on the exome-capture array, sufficient mitochondrial DNA became captured to cover the 16,561 bp genome by 10-100× or more. The aligned sequence reads of mitochondrial DNA for each single normal cell were analyzed by sequence analysis/map (SAM) tools  to generate predicted variants. For 11 single normal cells, the predicted variants were manually inspected, and sites with fewer than five mutant reads were discarded. This resulted in seven mutations identified, with no two cells having the same mutation. Assuming all were false positive, this gave an FDR of 2.6 × 10-5. In total, the false discovery analyses provided very strong evidence that when multiple single cells had the same mutation, those variants were true somatic mutations.
We then ascertained somatic mutations in the tumor cells from SCS as noted in [8, 9]. A somatic mutation was defined as a site that was consistently called homozygous in all BN cells (with a minimum of six read-coverage BN cells), but had mutant in less than three BC cells. We set this specific threshold for cell numbers to eliminate errors and artifacts from the SCS process: the required number of BN cell was based on a binomial test using our ADO ( Additional file 6: Figure S4BAdditional file 5: Table S2), qualified BN cell number, and whole-exome size; the required number of BC cells was determined based on a binomial test with the false discovery ( Additional file 6: Figure S4A), the qualified BC cell number, and the whole-exome size.
Summary of somatic mutations called in single-cell exome sequencing
Mutant alleles observed in matched tissue sequencing
Total N (%)
3′UTR N (%)
5′UTR N (%)
Intron N (%)
Intergenic N (%)
NS N (%)
S N (%)
In addition to the evaluation of the specificity for mutation calling, we also took advantage of SCS to estimate the frequency of mutant alleles in the cell populations within the MI-TCC. The cell mutant allele frequency from multiple single-BC cells was highly correlative (R2 >0.89) with the mutant-read frequency in all read-covered mutant sites in BC tissue ( Additional file 6: Figure S4C). This result indicated that the frequency estimation from single cells was accurate and the genetic analyses of these single cells can, on behalf of this MI-TCC, enable us to apply population genetics analysis to this cell population.
Of additional note, when we used conventional exome sequencing of the DNA from the whole tissue samples and then compared the consensus sequences from the BC and BN tissues, though with very high sequencing depth (137-fold) for BC tissue, we could only identify 134 (30.25% of 443) high-confidence mutations found in our single-cell analyses, which indicated that single-cell analyses had a much higher sensitivity for identifying rare mutations present in tumor tissues.
Inference of tumor development by population genetics methods
The comprehensive data set and accurate estimation of allele frequency, that we obtained for the tumor and normal cell populations, provided us an unprecedented opportunity to apply population genetics analysis methods to decipher the genetic pattern of this bladder cancer development.
In addition, other somatic mutations outside the 50%-frequency peak displayed a hyperbolic decay in counts with mutant allele frequency increases, which is a characteristic of cell population expansion with somatic mutation accumulations under a neutral model— mathematically similar to a population genetics model proposed at the individual human level . This suggested that the ancestral tumor cell acquired a growth advantage over the other cells and expanded from this event forward.
To assess whether the mutations under selection during tumorigenesis occured by tumor SMAFS, we found the nonsynonymous mutations had a frequency distribution of shift to the high frequency (right in the figure), in other words, they had a significant excess (p-value = 0.02, Fisher’s test) of higher frequency (> 0.1) mutations compared to the identified synonymous mutations (Figure 1B). This shift was also seen after excluding the 50%-frequency spectrum peak (40%-60%). Given that the synonymous mutations were under neutral selection and thus not conferring functional impacts, the above observation gave a signature that the somatic mutations were generally under positive selection in this MI-TCC. This suggested that the common ancestral tumor cell acquired characteristics of the tumor and had a continually proliferative advantage over the normal cells during the tumor development process. This constant selective pressure would provide a similar means for specific subclones of the tumor cells to gain additional mutations which provided a higher growth or survival advantage over other tumor cells. These beneficial growth conferring mutations have been defined as driver mutations , even though they may not be the triggering events, but instead were acquired during tumor progression.
We next subjected the identified mutations to a principal component analysis (PCA) to characterize the genetic heterogeneity of tumor and normal cells ( Additional file 8: Figure S6). Here, the first vector clearly differentiated the tumor cells from normal cells into two distinct clusters, which indicated there was no contamination of normal cells during tumor cell selection from the dissected tumor tissue, or vice versa. The eigenvector positions of all the normal cells and normal tissue showed they were nearly identical, also indicating no contamination in normal tissue samples. In contrast, the tumor cells showed considerable diversity across the all principal vectors (first to fourth vectors), which indicated this MI-TCC was heterogeneous and also supported our conclusions that the tumor cell population expanded during positive selection on these specific mutant cells.
Identifying key mutant genes in TCC
In addition, we also found two emerging subclones (Clones B and C) that separately obtained additional clone-specific mutant genes (7 and 12, respectively for Clones B and C in Figure 2A) that occurred subsequently to initiating events in tumor progression. Although numerous evaluations with varying mutation rates in tumor tissues and normal tissues [18–20], we conservatively assumed a mutation rate of 5 × 10-9 per base pair per cell generation, given that cancer tissues did not have a lower mutation rate than adjacent normal tissues. Thus, the divergence time between clones should be less than 40 generations. Hence, the two emerging subclones appeared to have originated late in the tumor history, and should only make up a small proportion of all the cells by random. However, Clones B and C each represent approximately 35% of the tumors’ cells, which was significantly larger than expected. This suggested the two subclones were undergoing positive selection over original Clone A, revealing the subclone-specific genes were not all passenger mutations, but instead included mutations that conferred additional growth or survival advantages. Thus, subclone-specific genes were likely to be equally important in understanding the complex makeup of this tumor.
Next, we surveyed the identified commonly mutant genes in an additional cohort of 99 TCC patients by the conventional exome sequencing  ( Additional file 9: Table S3), to determine whether these mutant genes were all unique to this patient or exist recurrently as mutant genes which would more likely to be candidate cancer genes. We found four genes, out of the 22 commonly mutant genes (Group I), also had non-silent mutations in at least three other TCC patients, including CFTR (mutant in seven patients), NIPBL (mutant in five patients), ASTN1 (mutant in four patients) and DHX57 (mutant in three patients), which were all novel findings of this study. These mutant genes in the derived ancestral tumor cell may have complementary rather than overlapping functions, and then cooperate to confer an overall growth or survival advantage. Of interest, the known functions of these recurrently mutant genes were diverse. For example, NIPBL was a cohesion complex regulator, playing a role in recruiting histone deacetylases to chromatin, and was the cause of most cases of Cornelia de Lange Syndrome that can lead to severe developmental anomalies , and was also identified in ovarian and other tumors . Mutations in these chromosome-remodeling proteins may lead to epigenetic changes followed by deregulated gene expression and consequently, tumorigenesis . In addition, CFTR was a chloride channel and the cause of cystic fibrosis ; DHX57 was a putative ATP-dependent RNA helicase truncated in this TCC tumor ; and ASTN1 was a neuronal adhesion molecule shown to be mutant in previous tumor studies [26, 27].
We also identified three recurrently mutant genes, which were all also first found in this study, among the subclone-specific genes that may serve as potential driver genes unique to clones B and C. Among Clone B, ATM was mutant and found recurrently altered in five other TCC patients. ATM was a known tumor suppressor that played a key role as a cell-cycle checkpoint kinase in response to DNA damage [28, 29]. From Clone C-specific mutant genes, COL6A3 and KIAA1958 each recurred in four additional patients. COL6A3 encoded a collagen protein reported to have significant changes in expression level in certain tumor tissues  and was a putative pancreas cancer biomarker . The role of KIAA1958, however, remained uncharacterized, but its potential role made it an interesting candidate for future analyses in tumors.
To further assess the likelihood of these seven recurrent genes being important to TCC development of this patient, we also scored the genes with Q-score by cancer driver prediction [8, 32] that believed driver genes were likely to contain significantly more nonsynonymous mutations than background mutations. We observed that four genes (CFTR ASTN1 DHX57 and KIAA1958) had a Q-score higher than or close to 1 (10% false discovery rate). Although not all Q-score genes were higher than 1 ( Additional file 10: Figure S7), we still believed all these recurrent genes were likely to be of great importance in TCC, given that the key importance of a driver gene depends on its functional impact more than just accumulation of more and more nonsynonymous mutations.
Here we carried out deep exome sequencing of individual cells from both tumor and adjacent normal tissues of an MI-TCC patient. Overall, the genetic profile of gene mutations in this MI-TCC indicated that the genesis of this tumor was multi-factorial, though it did not include the typical genetic signatures found in many other TCC tumors. The data from the single-cell exomes allowed us to successfully apply a population genetics analysis using the individual cells and provided evidence that this MI-TCC had a monoclonal origin. With comparison of SMAFS analysis between tumor and normal cells, we discovered the tumor cells suffered constant positive selection and accumulated driver gene mutations during both the initiation and progression processes. We had successfully deciphered the subclonal structures and cell-lineage trees of this complex tumor tissue, and identified several putative driver gene candidates using additional tumor samples from a 99-patient cohort. Anchoring these mutant genes to cell-lineage tree branches revealed mutually exclusive subclone-specific driver gene candidates, which may provide an insight into how a tumor evolves into the difficult to treat metastasis.
Given the importance for understanding tumor development mechanisms for designing workable cancer treatments, several models have been proposed to characterize this complex process. These include the clonal evolution model , the mutator phenotype model , and the stochastic progression model . With the data from our single-cell exomes, we reconstructed the developmental history of this tumor and its subclones (Figure 3), providing new information relevant to these models. Our data indicated this tumor had a monoclonal origin from an ancestral cell with multiple mutant driver gene candidates that could generate the hallmarks of cancer , which fit best with the clonal evolution model. The only ambiguity in our defined tumor development process, however, was whether the inferred common ancestral cell represented the initial tumor cell, or whether it emerged after an earlier tumor initiation event and developed growth/survival advantages over other early tumor cells via clonal selection. Further studies on early-stage tumor tissues would serve to distinguish between the two scenarios. Of additional interest, we found certain mutations that function in genome stability and DNA repair in this tumor, suggesting this MI-TCC would also be compatible with the mutator phenotype model, as several driver genes underlying tumor initiation followed by clone-specific expansion. The fact that our data fit both the cancer evolution and mutator phenotype model suggests that a single model may not serve as a suitable model for all types, or even one type, of cancer.
In this study we carried out genetic analyses of individual cells collected from a whole tumor, rather than from specific tumor coordinates. Thus a clear next step, given distinct genetic differences within the tumor , is to obtain a set of single cells from different tumor quadrants and carry out similar cell population analyses – also beneficial for designing more effective drugs and efficient treatments. Defining the genetic changes and the evolution of such changes within these different components of a tumor may indicate the presence of different types of selective pressures, or of specific genetic mutations that are important to areas of tumors that are interacting with separate types of cellular microenvironments.
Case report and sample collection
The patient was a 57-year-old male, diagnosed with primary muscle-invasive transitional cell carcinoma (MI-TCC) of the bladder according to the 2004 World Health Organization (WHO)/ International Society of Urological Pathology (ISUP) grading systems. The details of diagnosis were as follows ( Additional file 1: Figures S1 and Additional file 3: Figure S2): a 7 × 4 × 4 cm papillary tumor in the trigone of the urinary bladder, invaded the bladder wall – the cut surface was grey-white and brittle, and classified as stage II (T2-N0M0) under microscopic examination. Tumor invasion into shallow muscularis of bladder wall was observed, but it did not penetrate deep into the muscularis and serous layer. Furthermore the tumor did not invade the prostate tissue or cause a bilateral ureteral obstruction, and there was no metastasis in six pelvic lymph nodes. The hematoxylin eosin-stained sections prepared using the cancerous tissues were microscopically evaluated by two independent pathologists ( Additional file 3: Figure S2). Afterward the primary tumor sample and matched para-carcinoma tissue were obtained from a surgery-resected sample from this MI-TCC patient at the Peking University Shenzhen Hospital. Informed written consent was obtained from the study participant. The studies were conducted in accordance with the Declaration of Helsinki II and were approved by the local Ethical Committees.
Single-cell isolation and WGA
Every step during the experiment was reduced to a strict minimum. With sufficient dispersion and cascade dilution of cells, single cells were randomly isolated from collagenase I and IV digested tumor or para-carcinoma tissues into PCR-ready tubes using an inverted microscope and a self-made mouth-controlled, fine hand-drawn micro-capillary pipetting system. The single cell isolation was visually confirmed by photograph under microscope. Afterward, WGA of single-cell DNA were performed using the REPLI-g® Mini Kit (Qiagen GmbH, Hilden, Germany) according to the manufacturer’s instruction, using a no cell reaction as a negative control, and a reaction of human tissue genomic DNA as positive control.
Quantitation and genome-integrity assessment of the WGA products
The DNA concentration of the WGA products were measured with highly sensitive fluorescence-based Quant-iTTM assays using the QubitTM Quantitation Platform (Life Technologies, Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s instruction. The genomic integrity of the qualified products (> 60 ng/μl) were then assayed with their amplification represented by 10 housekeeping-gene-PCR tests, and the 10 genes (PRDX6, RPL37a, ADD1, ARHGEF7, EIF2B2, PSMD7, PSMB6, MC2R, BCAT2, and ATP5O) were interspersed across 10 different chromosomes. Afterward, WGA products of best performance from both housekeeping PCR (≥8/10) and Qubit assays (> 60 ng/μl) were selected for downstream experiments. Every step was performed along with a sample of genomic DNA from human tissue as a positive control and a no template reaction as a negative control, respectively.
Whole exome capture and sequencing of samples
High molecular weight genomic DNA was extracted from the primary tumor and matched para-carcinoma tissue, respectively. And for both DNA samples (single cells and matched tissues), whole exome capture was accomplished based on liquid phase hybridization of sonicated 2 μg genomic DNA to the bait cRNA library synthesized on magnetic beads using the SureSelect® Human All Exon 50 Mb kit (Agilent Technologies, Santa Clara, CA, USA) according to the manufacturer’s protocol. The captured targets were subjected to massive sequencing using the Illumina HiSeq2000 with the paired-end 100 bp read option, according to the manufacturer’s instruction.
Public dataset access
Human (Homo sapiens) reference genome sequence (Hg18) and its annotation files (e.g., dbSNP v128) were downloaded from the University of California Santa Cruz Genome Bioinformatics site . The targeted region files of exome capture were downloaded from the Agilent Technologies website .
Reference-guide genome assembly
SOAPaligner/SOAP2 version 2.20 was used to align all short sequencing reads to the Hg18 reference genome with a maximum of two mismatches and non gap parameters. The insert size distribution of each library was checked by Eland contained in the Illumina Pipeline, and thus the insert size range was set for SOAPaligner.
All reads uniquely mapped to exome region and 100 bp-flanking regions were selected for SNP calling. SOAPsnp version 1.03 was used for calculating the likelihood of each cell genotype and mixed tissue. In each sample, putative SNPs were filtered based on the following criteria: (1) a Q20 quality cutoff; (2) at least five reads; (3) a p-value >0.01 (that means no significant difference between the sequencing quality of the two alleles in a heterozygous genotype); (4) a 5 bp distance from each other; and (5) 1/3 ~ 3 variation between quality score of two bases in heterozygous sites. And the final SNP data of the filtered SNP in each cell was combined.
Evaluation of ADO rate
With the final SNPs, the ADO ratio is defined as the random non-amplification of one of the alleles present in a heterozygous sample.
n the sequencing depth
T the tissue sequencing
S the single cell sequencing
HR the heterozygous rate of one kind of sequencing under n × sequencing depth
CR the read-covered rate of one kind of sequencing under n × sequencing depth
We thus calculated the ADO of a single normal cell, which was the median of all relative FNR per depth on non-outlier depths (the outlier depth was the depth when the FNR in tissue sequencing at this depth was higher than the SCS). And finally we calculated the ADO in all whole-exome SCS as the mean of ADO of all normal cells.
Evaluation of FDR
The FDR is defined as a false heterozygous site in a homozygous sample, which may be due to amplification, hybridization or sequencing errors. We chose a homozygous subset which was high-confidence (Quality-score = 99 and 95% confidence interval of distributive depth under Poisson distribution), and then we calculated the number of discrepant sites of single qualified normal cell in the homozygous subset. The FDR per cell was indicated as the number of discrepant sites divided by total covered number of high-confidence homozygous subset. And finally the FDR in all whole-exome SCS was indicated as the mean of all qualified normal cells.
Identification and experimental validation of somatic mutations
To eliminate the random errors or artifacts induced by SCS, we built two binomial tests to detect high-confidence point somatic mutations (SMs) in capture regions. The putative SMs were filtered based on the following criteria: (1) homozygous normal in all normal cells (at least read-covered in six normal cells); (2) the homozygous genotypes in normal cells were consistent in normal tissue; (3) at least mutant in three cancer cells among a total of 44 qualified cancer cells.
i the normal cell number of read-covered for a specific mutation
n the total number of qualified normal cells
p a the ADO
P(i) the probability under binomial distribution
S the whole-exome size
Then the criterion (1) was set according to the largest cell number i fulfills above inequation.
i the cancer cell number of mutant of a specific mutation
n the total number of qualified cancer cell
p f the FDR
P(i) the probability under binomial distribution
S the whole-exome size
Then the criterion (3) was set according to the largest cell number fulfills above inequation.
High confident SMs in BC cells were randomly selected for experimental validation. The loci of corresponding single cells were amplified, sequenced and analyzed by MassARRAY® Analyzer (SEQUENOM, San Diego, CA, USA).
Correlation of SMAF between SCS and whole tissue sequencing
Correlation coefficient of determination is a goodness-of-fit measure for models based on the proportion of explained variance. The somatic mutant allele frequency in SCS was indicated as unfolded site frequency of mutant alleles, and the somatic mutant allele frequency in tissue sequencing was indicated as read frequency of mutant alleles.
Cell population genetic inferences based on called (inferred) SNPs can lead to serious biases and possibly false inferences for the high ADO ratio for each cell. We have therefore developed a series of statistical techniques that can take uncertainty in genotype calls and allele frequency estimation into account. Instead of utilizing a Bayesian estimation-based method called site-frequency spectrum (SFS) on population individuals , we used this SFS to calculate the somatic allele frequency in each SM site of all BC and BN cells respectively, based on the same methods of estimating allele frequencies from reads in one site and additional estimating sample allele frequencies.
To identify the most variable factors in classifying subgroups among single cancer cells, we utilized an R package, pcaMethods v1.12.0 , in performing the PCA based on the genotyping result at all somatic mutation sites on each single cell. Missing values were automatically estimated by probabilistic method within the R package.
Heat map 2-dimension clustering
The 2-dimension heat map clustering of mutant genes and cells is based on the somatic mutations using R language. Color indicates the presence of tendency of mutant or normal, which was showed as log10 value of the relative probability of mutant to the probability of normal (non-mutant) after Bayes calibration by FDR, ADO and a priori probability from normal tissue.
A i is one kind of the three genotypes (NN, NC, CC for homozygous normal, heterozygous mutant, homozygous mutant, respectively) in the site of each cell.
O i is one observed genotype (NN or NC or CC).
is the mutant posterior probability of A i on the observation of O i .
Pr (Ai) is the priori probability of A i calculated from the corresponding site at normal tissue.
is the probability of A i calibrated by ADO and FDR.
H is the logarithmic value of the relative probability of mutant to the probability of normal used for heat map 2-dimension clustering.
Note: Quality score and rank sum test p-value are the values from the files after sequencing read alignment by SOAP.
f homo is the ADO in homozygous genotype.
f heter is the ADO in heterozygous genotype.
f p is the FDR in single cell sequencing.
If a site is not covered by reads in one cell, then we estimate its log10 value by averaging all the log10 values in other cells at this site.
The columns and the rows were also ordered as in the tree with sub clusters. And certain suspicious genes illustrated as red (likely mutant) in BN cells, which were defined as ≥1 BN cell containing value ≥0.5 (light red), in our raw data were filtered in heat map clustering. Having been tackled with FDR, ADO and other sources of error, we suggested that the remaining suspicious genes may due to paralogous alignment or false negatives under our model. Under extreme low FDR value, the false negatives here might derive from super high sensitivity of our heat map model to even a low number of error sequencing reads in a genotype, that is, if there exist two mutant reads in a total of 30 sequencing reads in BN cells, its adjusted heat map-array value was likely to higher than 1 (red) in our model. However, two error sequencing reads out of 30 reads is understandable in a current second-generation sequencing platform. Finally, the remaining 51 genes (yellow-colored genes in Additional file 5: Table S2B) were used for the heat map and its matched cell-lineage tree analyses.
Concurrent and exclusive mutation analysis
Function analysis of key genes
To analyze the functional impact of the somatic mutations in key genes, we first investigated the protein sequence of each key gene from the UniProtKB/Swiss-Prot database . The amino acid changes were determined by the human genome annotation files from the University of California Santa Cruz Genome Bioinformatics site . We then mapped the abnormal amino acid changes to the protein sequence of each gene and determined which domain had been altered. Further, altered signaling pathways were determined by mapping the key genes to the Kyoto Encyclopedia of Genes and Genomes (KEGG) database .
Availability of supporting data
The raw sequence data in the fastq format from this study were deposited in the NCBI Short Read Archive under the accession number SRA051489 and alignments and genotyping data is available from GigaScience.
- BC cells:
Single cells from tumor tissue
- BN cells:
Single cells from the normal adjacent tissue
False discovery rate
Principal component analysis
Somatic mutant allele frequency spectrum
Bladder transitional cell carcinoma.
This work was supported by a National Basic Research Program of China (973 program no.2011CB809202;2011CB809203), the Chinese 863 program (no.2009AA022707), the Shenzhen Municipal Government of China (grants ZYC201005250020A ), the Key Laboratory Project Supported by Shenzhen City (grants CX B200903 110066A; CXB201108250096A), and Shenzhen Key Laboratory of Gene Bank for National Life Science. This project was also supported by a grant from the Innovative Research Team Project of GuangDong and the Guangdong Enterprise Key Laboratory of Human Disease Genomics, and in part by the Intramural Research Program of the National Institutes of Health, National Cancer Institute, Center for Cancer Research. We also thank the Danish Natural Science Research Council for the Ole Rømer grant and the Shenzhen Municipal Government and the Local Government of Yantian District of Shenzhen for providing funding and an in part by the Intramural Research Program of the National Cancer Institute, National Institutes of Health, USA.
Correspondence and requests for materials should be addressed to: firstname.lastname@example.org (JW), email@example.com (MD), firstname.lastname@example.org (HY).
- Jemal A, Bray F, Center MM, Ferlay J, Ward E, Forman D: Global cancer statistics. CA Cancer J Clin. 2011, 61 (2): 69-90. 10.3322/caac.20107.View ArticlePubMedGoogle Scholar
- Knowles MA: Molecular pathogenesis of bladder cancer. Int J Clin Oncol. 2008, 13 (4): 287-297. 10.1007/s10147-008-0812-0.View ArticlePubMedGoogle Scholar
- Mitra AP, Cote RJ: Molecular pathogenesis and diagnostics of bladder cancer. Annu Rev Pathol. 2009, 4: 251-285. 10.1146/annurev.pathol.4.110807.092230.View ArticlePubMedGoogle Scholar
- Kompier LC, van Tilborg AA, Zwarthoff EC: Bladder cancer: novel molecular characteristics, diagnostic, and therapeutic implications. Urol Oncol. 2010, 28 (1): 91-96. 10.1016/j.urolonc.2009.06.007.View ArticlePubMedGoogle Scholar
- Witjes JA, Hendricksen K: Intravesical pharmacotherapy for non-muscle-invasive bladder cancer: a critical analysis of currently available drugs, treatment schedules, and long-term results. Eur Urol. 2008, 53 (1): 45-52. 10.1016/j.eururo.2007.08.015.View ArticlePubMedGoogle Scholar
- Gui Y, Guo G, Huang Y, Hu X, Tang A, Gao S, Wu R, Chen C, Li X, Zhou L, He M, Li Z, Sun X, Jia W, Chen J, Yang S, Zhou F, Zhao X, Wan S, Ye R, Liang C, Liu Z, Huang P, Liu C, Jiang H, Wang Y, Zheng H, Sun L, Liu X, Jiang Z: Frequent mutations of chromatin remodeling genes in transitional cell carcinoma of the bladder. Nat Genet. 2011, 43 (9): 875-878. 10.1038/ng.907.View ArticlePubMedGoogle Scholar
- Navin N, Kendall J, Troge J, Andrews P, Rodgers L, McIndoo J, Cook K, Stepansky A, Levy D, Esposito D, Muthuswamy L, Krasnitz A, McCombie WR, Hicks J, Wigler M: Tumour evolution inferred by single-cell sequencing. Nature. 2011, 472 (7341): 90-94. 10.1038/nature09807.View ArticlePubMedPubMed CentralGoogle Scholar
- Hou Y, Song L, Zhu P, Zhang B, Tao Y, Xu X, Li F, Wu K, Liang J, Shao D, Wu H, Ye X, Ye C, Wu R, Jian M, Chen Y, Xie W, Zhang R, Chen L, Liu X, Yao X, Zheng H, Yu C, Li Q, Gong Z, Mao M, Yang X, Yang L, Li J, Wang W: Single-Cell Exome Sequencing and Monoclonal Evolution of a JAK2-Negative Myeloproliferative Neoplasm. Cell. 2012, 148 (5): 873-885. 10.1016/j.cell.2012.02.028.View ArticlePubMedGoogle Scholar
- Xu X, Hou Y, Yin X, Bao L, Tang A, Song L, Li F, Tsang S, Wu K, Wu H, He W, Zeng L, Xing M, Wu R, Jiang H, Liu X, Cao D, Guo G, Hu X, Gui Y, Li Z, Xie W, Sun X, Shi M, Cai Z, Wang B, Zhong M, Li J, Lu Z, Gu N: Single-cell exome sequencing reveals single-nucleotide mutation characteristics of a kidney tumor. Cell. 2012, 148 (5): 886-895. 10.1016/j.cell.2012.02.025.View ArticlePubMedGoogle Scholar
- Yi X, Liang Y, Huerta-Sanchez E, Jin X, Cuo ZX, Pool JE, Xu X, Jiang H, Vinckenbosch N, Korneliussen TS, Zheng H, Liu T, He W, Li K, Luo R, Nie X, Wu H, Zhao M, Cao H, Zou J, Shan Y, Li S, Yang Q, Asan , Ni P, Tian G, Xu J, Liu X, Jiang T, Wu R: Sequencing of 50 human exomes reveals adaptation to high altitude. Science. 2010, 329 (5987): 75-78. 10.1126/science.1190371.View ArticlePubMedPubMed CentralGoogle Scholar
- Li Y, Xu X, Song L, Hou Y, Li F, Wu K, Wu H, Liang J, Jian M, Li J, Zhang X, Wang J, Yang H, Wang J: Single cell whole-exome sequences of bladder cancer from an individual. GigaScience. 2012,http://dx.doi.org/10.5524/100037,Google Scholar
- Spits C, Le Caignec C, De Rycke M, Van Haute L, Van Steirteghem A, Liebaers I, Sermon K: Whole-genome multiple displacement amplification from single cells. Nat Protoc. 2006, 1 (4): 1965-1970. 10.1038/nprot.2006.326.View ArticlePubMedGoogle Scholar
- Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, Boutell JM, Bryant J, Carter RJ, Keira Cheetham R, Cox AJ, Ellis DJ, Flatbush MR, Gormley NA, Humphray SJ, Irving LJ, Karbelashvili MS, Kirk SM, Li H, Liu X, Maisinger KS, Murray LJ, Obradovic B, Ost T, Parkinson ML, Pratt MR: Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008, 456 (7218): 53-59. 10.1038/nature07517.View ArticlePubMedPubMed CentralGoogle Scholar
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009, 25 (16): 2078-2079. 10.1093/bioinformatics/btp352.View ArticlePubMedPubMed CentralGoogle Scholar
- Sathirapongsasuti JF, Lee H, Horst BA, Brunner G, Cochran AJ, Binder S, Quackenbush J, Nelson SF: Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV. Bioinformatics. 2011, 27 (19): 2648-2654. 10.1093/bioinformatics/btr462.View ArticlePubMedPubMed CentralGoogle Scholar
- The 1000 Genomes Project Consortium: A map of human genome variation from population-scale sequencing. Nature. 2010, 467 (7319): 1061-1073. 10.1038/nature09534.View ArticlePubMed CentralGoogle Scholar
- Stratton MR, Campbell PJ, Futreal PA: The cancer genome. Nature. 2009, 458 (7239): 719-724. 10.1038/nature07943.View ArticlePubMedPubMed CentralGoogle Scholar
- Loeb LA, Loeb KR, Anderson JP: Multiple mutations and cancer. Proc Natl Acad Sci U S A. 2003, 100 (3): 776-781. 10.1073/pnas.0334858100.View ArticlePubMedPubMed CentralGoogle Scholar
- Loeb LA, Bielas JH, Beckman RA: Cancers exhibit a mutator phenotype: clinical implications. Cancer Res. 2008, 68 (10): 3551-3557. 10.1158/0008-5472.CAN-07-5835. discussion 3557View ArticlePubMedGoogle Scholar
- Yachida S, Jones S, Bozic I, Antal T, Leary R, Fu B, Kamiyama M, Hruban RH, Eshleman JR, Nowak MA, Velculescu VE, Kinzler KW, Vogelstein B, Iacobuzio-Donahue CA: Distant metastasis occurs late during the genetic evolution of pancreatic cancer. Nature. 2010, 467 (7319): 1114-1117. 10.1038/nature09515.View ArticlePubMedPubMed CentralGoogle Scholar
- Tonkin ET, Wang TJ, Lisgo S, Bamshad MJ, Strachan T: NIPBL, encoding a homolog of fungal Scc2-type sister chromatid cohesion proteins and fly Nipped-B, is mutated in Cornelia de Lange syndrome. Nat Genet. 2004, 36 (6): 636-641. 10.1038/ng1363.View ArticlePubMedGoogle Scholar
- Jahnke P, Xu W, Wulling M, Albrecht M, Gabriel H, Gillessen-Kaesbach G, Kaiser FJ: The Cohesin loading factor NIPBL recruits histone deacetylases to mediate local chromatin modifications. Nucleic Acids Res. 2008, 36 (20): 6450-6458. 10.1093/nar/gkn688.View ArticlePubMedPubMed CentralGoogle Scholar
- Chi P, Allis CD, Wang GG: Covalent histone modifications–miswritten, misinterpreted and mis-erased in human cancers. Nat Rev Cancer. 2010, 10 (7): 457-469. 10.1038/nrc2876.View ArticlePubMedPubMed CentralGoogle Scholar
- Rommens JM, Iannuzzi MC, Kerem B, Drumm ML, Melmer G, Dean M, Rozmahel R, Cole JL, Kennedy D, Hidaka N: Identification of the cystic fibrosis gene: chromosome walking and jumping. Science. 1989, 245 (4922): 1059-1065. 10.1126/science.2772657.View ArticlePubMedGoogle Scholar
- Liang J, Song W, Tromp G, Kolattukudy PE, Fu M: Genome-wide survey and expression profiling of CCCH-zinc finger family reveals a functional module in macrophage activation. PLoS One. 2008, 3 (8): e2880-10.1371/journal.pone.0002880.View ArticlePubMedPubMed CentralGoogle Scholar
- Thouennon E, Elkahloun AG, Guillemot J, Gimenez-Roqueplo AP, Bertherat J, Pierre A, Ghzili H, Grumolato L, Muresan M, Klein M, Lefebvre H, Ouafik L, Vaudry H, Plouin PF, Yon L, Anouar Y: Identification of potential gene markers and insights into the pathophysiology of pheochromocytoma malignancy. J Clin Endocrinol Metab. 2007, 92 (12): 4865-4872. 10.1210/jc.2007-1253.View ArticlePubMedGoogle Scholar
- Zagzag D, Salnikow K, Chiriboga L, Yee H, Lan L, Ali MA, Garcia R, Demaria S, Newcomb EW: Downregulation of major histocompatibility complex antigens in invading glioma cells: stealth invasion of the brain. Lab Invest. 2005, 85 (3): 328-341. 10.1038/labinvest.3700233.View ArticlePubMedGoogle Scholar
- Rotman G, Shiloh Y: ATM: from gene to function. Hum Mol Genet. 1998, 7 (10): 1555-1563. 10.1093/hmg/7.10.1555.View ArticlePubMedGoogle Scholar
- Branzei D, Foiani M: Regulation of DNA repair throughout the cell cycle. Nat Rev Mol Cell Biol. 2008, 9 (4): 297-308. 10.1038/nrm2351.View ArticlePubMedGoogle Scholar
- Smith MJ, Culhane AC, Donovan M, Coffey JC, Barry BD, Kelly MA, Higgins DG, Wang JH, Kirwan WO, Cotter TG, Redmond HP: Analysis of differential gene expression in colorectal cancer and stroma using fluorescence-activated cell sorting purification. Br J Cancer. 2009, 100 (9): 1452-1464. 10.1038/sj.bjc.6604931.View ArticlePubMedPubMed CentralGoogle Scholar
- Ohlund D, Lundin C, Ardnor B, Oman M, Naredi P, Sund M: Type IV collagen is a tumour stroma-derived biomarker for pancreas cancer. Br J Cancer. 2009, 101 (1): 91-97. 10.1038/sj.bjc.6605107.View ArticlePubMedPubMed CentralGoogle Scholar
- Youn A, Simon R: Identifying cancer driver genes in tumor genome sequencing studies. Bioinformatics. 2011, 27 (2): 175-181. 10.1093/bioinformatics/btq630.View ArticlePubMedGoogle Scholar
- Nowell PC: The clonal evolution of tumor cell populations. Science. 1976, 194 (4260): 23-28. 10.1126/science.959840.View ArticlePubMedGoogle Scholar
- Loeb LA: A mutator phenotype in cancer. Cancer Res. 2001, 61 (8): 3230-3239.PubMedGoogle Scholar
- Heng HH, Stevens JB, Liu G, Bremer SW, Ye KJ, Reddy PV, Wu GS, Wang YA, Tainsky MA, Ye CJ: Stochastic cancer progression driven by non-clonal chromosome aberrations. J Cell Physiol. 2006, 208 (2): 461-472. 10.1002/jcp.20685.View ArticlePubMedGoogle Scholar
- Hanahan D, Weinberg RA: Hallmarks of cancer: the next generation. Cell. 2011, 144 (5): 646-674. 10.1016/j.cell.2011.02.013.View ArticlePubMedGoogle Scholar
- Gerlinger M, Rowan AJ, Horswell S, Larkin J, Endesfelder D, Gronroos E, Martinez P, Matthews N, Stewart A, Tarpey P, Varela I, Phillimore B, Begum S, McDonald NQ, Butler A, Jones D, Raine K, Latimer C, Santos CR, Nohadani M, Eklund AC, Spencer-Dene B, Clark G, Pickering L, Stamp G, Gore M, Szallasi Z, Downward J, Futreal PA, Swanton C: Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med. 2012, 366 (10): 883-892. 10.1056/NEJMoa1113205.View ArticlePubMedPubMed CentralGoogle Scholar
- The University of California, Santa Cruz Bioinformatics Site:http://genome.ucsc.edu/,
- Agilent Technologies:http://www.genomics.agilent.com,
- The pcaMethods v1.12.0:http://rss.acs.unt.edu/Rdoc/library/pcaMethods/,
- The Perl Package (Left):http://search.cpan.org/dist/Text-NSP/lib/Text/NSP/Measures/2D/Fisher/left.pm,
- The Perl Package (Right):http://search.cpan.org/dist/Text-NSP/lib/Text/NSP/Measures/2D/Fisher/right.pm,
- The UniProtKB and Swiss-Prot Database:http://www.uniprot.org/,
- The Kyoto Encyclopedia of Genes and Genomes (KEGG) Database:http://www.genome.jp/kegg/,
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.