PLoS Comput Biol. 2012;8(12):e1002838.
doi: 10.1371/journal.pcbi.1002838. Epub 2012 Dec 20.

PERT: a method for expression deconvolution of human blood samples from varied microenvironmental and developmental conditions

Wenlian Qiao et al. PLoS Comput Biol. 2012.

Abstract

The cellular composition of heterogeneous samples can be predicted using an expression deconvolution algorithm to decompose their gene expression profiles based on pre-defined, reference gene expression profiles of the constituent populations in these samples. However, the expression profiles of the actual constituent populations are often perturbed from those of the reference profiles due to gene expression changes in cells associated with microenvironmental or developmental effects. Existing deconvolution algorithms do not account for these changes and give incorrect results when benchmarked against those measured by well-established flow cytometry, even after batch correction was applied. We introduce PERT, a new probabilistic expression deconvolution method that detects and accounts for a shared, multiplicative perturbation in the reference profiles when performing expression deconvolution. We applied PERT and three other state-of-the-art expression deconvolution methods to predict cell frequencies within heterogeneous human blood samples that were collected under several conditions (uncultured mono-nucleated and lineage-depleted cells, and culture-derived lineage-depleted cells). Only PERT's predicted proportions of the constituent populations matched those assigned by flow cytometry. Genes associated with cell cycle processes were highly enriched among those with the largest predicted expression changes between the cultured and uncultured conditions. We anticipate that PERT will be widely applicable to expression deconvolution strategies that use profiles from reference populations that vary from the corresponding constituent populations in cellular state but not cellular phenotypic identity.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Schematic of deconvolution models.
(A) Generation of mixed profiles from heterogeneous samples. (A-i) represents a heterogeneous sample whose composition is unknown. Each bar in (A-ii) represents the expression level of an individual gene in the heterogeneous sample. (B) Schematic of four deconvolution models. (B-i) The non-negative least squares model (NNLS) (Lawson and Hanson (1995)) and the non-negative maximum likelihood model (NNML) predict proportions of pre-specified reference populations in a heterogeneous sample using mixed and reference profiles. (B-ii) The non-negative maximum likelihood new population model (NNMLnp) estimates the gene expression profile of a new reference population that may exist in a heterogeneous sample; simultaneously, the model predicts proportions of both the input reference populations and the new reference population. (B-iii) The perturbation model (PERT) perturbs the input reference profiles using a genome-wide perturbation vector ρ; simultaneously, the model predicts proportions of the reference populations in a heterogeneous sample. Parameters shown in red are model predicted.
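The linear mixing idea behind panel (B) can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the reference values, the proportions, and the perturbation vector `rho` are made up, the solver is a simple multiplicative-update scheme for non-negative least squares rather than the Lawson and Hanson algorithm, and `rho` is treated as known, whereas PERT estimates it jointly with the proportions.

```python
def nnls_multiplicative(refs, mixed, iters=500):
    """Approximate argmin over x >= 0 of ||refs * x - mixed||^2.

    refs  : one row per gene, one column per reference population
    mixed : expression of the heterogeneous sample, one value per gene
    Uses multiplicative updates, which assume non-negative inputs.
    """
    n_genes, n_pops = len(refs), len(refs[0])
    # A^T b and A^T A do not change between iterations.
    atb = [sum(refs[g][k] * mixed[g] for g in range(n_genes))
           for k in range(n_pops)]
    ata = [[sum(refs[g][i] * refs[g][j] for g in range(n_genes))
            for j in range(n_pops)] for i in range(n_pops)]
    x = [1.0] * n_pops
    for _ in range(iters):
        atax = [sum(ata[k][j] * x[j] for j in range(n_pops))
                for k in range(n_pops)]
        x = [x[k] * atb[k] / atax[k] if atax[k] > 0 else 0.0
             for k in range(n_pops)]
    return x


def to_proportions(weights):
    """Rescale non-negative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def mix(refs, props, rho=None):
    """Mixed profile: per-gene weighted sum of reference profiles,
    optionally scaled by a per-gene multiplicative factor rho."""
    rho = rho or [1.0] * len(refs)
    return [rho[g] * sum(p * r for p, r in zip(props, refs[g]))
            for g in range(len(refs))]


# Hypothetical reference profiles: 6 genes x 3 populations.
refs = [
    [10, 1, 1],
    [8, 2, 1],
    [1, 9, 2],
    [2, 7, 1],
    [1, 1, 12],
    [1, 2, 6],
]
true_props = [0.5, 0.3, 0.2]

# Case 1: constituent populations match the reference profiles.
recovered = to_proportions(nnls_multiplicative(refs, mix(refs, true_props)))

# Case 2: two genes are up-regulated 2-fold in every population (hypothetical rho).
rho = [2.0, 2.0, 1.0, 1.0, 1.0, 1.0]
perturbed = mix(refs, true_props, rho)
naive = to_proportions(nnls_multiplicative(refs, perturbed))

# Case 3: undo the perturbation before deconvolving (rho assumed known here).
rescaled = [m / r for m, r in zip(perturbed, rho)]
corrected = to_proportions(nnls_multiplicative(refs, rescaled))

print("unperturbed:", [round(p, 3) for p in recovered])
print("naive      :", [round(p, 3) for p in naive])
print("corrected  :", [round(p, 3) for p in corrected])
```

In this toy setting the naive fit overestimates the first population because the perturbed genes happen to be its signature genes, while accounting for the shared per-gene factor restores the original proportions.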
Figure 2. NNML recovers known compositions of immune cell line mixtures.
Microarray data of IM-9 (○), Jurkat (▵), Raji (□), THP-1 (+), and the mixtures of these four cell lines in known proportions were obtained from Abbas et al. (2009). Proportions of each cell line were predicted using (A) NNLS with cell line signature probes (reproduced from Abbas et al. (2009)), (B) NNLS without cell line signature probes, (C) NNML with cell line signature probes, and (D) NNML without cell line signature probes. Model predictions were compared with the input proportions used to create the mixtures. Cell line signature probes were obtained from Abbas et al. (2009).
Figure 3. PERT captures cell culture effects.
(A) Experimental setup for profiling genome-wide transcriptome expression of uncultured (day-0) and culture-derived (day-4) colony forming unit-monocytes (CFU-M) and megakaryocytes (MEGA). Lin-: lineage-depleted cells; TPO: thrombopoietin; SCF: stem cell factor; FLT3LG: fms-related tyrosine kinase 3 ligand. (B) Pearson's correlation comparison between day-0 and day-4 samples. (C) Plots of Gene Ontology enrichment analysis showing the enrichment scores of cell cycle phase genes, immune response genes, and inflammatory response genes by day-4 samples compared with day-0 samples. NES denotes the normalized enrichment score. P-values (P) were calculated using the hypergeometric test. (D) Pearson's correlation comparison between day-0 CFU-M, day-4 CFU-M, and perturbed day-0 CFU-M (or model predicted day-4 CFU-M) gene expression profiles. (E) Pearson's correlation comparison between day-0 megakaryocyte, day-4 megakaryocyte, and perturbed day-0 megakaryocyte (or model predicted day-4 megakaryocyte) gene expression profiles.
Figure 4. PERT recovers compositions of uncultured human cord blood mono-nucleated and lineage-depleted (Lin-) cells.
(A) Schematic compositions of mono-nucleated cell samples and Lin- cell samples. (B) Model predicted proportions of 11 homogeneous blood cell lineages, namely granulocytes (GRAN), erythrocytes (ERY), monocytes (MONO), precursor B cells (PREB), megakaryocyte-erythrocyte progenitors (MEP), megakaryocytes (MEGA), primitive progenitor cells (PPC), eosinophils (EOS), granulocyte-monocyte progenitors (GMP), common myeloid progenitors (CMP), and basophils (BASO) in uncultured human mono-nucleated cord blood cell samples. (C) Flow cytometry measured proportions of the 11 blood cell lineages in the uncultured human mono-nucleated cord blood cell samples shown in (B). (D) Model predicted proportions in uncultured human Lin- cord blood cell samples. (E) Flow cytometry measured proportions in the uncultured human Lin- cord blood cell samples shown in (D). (F) R² calculated from the Pearson's correlation coefficients between the model predicted cell proportions and the ones assigned by flow cytometry. See Table 2 for the associated t-statistics and P-values. (G) Averaged absolute differences of model predicted cell proportions. Error bars show standard deviations of the absolute differences between model predicted and flow cytometry assigned proportions of the 11 blood cell lineages. (H) The Bayesian information criterion (BIC) calculated from the parameters in Table 1.
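The two agreement measures named in panels (F) and (G) can be computed as in the following sketch. The proportions below are invented for illustration and are not taken from the paper, and the use of the sample (n - 1) standard deviation is an assumption, since the caption does not say which form was used.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def r_squared(predicted, measured):
    """R^2 taken as the square of Pearson's r."""
    return pearson_r(predicted, measured) ** 2


def abs_diff_summary(predicted, measured):
    """Mean and sample standard deviation of |predicted - measured|."""
    diffs = [abs(p - m) for p, m in zip(predicted, measured)]
    return mean(diffs), stdev(diffs)


# Hypothetical proportions for five lineages (not data from the paper).
measured = [0.40, 0.25, 0.15, 0.10, 0.10]   # e.g. flow cytometry
predicted = [0.35, 0.30, 0.15, 0.12, 0.08]  # e.g. a deconvolution model

r2 = r_squared(predicted, measured)
mean_diff, sd_diff = abs_diff_summary(predicted, measured)
print(f"R^2 = {r2:.3f}, mean |diff| = {mean_diff:.3f}, SD = {sd_diff:.3f}")
```

The two measures are complementary: R² rewards predictions that rank and scale with the measured proportions, while the absolute differences also penalize a constant offset.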
Figure 5. PERT recovers compositions of culture-derived lineage-depleted (Lin-) human blood cells.
(A) Schematic of the experimental setup. (B) Model predicted cell proportions of 11 blood cell lineages (defined in Figure 4) in day-4 Lin- human blood cell samples. (C) Flow cytometry assigned averaged cell proportions (N = 3) in the day-4 Lin- human blood cell samples shown in (B). (D) R² calculated from the Pearson's correlation coefficients between the model predicted cell proportions and the ones assigned by flow cytometry. (E) Averaged absolute differences of model predicted cell proportions. Error bars show standard deviations of the absolute differences of the 11 blood cell lineages. (F) The Bayesian information criterion (BIC) calculated from the parameters in Table 1.
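For reference, the BIC used for model comparison in Figures 4 and 5 has the following standard general form. The symbols are the usual textbook ones and are an assumption here; how the paper counts the free parameters and observations is specified in its Table 1, which is not reproduced on this page.

```latex
\mathrm{BIC} = k \ln n - 2 \ln \hat{L}
```

Here $k$ is the number of free model parameters, $n$ is the number of observations, and $\hat{L}$ is the maximized likelihood. Lower values indicate a better trade-off between goodness of fit and model complexity.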

References

    1. Lu P, Nakorchevskiy A, Marcotte EM (2003) Expression deconvolution: a reinterpretation of DNA microarray data reveals dynamic changes in cell populations. Proc Natl Acad Sci USA 100: 10370–10375.
    2. Venet D, Pecasse F, Maenhaut C, Bersini H (2001) Separation of samples into their constituents using gene expression data. Bioinformatics 17 Suppl 1: S279–87.
    3. Abbas AR, Wolslegel K, Seshasayee D, Modrusan Z, Clark HF (2009) Deconvolution of blood microarray data identifies cellular activation patterns in systemic lupus erythematosus. PLoS ONE 4: e6098. doi:10.1371/journal.pone.0006098.
    4. Gong T, Hartmann N, Kohane IS, Brinkmann V, Staedtler F, et al. (2011) Optimal deconvolution of transcriptional profiling data using quadratic programming with application to complex clinical blood samples. PLoS ONE 6: e27156. doi:10.1371/journal.pone.0027156.
    5. Quon G, Morris Q (2009) ISOLATE: a computational strategy for identifying the primary origin of cancers using high-throughput sequencing. Bioinformatics 25: 2882–2889. doi:10.1093/bioinformatics/btp378.