This article has been peer reviewed. It is the author’s final published version in BMC Systems Biology, Volume 12, March 2018, Article number 28.

The published version is available at . Copyright © Belokar et al.


BACKGROUND: Transcriptomic datasets often contain undeclared heterogeneity arising from biological variation, such as diversity of disease subtypes, treatment subgroups, time-series gene expression, and nested experimental conditions, as well as from technical variation due to batch effects, platform differences in integrated meta-analyses, etc. However, current analysis approaches are primarily designed to handle comparisons between experimental conditions represented by homogeneous samples, thus precluding the discovery of underlying subphenotypes. Unsupervised methods for subtype identification are typically based on individual gene-level analysis, which often results in irreproducible gene signatures for potential subtypes. Emerging methods to study heterogeneity have been largely developed in the context of single-cell datasets containing hundreds to thousands of samples, limiting their use to select contexts.

RESULTS: We present a novel analysis method, SPSNet, which identifies subtype-specific gene expression signatures based on the activity of subnetworks in biological pathways. SPSNet identifies the gene subnetworks capturing the diversity of underlying biological mechanisms, indicating potential sample subphenotypes. In the presence of extrinsic or non-biological heterogeneity (e.g. batch effects), SPSNet identifies subnetworks that are particularly affected by such variation, thus helping eliminate factors irrelevant to the biology of the phenotypes under study.

CONCLUSION: Using multiple publicly available datasets, we illustrate that SPSNet is able to consistently uncover patterns within gene expression data that correspond to meaningful heterogeneity of various origins. We also demonstrate the performance of SPSNet as a sensitive and reliable tool for understanding the structure and nature of such heterogeneity.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

PubMed ID