Multiobjective deep clustering and its applications in single-cell RNA-seq data

Date

2021-09-21

ISSN

2168-2216

Publisher

IEEE Press

Type

Article

Peer reviewed

Yes

Abstract

Single-cell RNA sequencing is a transformative technology that enables the study of tissue heterogeneity at the cellular level. Clustering is the key computational approach for grouping cells according to their transcriptome profiles in single-cell RNA-seq data. However, accurate identification of distinct cell types is hindered by high dimensionality, and clustering applied directly to the original transcriptome can yield uninformative clusters. To address this challenge, this study proposes an evolutionary multiobjective deep clustering (EMDC) algorithm for identifying cell types in single-cell RNA-seq data. First, EMDC removes redundant and irrelevant genes by applying differential gene expression analysis to identify genes that are differentially expressed across biological conditions. Next, a deep autoencoder projects the high-dimensional data into several low-dimensional nonlinear embedding subspaces, each obtained with a different bottleneck layer. A basic clustering algorithm is then applied in each of these subspaces, and the resulting basic clusterings form a cluster ensemble. To reduce the unnecessary cost incurred by the clusterings in the ensemble, a multiobjective evolutionary optimization prunes the basic clustering results under three objective functions, improving the ensemble's cell type discovery performance. Experiments on 30 synthetic and six real single-cell RNA-seq datasets show that EMDC outperforms eight other clustering methods and three multiobjective optimization algorithms in cell type identification. Extensive additional comparisons demonstrate the impact of each component of EMDC.
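
The ensemble-pruning step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the three objective functions, so the objectives used here (ensemble size, internal redundancy, and disagreement with the full ensemble) are placeholders chosen only for illustration. Exhaustive enumeration of subsets stands in for the evolutionary search, which is feasible only for a toy ensemble. All function names and the toy label vectors are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Labels = Sequence[int]
Objectives = Tuple[float, float, float]


def rand_index(a: Labels, b: Labels) -> float:
    """Fraction of cell pairs that two labelings treat the same way."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def objectives(subset: Tuple[int, ...], base: List[Labels]) -> Objectives:
    """Three placeholder objectives, all minimized (assumed, not from the paper)."""
    size = float(len(subset))
    # Redundancy: mean pairwise agreement among the selected clusterings.
    inner = [rand_index(base[i], base[j]) for i, j in combinations(subset, 2)]
    redundancy = sum(inner) / len(inner)
    # Disagreement: 1 - mean agreement of selected clusterings with all base ones.
    outer = [rand_index(base[i], other) for i in subset for other in base]
    disagreement = 1.0 - sum(outer) / len(outer)
    return (size, redundancy, disagreement)


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """u Pareto-dominates v: no worse on every objective, better on at least one."""
    return all(x <= y for x, y in zip(u, v)) and any(x < y for x, y in zip(u, v))


def pareto_front(scores: Dict[Tuple[int, ...], Objectives]) -> List[Tuple[int, ...]]:
    """Keep only the subsets that no other subset dominates."""
    return [
        s for s, v in scores.items()
        if not any(dominates(w, v) for t, w in scores.items() if t != s)
    ]


def prune_ensemble(base: List[Labels]) -> List[Tuple[int, ...]]:
    """Score every subset of at least two base clusterings and return the front."""
    scores = {
        s: objectives(s, base)
        for k in range(2, len(base) + 1)
        for s in combinations(range(len(base)), k)
    }
    return pareto_front(scores)


# Toy ensemble: four base clusterings of six cells (hypothetical labels).
base_clusterings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],  # exact duplicate of the first clustering
    [0, 0, 1, 1, 2, 2],
    [0, 1, 0, 1, 0, 1],
]
front = prune_ensemble(base_clusterings)
print(front)  # index tuples of the non-dominated sub-ensembles
```

Each tuple in `front` is a candidate pruned ensemble. A real multiobjective evolutionary algorithm would search this space with a population instead of enumerating it, and a final consensus step would still be needed to turn a selected sub-ensemble into one clustering.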

Description

The file attached to this record is the author's final peer-reviewed version. The publisher's final version can be found by following the DOI link.

Keywords

Evolutionary multiobjective deep clustering, multiobjective optimization, single-cell RNA-seq dataset

Citation

Wang, Y., Biao, C., Wong, K-C., Yang, S. and Li, X. (2021) Multiobjective deep clustering and its applications in single-cell RNA-seq data. IEEE Transactions on Systems, Man and Cybernetics: Systems.
