Finding and tracking multi-density clusters in an online dynamic data stream

Date

2019-06-14

Publisher

IEEE Press

Type

Article

Peer reviewed

Yes

Abstract

Change is one of the biggest challenges in dynamic stream mining. From a data-mining perspective, adapting to and tracking change is desirable in order to understand how and why change has occurred. Clustering, a form of unsupervised learning, can be used to identify the underlying patterns in a stream. Density-based clustering identifies clusters as areas of high density separated by areas of low density. This paper proposes a Multi-Density Stream Clustering (MDSC) algorithm to address two problems: the multi-density problem (clusters of differing densities) and the problem of discovering and tracking changes in a dynamic stream. MDSC consists of two online components: a set of discovered, labelled clusters and an outlier buffer. Incoming points are assigned to a live cluster or passed to the outlier buffer. New clusters are discovered in the buffer using an ant-inspired swarm intelligence approach. Each newly discovered cluster is uniquely labelled and added to the set of live clusters. Processed data is subject to an ageing function and disappears when it is no longer relevant. MDSC is shown to compare favourably with state-of-the-art peer stream-clustering algorithms on a range of real and synthetic data streams. Experimental results suggest that MDSC can discover qualitatively useful patterns while being scalable and robust to noise.
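
The assign-or-buffer flow with ageing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class and parameter names (`StreamSketch`, `radius`, `decay`, `min_weight`, `min_buffer`) are hypothetical, a single fixed radius is assumed (so the multi-density aspect is not reproduced), exponential decay is an assumed form of the ageing function, and a naive neighbour count stands in for the ant-inspired swarm discovery step.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Cluster:
    label: int
    centre: list
    weight: float = 1.0


@dataclass
class StreamSketch:
    radius: float = 1.0      # assumption: one fixed assignment radius
    decay: float = 0.01      # assumption: exponential ageing rate per point
    min_weight: float = 0.1  # clusters lighter than this are dropped
    min_buffer: int = 3      # buffered neighbours needed to form a cluster
    clusters: list = field(default_factory=list)
    buffer: list = field(default_factory=list)
    next_label: int = 0

    def _age(self):
        # Decay every live cluster and drop those that are no longer relevant.
        factor = math.exp(-self.decay)
        for c in self.clusters:
            c.weight *= factor
        self.clusters = [c for c in self.clusters if c.weight >= self.min_weight]

    def _promote(self, point):
        # Naive stand-in for cluster discovery in the outlier buffer.
        near = [p for p in self.buffer if math.dist(p, point) <= self.radius]
        if len(near) < self.min_buffer:
            return None
        centre = [sum(xs) / len(near) for xs in zip(*near)]
        cluster = Cluster(self.next_label, centre, weight=float(len(near)))
        self.next_label += 1
        self.clusters.append(cluster)
        self.buffer = [p for p in self.buffer if p not in near]
        return cluster.label

    def process(self, point):
        """Return the label of the cluster the point joins, or None if buffered."""
        self._age()
        point = list(point)
        if self.clusters:
            nearest = min(self.clusters, key=lambda c: math.dist(c.centre, point))
            if math.dist(nearest.centre, point) <= self.radius:
                nearest.weight += 1.0  # the centre is kept fixed in this sketch
                return nearest.label
        self.buffer.append(point)
        return self._promote(point)
```

Calling `process` on each arriving point returns a cluster label once enough nearby points have accumulated in the buffer, and `None` while the point is still treated as an outlier.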

Description

The file attached to this record is the author's final peer-reviewed version.

Keywords

Data stream clustering, Multi-density clustering, Concept drift, Concept evolution, Swarm intelligence, Change detection

Citation

Fahy, C. and Yang, S. (2022) Finding and tracking multi-density clusters in an online dynamic data stream. IEEE Transactions on Big Data, 8 (1), pp. 178-192
