Robust online active learning with cluster-based local drift detection for unbalanced imperfect data

Date

2024-08-05

Publisher

Elsevier

Type

Article

Peer reviewed

Yes

Abstract

With the rapid development of data-driven technologies, massive amounts of real data emerge from industrial systems in the form of data streams. Their distribution may change over time, and outliers may arise, yielding unbalanced imperfect data, because of time-varying working conditions, aging equipment, etc. Previous methods tackle the dual challenges of concept drift and imbalance but fail to efficiently distinguish outliers from drift under a limited labeling budget, which degrades performance. To address this issue, robust online active learning with cluster-based local drift detection is proposed to classify unbalanced imperfect data streams with the above characteristics. The cluster-based local drift detection is first designed to capture a new concept and recognize the corresponding drifted regions. Following that, an improved active learning mechanism is presented to distinguish outliers from drift and to select the most valuable instances for labeling and for updating the ensemble classifier. Experimental results on eight synthetic and four real-world data streams show that the proposed method outperforms seven comparative methods in classification accuracy and robustness.
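
The abstract does not give algorithmic details, so the following is only a toy sketch of the general idea of telling an isolated outlier apart from a local drift or a new concept. It is not the authors' method. The class name `LocalDriftSketch`, the fixed centroids, and the parameters `radius`, `window`, `threshold`, and `min_support` are all assumptions introduced for illustration. The labeling budget and the ensemble classifier are not modelled.

```python
import math
from collections import deque


class LocalDriftSketch:
    """Toy illustration (hypothetical): per-cluster error monitoring plus an outlier buffer."""

    def __init__(self, centroids, radius=1.0, window=20, threshold=0.3, min_support=3):
        self.centroids = [tuple(c) for c in centroids]  # assumed fixed, for simplicity
        self.radius = radius
        self.threshold = threshold
        self.min_support = min_support
        # Recent 0/1 prediction errors per cluster, and long-run error counts.
        self.recent = [deque(maxlen=window) for _ in self.centroids]
        self.errors = [0] * len(self.centroids)
        self.seen = [0] * len(self.centroids)
        self.buffer = []  # far-away instances awaiting more evidence

    def nearest(self, x):
        dists = [math.dist(x, c) for c in self.centroids]
        k = min(range(len(dists)), key=dists.__getitem__)
        return k, dists[k]

    def observe(self, x, error):
        """Return 'outlier', 'new_concept', 'local_drift', or 'stable'."""
        k, d = self.nearest(x)
        if d > self.radius:
            # Far from every cluster: a lone point is treated as an outlier,
            # while several mutually close points suggest a new concept.
            neighbours = [b for b in self.buffer if math.dist(x, b) <= self.radius]
            self.buffer.append(tuple(x))
            if len(neighbours) + 1 >= self.min_support:
                return "new_concept"
            return "outlier"
        # Inside a known cluster: compare recent error rate with the long-run rate.
        self.recent[k].append(error)
        self.errors[k] += error
        self.seen[k] += 1
        long_run = self.errors[k] / self.seen[k]
        short_run = sum(self.recent[k]) / len(self.recent[k])
        full = len(self.recent[k]) == self.recent[k].maxlen
        if full and short_run - long_run > self.threshold:
            return "local_drift"
        return "stable"
```

In this sketch, a drift is flagged only for the cluster whose recent error rate rises, which is one simple way to localize a "drifted region"; instances far from all clusters are held back as outliers until enough nearby instances accumulate.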

Description

The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link.

Keywords

Imperfect data, Active learning, Concept drift, Unbalanced learning, Stream clustering, Scarcity of labels

Citation

Guo, Y. et al. (2024) Robust online active learning with cluster-based local drift detection for unbalanced imperfect data. Applied Soft Computing, 165, 112051

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International
http://creativecommons.org/licenses/by-nc-nd/4.0/