Scarcity of labels in non-stationary data streams: A survey

Date

2021

ISSN

0360-0300

Publisher

ACM Press

Type

Article

Peer reviewed

Yes

Abstract

In a dynamic stream, the underlying process generating the data is assumed to be non-stationary, so concepts within the stream drift and change as the stream progresses. Concepts learned by a classification model are therefore prone to change, and non-adaptive models are likely to deteriorate and become ineffective over time. The challenge of recognising and reacting to change in a stream is compounded by the scarcity-of-labels problem: the very realistic situation in which the true class label of an incoming point is not immediately available (or might never be available), or in which manually annotating data points is prohibitively expensive. In a high-velocity stream it may be impossible to manually label every incoming point and pursue a fully supervised approach. In this article we formally describe the types of change that can occur in a data stream and then catalogue the methods for dealing with change when there is limited access to labels. We present an overview of the most influential ideas in the field along with recent advancements, and we highlight trends, research gaps, and future research directions.

Description

The file attached to this record is the author's final peer-reviewed version.

Keywords

Concept drift, Concept evolution, Weakly supervised, Active learning, Unsupervised learning, Semi-supervised learning

Citation

Fahy, C., Yang, S. and Gongora, M. (2021) Scarcity of labels in non-stationary data streams: A survey. ACM Computing Surveys.
