
dc.contributor.author | Fahy, Conor
dc.contributor.author | Yang, Shengxiang
dc.date.accessioned | 2020-01-07T09:52:55Z
dc.date.available | 2020-01-07T09:52:55Z
dc.date.issued | 2019-07-31
dc.identifier.citation | Fahy, C. and Yang, S. (2019) Dynamic feature selection for clustering high dimensional data streams. IEEE Access, 7(1), 127128-127140. | en
dc.identifier.uri | https://dora.dmu.ac.uk/handle/2086/18989
dc.description | open access article | en
dc.description.abstract | Change in a data stream can occur at the concept level and at the feature level. Change at the feature level can occur if new, additional features appear in the stream or if the importance and relevance of a feature changes as the stream progresses. This type of change has not received as much attention as concept-level change. Furthermore, many of the methods proposed for clustering streams (density-based, graph-based, and grid-based) rely on some form of distance as a similarity metric, which is problematic in high-dimensional data, where the curse of dimensionality makes distance measurements, and any concept of “density”, less meaningful. To address these two challenges, we propose combining them and framing the problem as a feature selection problem, specifically a dynamic feature selection problem. We propose a dynamic feature mask for clustering high-dimensional data streams. Redundant features are masked and clustering is performed along the unmasked, relevant features. If a feature's perceived importance changes, the mask is updated accordingly: previously unimportant features are unmasked and features that lose relevance become masked. The proposed method is algorithm-independent and can be used with any of the existing density-based clustering algorithms, which typically have no mechanism for dealing with feature drift and struggle with high-dimensional data. We evaluate the proposed method on four density-based clustering algorithms across four high-dimensional streams: two text streams and two image streams. In each case, the proposed dynamic feature mask improves clustering performance and reduces the processing time required by the underlying algorithm. Furthermore, change at the feature level can be observed and tracked. | en
dc.language.iso | en_US | en
dc.publisher | IEEE Press | en
dc.subject | Data stream clustering | en
dc.subject | dynamic feature selection | en
dc.subject | feature drift | en
dc.subject | feature evolution | en
dc.subject | unsupervised feature selection | en
dc.title | Dynamic feature selection for clustering high dimensional data streams | en
dc.type | Article | en
dc.identifier.doi | https://doi.org/10.1109/access.2019.2932308
dc.peerreviewed | Yes | en
dc.funder | Other external funder (please detail below) | en
dc.projectid | 61673331 | en
dc.cclicence | N/A | en
dc.date.acceptance | 2019-07-18
dc.exception.reason | I submitted this paper to DORA in September 2019, but when I recently checked my papers in DORA I found that it was missing from my deposits, so I am re-depositing it here. | en
dc.researchinstitute | Institute of Artificial Intelligence (IAI) | en
dc.funder.other | National Natural Science Foundation of China (NSFC) | en
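The dynamic feature mask described in the abstract (mask redundant features, cluster on the unmasked ones, and unmask or mask features as their importance changes) can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not say how feature importance is measured, so this sketch assumes a simple unsupervised score, the variance of each feature over a sliding window of recent points, and keeps the k highest-scoring features unmasked. The class `DynamicFeatureMask`, its methods `update` and `project`, and the parameters `k` and `window_size` are all hypothetical names chosen for illustration.

```python
from collections import deque
from statistics import pvariance


class DynamicFeatureMask:
    """Hypothetical sketch of a dynamic feature mask for a data stream.

    Assumption (not from the source): a feature's importance is its
    population variance over a sliding window of recent points, and the
    k highest-variance features stay unmasked.
    """

    def __init__(self, n_features, k, window_size):
        self.n_features = n_features
        self.k = k
        # Sliding window: the oldest point is dropped once it is full.
        self.window = deque(maxlen=window_size)
        # Before any data arrives, every feature is unmasked.
        self.unmasked = set(range(n_features))

    def update(self, point):
        """Add one point, recompute the mask, and report the changes.

        Returns (newly_masked, newly_unmasked) as sets of feature indices,
        so feature-level change can be observed and tracked.
        """
        self.window.append(point)
        if len(self.window) < 2:
            # Variance needs at least two points; keep the current mask.
            return set(), set()
        scores = [
            pvariance([p[j] for p in self.window]) for j in range(self.n_features)
        ]
        # Rank by descending score (ties broken by lower index); keep the top k.
        ranked = sorted(range(self.n_features), key=lambda j: (-scores[j], j))
        new_unmasked = set(ranked[: self.k])
        newly_masked = self.unmasked - new_unmasked
        newly_unmasked = new_unmasked - self.unmasked
        self.unmasked = new_unmasked
        return newly_masked, newly_unmasked

    def project(self, point):
        """Keep only the unmasked coordinates, in feature-index order."""
        return [point[j] for j in sorted(self.unmasked)]


if __name__ == "__main__":
    mask = DynamicFeatureMask(n_features=3, k=1, window_size=4)
    # Phase 1: only feature 0 varies. Phase 2: feature 2 varies instead.
    stream = [[0, 5, 0], [10, 5, 0], [0, 5, 0], [10, 5, 0],
              [0, 5, 0], [0, 5, 10], [0, 5, 0], [0, 5, 10]]
    for t, x in enumerate(stream):
        masked, unmasked = mask.update(x)
        if masked or unmasked:
            print(t, "masked:", sorted(masked), "unmasked:", sorted(unmasked))
```

In this toy stream, features 1 and 2 are masked once the window holds two points, and feature 0 is later masked while feature 2 is unmasked once the window contains only the second phase. In a full pipeline, the output of `project` would be passed to whichever density-based clustering algorithm is in use, which is what makes the approach algorithm-independent. Variance is only a placeholder criterion; any other unsupervised relevance score could be swapped into `update` without changing the rest of the sketch.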

