Down-sampled and Under-sampled Data sets in Feature Selective Validation (FSV)
Abstract
Feature Selective Validation (FSV) is a heuristic method for quantifying the (dis)similarity of two data sets. The computational burden of obtaining the FSV values can be unnecessarily high when data sets with large numbers of points are used. While this may not be an important issue per se, it matters for future developments in FSV such as real-time processing, or where multi-dimensional FSV is needed. Coupled with the issue of data set size is the issue of data sets having 'missing' values. These may arise from practical difficulties, or because noise or other confounding factors make some data points unreliable. These issues lead to the question: "what is the effect on FSV quantification of reducing or removing data points from a comparison, i.e. of down- or under-sampling the data?" This paper applies three strategies for reducing known data sets in order to investigate this question. Using a representative sample of 16 pairs of data sets, it demonstrates that FSV is robust to such changes provided a minimum data set size of approximately 200 points is maintained. FSV is also robust to up to approximately 10% 'missing' data, provided the missing points do not form a continuous region.
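The abstract does not detail the three sampling strategies, but the two operations it studies can be sketched simply. The following is a minimal, illustrative sketch (not the authors' method), assuming plain decimation for down-sampling to a target size and random deletion for simulating 'missing' data; all function names and parameters are hypothetical.

```python
import numpy as np

def down_sample(y, target_n):
    """Decimate y by keeping every k-th point so that roughly
    target_n points remain (e.g. the ~200-point threshold)."""
    step = max(1, len(y) // target_n)
    return y[::step]

def under_sample(y, fraction_missing, seed=None):
    """Randomly drop a fraction of points, simulating data made
    unreliable by noise or practical measurement difficulties."""
    rng = np.random.default_rng(seed)
    n_remove = int(round(fraction_missing * len(y)))
    keep = np.ones(len(y), dtype=bool)
    keep[rng.choice(len(y), size=n_remove, replace=False)] = False
    return y[keep]

# Example: a 1000-point trace reduced to ~200 points,
# and the same trace with 10% of points removed at random.
y = np.sin(np.linspace(0.0, 10.0, 1000))
print(len(down_sample(y, 200)))       # 200
print(len(under_sample(y, 0.10)))     # 900
```

Note that random deletion scatters the removed points across the record; a continuous missing region, which the abstract identifies as the problematic case, would require a different (block-wise) removal.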