An Efficient Multidimensional k-Anonymisation Strategy Using Self-Organising Maps
Abstract
Data mining techniques are highly efficient at sifting through big data to extract hidden knowledge and support evidence-based decisions. The commercial benefits of these techniques have led to their successful adoption in many domains. However, there is evident concern that data mining could be exploited to infer sensitive information, which raises a number of ethical issues, including those relating to privacy rights. It is therefore essential to enforce privacy constraints during mining processes so that a certain degree of privacy is maintained and such inferences are prevented. This calls for novel privacy techniques that safeguard individual privacy during mining while still enabling the use of the data.
This thesis centres on privacy-preserving data mining, and specifically on a proposed hybrid framework that combines data transformation methods with anonymisation algorithms to derive more data utility during mining processes. The framework comprises Self-Organising Maps for data transformation, two prominent clustering-based k-anonymisation algorithms to guarantee a certain degree of privacy, selected privacy and data quality metrics to validate our methods, and classification tasks to study the impact of our methods on data mining.
The experiments reveal that the proposed hybrid framework produces the most desirable properties for subsequent data mining. It is effective in meeting the desired privacy requirement and captures relevant patterns in microdata that are beneficial for classification tasks. In addition, the transformed data produced by the self-organising maps conceals the input set by mapping it onto a 1-dimensional set of data, thereby protecting the true values of the original set. The experimental results show that this unified approach has better overall performance in classification tasks than conventional methods.
The study concludes that anonymisation and data mining techniques are highly interdependent. Combining them is complicated by the fact that the two pursue conflicting objectives. The inclusion of additional techniques therefore serves as a viable starting point for subsequent enhancements that promote compatibility between the two. This would reduce fluctuations in data mining performance and produce more consistent outcomes. The proposed model is effective in meeting this goal: it adequately satisfies a privacy requirement while enhancing utility for subsequent data mining problems.
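
As a rough illustration of the pipeline the abstract describes, the sketch below trains a small 1-dimensional self-organising map on numeric records, orders the records by their best-matching unit, and cuts that ordering into groups of at least k. It is not the thesis implementation: the abstract does not name its two clustering-based k-anonymisation algorithms, so a simple sort-and-chunk grouping stands in for them, and every function name, parameter and data value here is hypothetical. Records are assumed to be numeric quasi-identifiers already scaled to a comparable range.

```python
import math
import random


def best_unit(weights, row):
    """Index of the SOM unit whose weight vector is closest to `row`."""
    return min(
        range(len(weights)),
        key=lambda j: sum((w - x) ** 2 for w, x in zip(weights[j], row)),
    )


def train_som_1d(data, n_units=4, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D chain of SOM units (hypothetical settings, not the thesis's)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise each unit from a randomly chosen record.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    radius0 = max(n_units / 2.0, 1.0)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
        for row in rng.sample(data, len(data)):    # shuffled pass over the data
            b = best_unit(weights, row)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood measured along the 1-D chain.
                h = math.exp(-((j - b) ** 2) / (2.0 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (row[d] - w[d])
    return weights


def k_groups(data, weights, k):
    """Sort record indices by SOM unit, then cut into groups of size >= k."""
    if len(data) < k:
        raise ValueError("need at least k records")
    order = sorted(range(len(data)), key=lambda i: best_unit(weights, data[i]))
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())  # merge an undersized tail group
    return groups


def anonymise(data, groups):
    """Replace every record with the centroid of its group."""
    dim = len(data[0])
    out = [None] * len(data)
    for g in groups:
        centroid = [sum(data[i][d] for i in g) / len(g) for d in range(dim)]
        for i in g:
            out[i] = centroid
    return out


# Toy quasi-identifiers (made-up values, scaled to [0, 1]).
records = [[0.10, 0.20], [0.15, 0.25], [0.90, 0.80], [0.85, 0.75],
           [0.50, 0.50], [0.55, 0.45], [0.20, 0.10], [0.80, 0.90]]
som = train_som_1d(records, n_units=3)
groups = k_groups(records, som, k=3)
released = anonymise(records, groups)
print([len(g) for g in groups])
```

Each released record is shared by at least k original records, which is the k-anonymity condition on the quasi-identifiers. Ordering by SOM unit keeps similar records in the same group, which is one plausible reading of why a map-based transformation could help retain utility for later classification.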