Browsing by Author "Ishola, Olabayo"
Item Open Access
DATA MINING AND RE-IDENTIFICATION: ANALYSIS OF DATABASE QUERY PATTERNS THAT POSE A THREAT TO ANONYMISED INFORMATION
(De Montfort University, 2023-07) Ishola, Olabayo

To sustain today's globally connected society, a number of sectors are built on the gathering and sharing of data. Personal and sensitive data are collected and shared about the individuals who use the services these sectors offer. Data controllers rely on the robustness of anonymisation measures to keep personal and sensitive attributes in the shared dataset privacy-safe. Typically, the dataset is stripped of direct identifiers such as names and National Insurance (NI) numbers, so that individuals in the dataset are not uniquely identifiable. However, details in the dataset that data controllers perceive to have no negative privacy impact can be used by attackers to perform a re-identification attack. Such an attack uses the details shared in the dataset in conjunction with a secondary data source to rebuild a personally identifiable profile for one or more individuals in the supposedly anonymised dataset. There have been a few publicised cases of re-identification attacks, but from the information reported about them it is unknown what constitutes a re-identification attack from a technical perspective, other than its outcome. The work in this thesis explores real cases of successful re-identification attacks to analyse and build a technical profile of what re-identification entails. Using the Netflix Prize data and the re-identification of Governor William Weld as case studies, synthetic datasets are created to represent the anonymised databases shared in each of these cases. An exploratory study is conducted to represent re-identification attacks technically as database queries in SQL. This involves performing re-identification attacks on the synthetic databases by executing a series of SQL queries.
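The linkage step described above, combining an anonymised dataset with a secondary data source, can be illustrated with a minimal sketch. This is not taken from the thesis: the table names (`medical`, `voters`), the choice of ZIP code, birth date and sex as quasi-identifiers, and all rows are hypothetical placeholders chosen for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical tables.

    `medical` stands in for an anonymised release (no names) and `voters`
    for a secondary source that carries names. All rows are made up.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE medical (zip TEXT, birth_date TEXT, sex TEXT, diagnosis TEXT);
        CREATE TABLE voters (name TEXT, zip TEXT, birth_date TEXT, sex TEXT);
        INSERT INTO medical VALUES
            ('00001', '1950-01-01', 'M', 'condition A'),
            ('00001', '1950-01-01', 'F', 'condition B'),
            ('00002', '1960-02-02', 'M', 'condition C');
        INSERT INTO voters VALUES
            ('Person One', '00001', '1950-01-01', 'M'),
            ('Person Two', '00002', '1971-03-03', 'F');
        """
    )
    return conn


# Assumed quasi-identifiers: zip, birth_date, sex. A row that matches on all
# three attaches a name from `voters` to a record in `medical`.
LINKAGE_QUERY = """
    SELECT v.name, m.diagnosis
    FROM medical AS m
    JOIN voters AS v
      ON m.zip = v.zip
     AND m.birth_date = v.birth_date
     AND m.sex = v.sex
"""


def reidentify(conn):
    """Return (name, diagnosis) pairs produced by the linkage join."""
    return conn.execute(LINKAGE_QUERY).fetchall()


if __name__ == "__main__":
    print(reidentify(build_demo_db()))
```

In this toy data only one voter matches a medical record on all three attributes, so the join returns a single named row. A real attack would typically involve more exploratory queries before such a join.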
Working from the hypothesis that the SQL queries leading to re-identification attacks on anonymised databases share sufficiently similar patterns, this research employs data mining techniques and machine learning algorithms to train classifiers to recognise re-identification patterns in SQL queries. Four classification algorithms, Multilayer Perceptron (MLP), Naive Bayes (NB), K-Nearest Neighbors (KNN), and Logistic Regression (LR), are trained to recognise and predict attempted re-identification attacks. The results of the performance evaluation and unseen data testing indicate that the MLP, Multinomial Naive Bayes (MNB), and LR classifiers are the most effective at recognising patterns of re-identification attacks. During performance evaluation, the MLP classifier achieved an accuracy of 100%, the MNB 79.3%, and the LR 100%. Unseen data testing shows that the MLP, MNB, and LR classifiers predict new instances of attempted re-identification attacks 79%, 71%, and 79% of the time respectively, indicating good generalisation performance. To the best of the author's knowledge, the work in this thesis is the only effort to date to automate the recognition and prediction of attempted re-identification attacks on anonymised databases. The novel system developed in this research can be implemented to improve the monitoring of anonymised databases in data sharing environments.

Item Open Access
Recognising Re-identification Attacks on Databases, by Interpreting them as SQL Queries: A Technical Study
(2020-09-24) Ishola, Olabayo; Boiten, Eerke Albert; Ayesh, Aladdin; Albakri, Adham

The more prominent data sharing becomes in the information age, the higher the risk of shared data being used in unexpected and undesirable ways. Data holders have employed anonymisation techniques as a means of data protection when they share a database.
However, attackers can circumvent the protection, or presumed protection, offered by anonymisation through re-identification attacks. Datasets are where personal information lives, and SQL queries are the medium through which users interact with these datasets. This paper explores, from a technical perspective, how the process (kill chain) of executing a re-identification attack can be represented and recognised as a series of SQL queries. Using one of the best-known re-identification attack cases as a scenario, this paper explores a method for recognising a re-identification attack as SQL queries on a database.
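One way to picture an attack as a series of SQL queries is a session in which each query adds a predicate until a single record is isolated. The sketch below is a hypothetical illustration of that idea only, not the recognition method of the paper: the `records` table, its columns, the rows, and the `singles_out` heuristic are all assumptions made for the example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table of made-up anonymised records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE records (zip TEXT, birth_year INTEGER, sex TEXT, diagnosis TEXT);
        INSERT INTO records VALUES
            ('00001', 1950, 'M', 'condition A'),
            ('00001', 1950, 'F', 'condition B'),
            ('00001', 1962, 'M', 'condition C'),
            ('00002', 1950, 'M', 'condition D');
        """
    )
    return conn


# A hypothetical query session: each step adds one more predicate.
SESSION = [
    "SELECT COUNT(*) FROM records WHERE zip = '00001'",
    "SELECT COUNT(*) FROM records WHERE zip = '00001' AND birth_year = 1950",
    "SELECT COUNT(*) FROM records WHERE zip = '00001' AND birth_year = 1950"
    " AND sex = 'M'",
]


def result_sizes(conn, queries):
    """Run each COUNT(*) query in order and collect the counts."""
    return [conn.execute(q).fetchone()[0] for q in queries]


def singles_out(sizes):
    """Assumed heuristic: counts never grow and the last one is exactly 1."""
    non_increasing = all(a >= b for a, b in zip(sizes, sizes[1:]))
    return bool(sizes) and non_increasing and sizes[-1] == 1


if __name__ == "__main__":
    sizes = result_sizes(build_demo_db(), SESSION)
    print(sizes, singles_out(sizes))
```

A rule like `singles_out` is far cruder than a trained classifier, but it shows how a sequence of queries, rather than any single query, can carry the signal of interest.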