Comparative Analysis of Imputation Methods for Enhancing Predictive Accuracy in Data Models

dc.contributor.author: Zamri, Nurul Aqilah
dc.contributor.author: Jaya, M. Izham
dc.contributor.author: Irawati, Indrarini Dyah
dc.contributor.author: Rassem, Taha H.
dc.contributor.author: Rasyidah
dc.contributor.author: Kasim, Shahreen
dc.date.acceptance: 2024-08-19
dc.date.accessioned: 2024-10-23T13:23:49Z
dc.date.available: 2024-10-23T13:23:49Z
dc.date.issued: 2024-09-25
dc.description: open access article. International Matching Grant with Project ID UIC241510 from the Universiti Malaysia Pahang Al-Sultan Abdullah (RDU242708).
dc.description.abstract: Missing values in a dataset can introduce bias that significantly impairs a predictive algorithm's ability to discern patterns and make accurate predictions. This paper examines commonly used data imputation methods, including list-wise deletion (IGN), mean imputation (AVG), K-Nearest Neighbors (KNN), MissForest (MF), and Predictive Mean Matching (PMM), to provide a deeper understanding of how they behave. The dataset used in this study consists of financial data on S&P 500 companies from the Compustat North America database. The training and validation dataset comprises 1973 instances drawn from the fourth quarter of 2009, the first quarter of 2010, and the third quarter of 2014; within this set, 457 missing values were identified and imputed. The test dataset comprises 197 randomly selected instances from the fourth quarter of 2014, approximately ten percent of the number of instances in the training dataset. The evaluation shows that the dataset produced by MF imputation performed best among all the imputed datasets. These findings are intended to help practitioners make informed choices when selecting a data imputation method, particularly for predictive modeling tasks.
dc.funder: Other external funder (please detail below)
dc.identifier.citation: Zamri, N.A. et al. (2024) Comparative Analysis of Imputation Methods for Enhancing Predictive Accuracy in Data Models. JOIV: International Journal on Informatics Visualization, 8 (3)
dc.identifier.doi: https://doi.org/10.62527/joiv.8.3.1666
dc.identifier.issn: 2549-9904
dc.identifier.issn: 2549-9610
dc.identifier.uri: https://hdl.handle.net/2086/24357
dc.language.iso: en
dc.peerreviewed: Yes
dc.projectid: International Matching Grant with Project ID UIC241510 from the Universiti Malaysia Pahang Al-Sultan Abdullah (RDU242708).
dc.publisher: Society of Visual Informatics
dc.relation.ispartof: JOIV : International Journal on Informatics Visualization
dc.researchinstitute.institute: Institute of Digital Research, Communication and Responsible Innovation
dc.rights: Attribution-ShareAlike 4.0 International (en)
dc.rights.uri: http://creativecommons.org/licenses/by-sa/4.0/
dc.title: Comparative Analysis of Imputation Methods for Enhancing Predictive Accuracy in Data Models
dc.type: Article
oaire.citation.issue: 3
oaire.citation.volume: 8