- International Scientific and Vocational Studies Journal
- Vol: 2 Issue: 2
A Comparison of Five Methods for Missing Value Imputation in Data Sets
Authors : Pınar Cihan
Pages : 80-85
Publication Date : 2018-12-31
Article Type : Research
Abstract : Missing values in data sets prevent accurate analysis. The correct imputation of missing values has therefore become a focus of research attention in recent years. This paper compares the most reliable and up-to-date estimation methods for imputing missing values. Imputation of missing values has a very high priority because of its impact on subsequent steps such as pre-processing, data analysis, classification, and clustering. Root mean square error (RMSE), classification accuracy, and execution time are used to evaluate the performance of the five most popular methods (mean, k-nearest neighbors, singular value decomposition, Bayesian principal component analysis, and missForest). When the RMSE and classification accuracy values of the methods were compared, it was observed that the missForest method outperformed the other methods on all data sets.
Keywords : Missing value imputation, k-nearest neighbor, singular value decomposition, Bayesian principal component analysis, missForest
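To make the RMSE criterion concrete, the following minimal Python sketch applies the simplest of the five methods, mean imputation, and scores it against known true values. It is an illustration only and not the paper's code: the function names `mean_impute` and `rmse`, the use of `None` to mark missing cells, and the toy data are all assumptions introduced here.

```python
import math

def mean_impute(data):
    """Replace each None with the mean of the observed values in its column."""
    result = [row[:] for row in data]
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not None]
        # Assumption: a column with no observed values is filled with 0.0.
        col_mean = sum(observed) / len(observed) if observed else 0.0
        for row in result:
            if row[j] is None:
                row[j] = col_mean
    return result

def rmse(original, imputed, masked):
    """RMSE between true and imputed values, over the masked cells only."""
    squared = [(original[i][j] - imputed[i][j]) ** 2 for i, j in masked]
    return math.sqrt(sum(squared) / len(squared))

# Toy example (hypothetical data): hide two known cells, impute, then score.
complete = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
masked = [(0, 1), (2, 0)]  # positions whose true values are hidden
incomplete = [row[:] for row in complete]
for i, j in masked:
    incomplete[i][j] = None

imputed = mean_impute(incomplete)
print(rmse(complete, imputed, masked))  # 3.0
```

The same `rmse` function could score any other imputation method, provided the same cells are masked for each, so that the methods are compared on identical missing positions.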