Improving Penalized Logistic Regression Model with Missing Values in High-Dimensional Data

Authors

DOI:

https://doi.org/10.3991/ijoe.v18i02.25047

Keywords:

High-dimensional data, feature selection, missing data, multiple imputations, penalized regression.

Abstract


Analysis without adequate handling of missing values may lead to inconsistent and biased estimates. Although multiple imputation has become a widely used approach to handling missing data, researchers still routinely encounter missing data in their studies. In high-dimensional data, penalized regression is a popular technique for performing feature selection and coefficient estimation simultaneously. However, one of the most vital issues with high-dimensional data is that it often contains so much missing data that common multiple imputation approaches may not work correctly. Therefore, this study uses imputation penalized regression models, an extension of the penalized methods, to impute missing values in high-dimensional data and improve performance. The method was applied to real-life high-dimensional datasets with different numbers of features, sample sizes, and missing-data rates to evaluate its efficiency. The method was also compared with other existing imputation penalized methods for high-dimensional data. The comparative experimental results indicate that the proposed method outperforms its competitors by achieving higher sensitivity, specificity, and classification accuracy.
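The general idea of combining multiple imputation with penalized logistic regression can be illustrated with a minimal sketch. This is not the authors' method, which the abstract does not specify. It assumes a simple stochastic mean imputation as a stand-in imputation model, a lasso (L1) penalty fitted by proximal gradient descent, and a toy synthetic dataset. All function names (`impute_once`, `fit_l1_logistic`, `selection_frequency`) and parameter values are hypothetical.

```python
import math
import random


def soft_threshold(z, t):
    """Proximal operator of the L1 penalty: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam, step=0.1, n_iter=300):
    """Lasso-penalized logistic regression via proximal gradient descent."""
    n, p = len(X), len(X[0])
    beta, intercept = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Residuals: predicted probability minus observed label.
        resid = [sigmoid(intercept + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        intercept -= step * sum(resid) / n  # the intercept is not penalized
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return intercept, beta


def impute_once(X, rng):
    """Assumed stand-in imputation: draw each missing entry (None) from a
    normal distribution with the observed column mean and standard deviation."""
    filled = [list(row) for row in X]
    for j in range(len(X[0])):
        obs = [row[j] for row in X if row[j] is not None]
        mean = sum(obs) / len(obs)
        sd = math.sqrt(sum((v - mean) ** 2 for v in obs) / max(len(obs) - 1, 1))
        for row in filled:
            if row[j] is None:
                row[j] = rng.gauss(mean, sd)
    return filled


def selection_frequency(X, y, lam, m=5, seed=0):
    """Count, per feature, in how many of m imputed datasets it is selected."""
    rng = random.Random(seed)
    counts = [0] * len(X[0])
    for _ in range(m):
        _, beta = fit_l1_logistic(impute_once(X, rng), y, lam)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return counts


# Toy data (an assumption): 8 features, only the first two carry signal,
# and roughly 10% of the entries are set to missing.
data_rng = random.Random(42)
n, p = 60, 8
X, y = [], []
for _ in range(n):
    row = [data_rng.gauss(0, 1) for _ in range(p)]
    y.append(1 if data_rng.random() < sigmoid(2.0 * row[0] - 2.0 * row[1]) else 0)
    X.append([None if data_rng.random() < 0.1 else v for v in row])

counts = selection_frequency(X, y, lam=0.05, m=5)
selected = [j for j, c in enumerate(counts) if c >= 3]  # majority-vote rule
print(counts, selected)
```

Keeping a feature only when it is selected in a majority of the imputed datasets is one simple way to pool feature selection across imputations. Other pooling rules, penalties, and imputation models are possible.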

Author Biographies

Aiedh Mrisi Alharthi, Department of Mathematical Sciences, Universiti Teknologi Malaysia, Skudai, Malaysia, and Department of Mathematics, Taif University, Taif, Saudi Arabia

Aiedh Mrisi Alharthi is a Ph.D. candidate in statistics at the Department of Mathematical Sciences, Universiti Teknologi Malaysia (UTM). He received his M.Sc. degree from the Department of Mathematics and Statistics, Taif University, Saudi Arabia. His research interests are high-dimensional data, penalized (regularized) methods, gene selection in cancer classification, and missing values.

Muhammad Hisyam Lee, Department of Mathematical Sciences, Universiti Teknologi Malaysia, Skudai, Malaysia

Muhammad Hisyam Lee is a Professor of Statistics in the Department of Mathematical Sciences, Faculty of Sciences, Universiti Teknologi Malaysia (UTM). He served as Vice President of the Malaysia Institute of Statistics between 2010 and 2014. Currently, he is serving as the Manager of Information Technology in the office of the Deputy Vice-Chancellor, Academic and International, UTM. His research interests include forecasting, time series analysis, and statistical quality control.

Zakariya Yahya Algamal, Department of Statistics and Informatics, University of Mosul, Mosul, Iraq

Zakariya Yahya Algamal received the B.S. degree in Statistics from the University of Mosul, Mosul, Iraq, in 2001, the M.S. degree in Statistics from the University of Mosul in 2004, and the Ph.D. degree in Mathematical Science/Statistics from Universiti Teknologi Malaysia (UTM), Malaysia, in 2016, and completed a postdoctorate in Mathematical Science/Statistics at UTM in 2017. He is a professor of Statistics at the University of Mosul. His research interests include high-dimensional data, generalized linear models, bioinformatics, chemoinformatics, and optimization algorithms.

Published

2022-02-16

How to Cite

Alharthi, A. M., Lee, M. H., & Algamal, Z. Y. (2022). Improving Penalized Logistic Regression Model with Missing Values in High-Dimensional Data. International Journal of Online and Biomedical Engineering (iJOE), 18(02), pp. 40–54. https://doi.org/10.3991/ijoe.v18i02.25047

Section

Papers