Predictive Analysis of Vector-Borne Diseases through Tabular Classification of Epidemiological Data

Orlando Iparraguirre-Villanueva; Michael Cabanillas-Carbonell

doi:10.3991/ijoe.v20i13.50437

Authors

Orlando Iparraguirre-Villanueva Universidad Tecnológica del Perú https://orcid.org/0000-0001-8185-2034
Michael Cabanillas-Carbonell Universidad Privada del Norte https://orcid.org/0000-0001-9675-0970

DOI:

https://doi.org/10.3991/ijoe.v20i13.50437

Keywords:

prediction, machine learning, evaluation, models, vector-borne diseases

Abstract

Vector-borne diseases (VBDs) are a major threat to human health and are estimated to cause more than 700,000 deaths each year, which makes them a serious public health problem. In recent years, the incidence of VBDs has increased globally, affecting approximately one billion people and accounting for 17% of all infectious diseases. Disease rates have risen at an alarming pace, with more than 3.9 billion people at risk of infection. It is therefore essential to find approaches for detecting these diseases, and this is where machine learning (ML) models come into play. The purpose of this study was to predict VBDs using tabular epidemiological data. To this end, a set of ML models was used: support vector classifier (SVC), extreme gradient boosting (XGBoost), LightGBM, CatBoost, random forest (RF), and balanced random forest (BRF). A dataset consisting of 65 features and 1262 records was used during the training stage. The results highlighted the successful integration of the SVC, XGBoost, LightGBM, CatBoost, BRF, and RF models, with weights of 0.49959 ± 0.27112, 0.58496 ± 0.22619, 0.48482 ± 0.29971, 0.54992 ± 0.27982, 0.24924 ± 0.22654, and 0.45592 ± 0.25849, respectively. In addition, the BRF model stood out for having the lowest log loss, evaluated through the ensemble log-loss metric, with a mean of 0.24924 and a standard deviation of 0.22654.
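As a rough illustration of how a log-loss figure reported as "mean ± standard deviation" can be obtained, the sketch below computes the multiclass log loss for a few sets of predicted class probabilities and summarizes them. It is not the authors' code. The function names, the made-up predictions, the use of the sample standard deviation, and the idea that the spread is taken across validation folds are all assumptions made for this example.

```python
import math
import statistics


def log_loss(y_true, probs, eps=1e-15):
    """Multiclass log loss: mean negative log-probability of the true class.

    y_true: list of integer class indices.
    probs:  list of per-class probability lists, one per sample.
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip to avoid log(0) when a model assigns zero probability.
        p = min(max(row[label], eps), 1 - eps)
        total += -math.log(p)
    return total / len(y_true)


def summarize(losses):
    """Return (mean, sample standard deviation) of a list of losses."""
    return statistics.mean(losses), statistics.stdev(losses)


if __name__ == "__main__":
    # Hypothetical per-fold labels and predictions (not data from the study).
    folds = [
        ([0, 1, 2], [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]),
        ([1, 0, 2], [[0.3, 0.6, 0.1], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]),
    ]
    fold_losses = [log_loss(y, p) for y, p in folds]
    mean, std = summarize(fold_losses)
    print(f"log loss: {mean:.5f} ± {std:.5f}")
```

Lower values indicate that a model assigns higher probability to the correct class, which is why the model with the smallest mean log loss is singled out in the abstract.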

Author Biography

Orlando Iparraguirre-Villanueva, Universidad Tecnológica del Perú

Systems Engineer with a Master's Degree in Information Technology Management and a PhD in Systems Engineering from Universidad Nacional Federico Villarreal, Peru. ITIL® Foundation Certificate in IT Service, Specialization in Business Continuity Management, Scrum Fundamentals Certification (SFC). National and international speaker and panelist (Panama, Colombia, Ecuador, Venezuela, Mexico). Undergraduate and postgraduate lecturer at several universities in Peru. Thesis advisor and jury member at several universities. Information technology consultant for public and private institutions. Coordinator and director at several private institutions. Specialist in software development, IoT, business intelligence, open-source software, augmented reality, machine learning, text mining, and virtual environments.