Classification of Tweets Related to Natural Disasters Using Machine Learning Algorithms

Orlando Iparraguirre-Villanueva; Melquiades Melgarejo-Graciano; Gloria Castro-Leon; Sandro Olaya-Cotera; John Ruiz-Alvarado; Andrés Epifanía-Huerta; Michael Cabanillas-Carbonell; Joselyn Zapata-Paulini

doi:10.3991/ijim.v17i14.39907

Authors

Orlando Iparraguirre-Villanueva Universidad Autónoma del Perú
Melquiades Melgarejo-Graciano Universidad Científica del Sur https://orcid.org/0000-0002-1340-1167
Gloria Castro-Leon Universidad Nacional Tecnológica de Lima Sur https://orcid.org/0000-0002-8386-2006
Sandro Olaya-Cotera Universidad San Ignacio de Loyola
John Ruiz-Alvarado Universidad Tecnológica del Perú https://orcid.org/0000-0002-3258-6347
Andrés Epifanía-Huerta Universidad Católica los Ángeles de Chimbote https://orcid.org/0000-0002-6643-1829
Michael Cabanillas-Carbonell Universidad Privada del Norte https://orcid.org/0000-0001-9675-0970
Joselyn Zapata-Paulini Universidad Continental

DOI:

https://doi.org/10.3991/ijim.v17i14.39907

Keywords:

classification, tweets, disasters, machine learning, natural

Abstract

Identifying and classifying text extracted from social networks with traditional methods is very complex. In recent years, advances in computer science have made it significantly easier to identify and classify such text, specifically text from Twitter. This work aims to identify, classify, and analyze tweets about real natural disasters, collected through the hashtag #NaturalDisasters, using machine learning (ML) algorithms: Bernoulli Naive Bayes (BNB), Multinomial Naive Bayes (MNB), Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF). First, tweets related to natural disasters were identified, creating a dataset of 122k geolocated tweets for training. Second, the data were cleaned by applying stemming and lemmatization techniques. Third, exploratory data analysis (EDA) was performed to gain an initial understanding of the data. Fourth, the BNB, MNB, LR, KNN, DT, and RF models were trained and tested using tools and libraries for this type of task. The trained models performed well: the BNB, MNB, and LR models each achieved 87% accuracy, while the KNN, DT, and RF models achieved 82%, 75%, and 86%, respectively. The BNB, MNB, and LR models also performed better on the other metrics considered, such as processing time, test accuracy, precision, and F1 score. This indicates that, for this context and this dataset, they are the best of the text classifiers evaluated.
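The workflow summarized in the abstract (clean the tweets, split them into training and test sets, fit the six classifiers, compare their metrics) can be illustrated with a minimal sketch. It assumes scikit-learn and a bag-of-words representation; the abstract does not name the libraries, features, or hyperparameters the authors used, so those choices are assumptions. The names `clean_tweet`, `build_models`, and `compare_classifiers` are hypothetical, the toy tweets are invented for illustration and are not drawn from the 122k-tweet dataset, and the simple regex cleaning stands in for the stemming and lemmatization step.

```python
import re
import time

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Invented toy data (assumption): 1 = about a real disaster, 0 = not.
TOY_TWEETS = [
    "Massive flood hits the city, thousands evacuated #NaturalDisasters",
    "Earthquake of magnitude 7 destroys homes near the coast",
    "Wildfire spreads fast, residents told to leave now",
    "Hurricane makes landfall, power lines down everywhere",
    "Landslide blocks highway after heavy rain, rescue teams on site",
    "Tsunami warning issued after offshore earthquake",
    "Volcano eruption forces evacuation of nearby villages",
    "Flood waters rising, emergency shelters are open",
    "My exam was a total disaster lol",
    "This new album is fire, listening all day",
    "Traffic was an earthquake of honking this morning",
    "Flooded with emails after my holiday",
    "That movie about a volcano was so boring",
    "Storm of likes on my latest post, thanks everyone",
    "Our team got destroyed in the match last night",
    "Cooking disaster: I burned the rice again",
]
TOY_LABELS = [1] * 8 + [0] * 8


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, and the '#' sign."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"@\w+", " ", text)
    text = text.replace("#", " ")
    return re.sub(r"\s+", " ", text).strip()


def build_models(seed=0):
    """Return the six classifiers named in the abstract (settings assumed)."""
    return {
        "BNB": BernoulliNB(),
        "MNB": MultinomialNB(),
        "LR": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(n_neighbors=3),
        "DT": DecisionTreeClassifier(random_state=seed),
        "RF": RandomForestClassifier(n_estimators=50, random_state=seed),
    }


def compare_classifiers(texts, labels, test_size=0.25, seed=0):
    """Fit each model on word counts and report test metrics and fit time."""
    cleaned = [clean_tweet(t) for t in texts]
    x_train, x_test, y_train, y_test = train_test_split(
        cleaned, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    results = {}
    for name, model in build_models(seed).items():
        pipeline = make_pipeline(CountVectorizer(), model)
        start = time.perf_counter()
        pipeline.fit(x_train, y_train)
        fit_seconds = time.perf_counter() - start
        predicted = pipeline.predict(x_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, predicted),
            "precision": precision_score(y_test, predicted, zero_division=0),
            "f1": f1_score(y_test, predicted, zero_division=0),
            "fit_seconds": fit_seconds,
        }
    return results


if __name__ == "__main__":
    for name, metrics in compare_classifiers(TOY_TWEETS, TOY_LABELS).items():
        print(name, metrics)
```

With only sixteen invented tweets, the scores this sketch prints say nothing about the models themselves; the point is the shape of the comparison, in which every classifier sees the same split and is measured on the same metrics.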
