Classification of Tweets Related to Natural Disasters Using Machine Learning Algorithms
DOI: https://doi.org/10.3991/ijim.v17i14.39907
Keywords: classification, tweets, disasters, machine learning, natural
Abstract
Identifying and classifying text extracted from social networks using traditional methods is very complex. In recent years, computer science has advanced rapidly, helping significantly to identify and classify text extracted from social networks, specifically Twitter. This work aims to identify, classify, and analyze tweets related to real natural disasters, collected through the hashtag #NaturalDisasters, using machine learning (ML) algorithms: Bernoulli Naive Bayes (BNB), Multinomial Naive Bayes (MNB), Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT), and Random Forest (RF). First, tweets related to natural disasters were identified, creating a dataset of 122k geolocated tweets for training. Second, the data were cleaned by applying stemming and lemmatization techniques. Third, exploratory data analysis (EDA) was performed to gain an initial understanding of the data. Fourth, the BNB, MNB, LR, KNN, DT, and RF models were trained and tested using tools and libraries for this type of task. The trained models performed well: the BNB, MNB, and LR models each achieved 87% accuracy, while the KNN, DT, and RF models achieved 82%, 75%, and 86%, respectively. The BNB, MNB, and LR models also performed better across the evaluation metrics as a whole, such as processing time, test accuracy, precision, and F1 score. This indicates that, for this context and this dataset, they are the best text classifiers.
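To make the classification step concrete, the following is a minimal from-scratch sketch of one of the named models, Multinomial Naive Bayes with add-one smoothing, written in plain Python. It is not the authors' implementation: the abstract only says that "tools and libraries" were used. The toy tweets, the labels (1 = real disaster, 0 = not), the `tokenize` helper, and the class and function names are all assumptions made for illustration, and the stemming and lemmatization step is omitted.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and keep only alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Words never seen in training carry no information and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / self.total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Hypothetical toy data, not taken from the paper's 122k-tweet dataset.
train_texts = [
    "Earthquake shakes the city, buildings collapsed",
    "Flood waters rising, families evacuated tonight",
    "Wildfire spreading fast near the town, evacuated",
    "This new album is an earthquake of sound",
    "My inbox is a flood of emails today",
    "That concert was fire, loved the show",
]
train_labels = [1, 1, 1, 0, 0, 0]

test_texts = ["Buildings collapsed after the earthquake", "Loved the new album"]
test_labels = [1, 0]

model = MultinomialNB().fit(train_texts, train_labels)
print(accuracy(model, test_texts, test_labels))
```

On a real corpus, a library implementation with a proper train/test split would be the more practical choice. The sketch only shows how word counts and class priors combine into a prediction.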
License
Copyright (c) 2023 Orlando Iparraguirre-Villanueva, Michael Cabanillas-Carbonell, Efraín Melgarejo-Graciano, Gloria Castro-Leon, Sandro Olaya Cotera, John Ruiz-Alvarado, Andrés Epifanía-Huerta, Joselyn Zapata-Paulini
This work is licensed under a Creative Commons Attribution 4.0 International License.
The submitting author warrants that the submission is original and that she/he is the author of the submission together with the named co-authors; to the extent the submission incorporates text passages, figures, data or other material from the work of others, the submitting author has obtained any necessary permission.
Articles in this journal are published under the Creative Commons Attribution Licence (CC-BY). This is to get more legal certainty about what readers can do with published articles, and thus a wider dissemination and archiving, which in turn makes publishing with this journal more valuable for you, the authors.
By submitting an article the author grants to this journal the non-exclusive right to publish it. The author retains the copyright and the publishing rights for his article without any restrictions.
This journal has been awarded the SPARC Europe Seal for Open Access Journals.