BioBERT-XGBoost for Adverse Drug Reaction Prediction: An Interpretable Hybrid Model for Risk-Aware Pharmacovigilance

Authors

  • Alexandra Ramirez Universidad Peruana de Ciencias Aplicadas, Lima, Peru https://orcid.org/0009-0007-7657-6021
  • Raúl Pingo Universidad Peruana de Ciencias Aplicadas, Lima, Peru https://orcid.org/0009-0009-5409-1490
  • Sandra Wong-Durand Universidad Peruana de Ciencias Aplicadas, Lima, Peru
  • Pedro Castañeda Universidad Nacional Toribio Rodriguez de Mendoza (UNTRM), Amazonas, Peru https://orcid.org/0000-0003-1865-1293
  • Alejandra Oñate-Andino Escuela Superior Politécnica de Chimborazo (ESPOCH), Riobamba, Ecuador

DOI:

https://doi.org/10.3991/ijoe.v22i04.59703

Keywords:

adverse drug reactions, pharmacovigilance, biomedical NLP, hybrid machine learning, clinical prediction models, semantic embeddings, risk stratification, patient safety systems

Abstract


Adverse drug reactions (ADRs) are a critical challenge for patient safety, with over 21,000 alerts reported in Peru in 2024. Current artificial intelligence (AI) models in pharmacovigilance present limitations in external validation, clinical scalability, and algorithmic transparency. This work proposes BioBERT-XGBoost, an interpretable hybrid model that combines biomedical natural language processing with supervised machine learning to predict ADRs. The architecture integrates BioBERT for semantic extraction of pharmacological entities with XGBoost as a calibrated classifier, trained on public datasets (DrugBank, openFDA–FAERS) and anonymized clinical records. The pipeline includes standardized preprocessing through normalized vocabularies, feature engineering with semantic embeddings, class imbalance handling, and probability calibration. Evaluation uses discrimination metrics (AUROC, AUPRC), calibration (Brier score), and explainability (SHAP). The system is deployed on Microsoft Azure through a mobile application that generates risk-stratified clinical alerts, representing a step toward trustworthy clinical decision-support systems for proactive ADR detection.
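The evaluation metrics named in the abstract can be illustrated with a minimal, standard-library sketch. This is not the authors' evaluation code: the function names and the four-case toy dataset are hypothetical and chosen only to show what AUROC, AUPRC (estimated here as average precision), and the Brier score measure.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared gap between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Chance that a random positive case is scored above a random negative one.

    Tied scores count as half a win.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Step-wise AUPRC estimate: mean precision at the rank of each positive case.

    Simplification: assumes there are no tied scores.
    """
    order = sorted(range(len(y_true)), key=lambda i: y_prob[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` predictions
    if hits == 0:
        raise ValueError("average precision needs at least one positive label")
    return total / hits


# Hypothetical toy data: 1 = ADR observed, 0 = no ADR; scores are predicted risks.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

print(f"AUROC: {auroc(y_true, y_prob):.3f}")
print(f"AUPRC (average precision): {average_precision(y_true, y_prob):.3f}")
print(f"Brier score: {brier_score(y_true, y_prob):.3f}")
```

AUROC and AUPRC depend only on how cases are ranked, whereas the Brier score also penalizes probabilities that are miscalibrated, which is why a calibration metric is reported alongside the discrimination metrics.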

Author Biographies

Alexandra Ramirez, Universidad Peruana de Ciencias Aplicadas, Lima, Peru

Alexandra Ramirez is an undergraduate student in Information Systems Engineering at Universidad Peruana de Ciencias Aplicadas (UPC). She is currently completing her bachelor's degree. Her research interests include machine learning, biomedical natural language processing, and pharmacovigilance systems. This work represents her contribution to hybrid model development and clinical decision support systems as part of her undergraduate research project (EMAIL: u20211g190@upc.edu.pe, ORCID: https://orcid.org/0009-0007-7657-6021).

Raúl Pingo, Universidad Peruana de Ciencias Aplicadas, Lima, Peru

Raúl Pingo is an undergraduate student in Information Systems Engineering at Universidad Peruana de Ciencias Aplicadas (UPC). He is currently completing his bachelor's degree. His research interests include explainable artificial intelligence, predictive modeling in healthcare, and biomedical data analysis. This paper represents his contribution to the development of interpretable models for adverse drug reaction prediction (EMAIL: u202120632@upc.edu.pe, ORCID: https://orcid.org/0009-0009-5409-1490).

Sandra Wong-Durand, Universidad Peruana de Ciencias Aplicadas, Lima, Peru

Sandra Wong-Durand holds a master's degree in Artificial Intelligence and a master's degree in Business Administration from ESAN with a mention in Advanced Project Management, and is a Systems Engineer from UNIFE. She has completed specialization studies in Innovation and Leadership at the Escuela Superior de Administración y Dirección de Empresas (ESADE), Spain; Process Improvement Management with CMMI at the Software Engineering Institute; Software Quality at UNIFE; Strategic Project Management at PM Certifica; and SOA Architectures at IBM and Oracle (EMAIL: pcsiswon@upc.edu.pe, ORCID: https://orcid.org/0000-0002-6154-2124).

Pedro Castañeda, Universidad Nacional Toribio Rodriguez de Mendoza (UNTRM), Amazonas, Peru

Pedro Castañeda obtained his Ph.D. from Universidad Nacional Mayor de San Marcos (UNMSM), Lima, Peru. He is a Full-Time Professor at the Faculty of Information Systems Engineering at Universidad Peruana de Ciencias Aplicadas (UPC), Lima, Peru. He is a RENACYT researcher certified by CONCYTEC. His research interests include machine learning, big data, health technologies, and software engineering. He has extensive experience in project management and serves as a thesis advisor for undergraduate and graduate students. He holds the following certifications: Project Management Professional (PMP), Scrum Certified Developer (CSD), IBM Certified Professional in Rational Unified Process, and ORACLE Certifications. His areas of interest are Artificial Intelligence, Software Productivity, Business Intelligence, Data Analytics, Machine Learning, and Software Engineering (EMAIL: pedro.castaneda@untrm.edu.pe, ORCID: https://orcid.org/0000-0003-1865-1293).

Alejandra Oñate-Andino, Escuela Superior Politécnica de Chimborazo (ESPOCH), Riobamba, Ecuador

Alejandra Oñate-Andino holds a degree in Computer Systems Engineering from Escuela Superior Politécnica de Chimborazo (Ecuador), a master's degree in Network Interconnectivity from Escuela Superior Politécnica de Chimborazo (Ecuador), and a PhD in Systems Engineering and Computer Science from Universidad Mayor de San Marcos (Peru). She is currently the Coordinator of the Software degree program at the Escuela Superior Politécnica de Chimborazo (Ecuador). She is also a Research Professor with more than 15 years of experience leading teaching, research, and management processes. She has directed and participated in several research and community outreach projects, and is the author of several scientific articles in the areas of Information Technology Governance, Business Intelligence, and Information Technology Management, among others (EMAIL: monate@espoch.edu.ec).

Published

2026-04-10

How to Cite

Ramirez, A., Pingo, R., Wong-Durand, S., Castañeda, P., & Oñate-Andino, A. (2026). BioBERT-XGBoost for Adverse Drug Reaction Prediction: An Interpretable Hybrid Model for Risk-Aware Pharmacovigilance. International Journal of Online and Biomedical Engineering (iJOE), 22(04), pp. 107–122. https://doi.org/10.3991/ijoe.v22i04.59703

Section

Papers