BioBERT-XGBoost for Adverse Drug Reaction Prediction: An Interpretable Hybrid Model for Risk-Aware Pharmacovigilance
DOI:
https://doi.org/10.3991/ijoe.v22i04.59703
Keywords:
adverse drug reactions, pharmacovigilance, biomedical NLP, hybrid machine learning, clinical prediction models, semantic embeddings, risk stratification, patient safety systems
Abstract
Adverse drug reactions (ADRs) are a critical challenge for patient safety, with over 21,000 alerts reported in Peru in 2024. Current artificial intelligence (AI) models in pharmacovigilance present limitations in external validation, clinical scalability, and algorithmic transparency. This work proposes BioBERT-XGBoost, an interpretable hybrid model that combines biomedical natural language processing with supervised machine learning to predict ADRs. The architecture integrates BioBERT for semantic extraction of pharmacological entities with XGBoost as a calibrated classifier, trained on public datasets (DrugBank, openFDA–FAERS) and anonymized clinical records. The pipeline includes standardized preprocessing through normalized vocabularies, feature engineering with semantic embeddings, class imbalance handling, and probability calibration. Evaluation uses discrimination metrics (AUROC, AUPRC), calibration (Brier score), and explainability (SHAP). The system is deployed on Microsoft Azure through a mobile application that generates risk-stratified clinical alerts, representing a step toward trustworthy clinical decision-support systems for proactive ADR detection.
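The abstract names AUROC and the Brier score among its evaluation metrics and describes risk-stratified alerts. As a minimal illustration of what those quantities measure, the sketch below computes them in plain Python on made-up toy data. Nothing here is taken from the paper: the labels, the probabilities, the function names, and the `low`/`high` tier thresholds are all assumptions chosen only for illustration, and the paper's own pipeline (BioBERT embeddings, XGBoost, calibration) is not reproduced.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Lower is better; 0.0 means perfectly confident, correct predictions.
    """
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true, y_prob):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def risk_tier(prob, low=0.2, high=0.6):
    """Map a calibrated probability to an alert tier.

    The thresholds are hypothetical placeholders, not values from the paper.
    """
    if prob < low:
        return "low"
    if prob < high:
        return "moderate"
    return "high"


# Toy data (invented): 1 = ADR observed, 0 = no ADR.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.35, 0.80, 0.20, 0.65, 0.30, 0.05, 0.90]

print("AUROC:", auroc(y_true, y_prob))
print("Brier:", brier_score(y_true, y_prob))
print("Tiers:", [risk_tier(p) for p in y_prob])
```

AUROC captures only how well cases are ranked, while the Brier score also penalizes probabilities that are poorly calibrated, which is why the two are reported together when predicted probabilities drive tiered alerts.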
License
Copyright (c) 2026 Alexandra Ramirez, Raúl Pingo, Sandra Wong-Durand, Pedro Castañeda, Alejandra Oñate-Andino

This work is licensed under a Creative Commons Attribution 4.0 International License.

