Enhancing Automated Medical Report Generation

A Method Based on Semantic-Guidance and Dual Stage Alignment

Authors

  • Fatima Cheddi, Mohammed V University in Rabat, Rabat, Morocco, https://orcid.org/0009-0005-8242-9605
  • Ahmed Habbani, Mohammed V University in Rabat, Rabat, Morocco
  • Hammadi Nait-Charif, Bournemouth University, Poole, UK

DOI:

https://doi.org/10.3991/ijoe.v21i12.56289

Keywords:

Medical image, Automated Report generation, Cross-modal, Deep learning, Contrastive learning, Semantic Knowledge

Abstract


The growing availability of multimodal data in healthcare, particularly in clinical diagnosis, can improve diagnostic accuracy and patient outcomes and support more effective clinical decision-making. However, previous methods face several challenges, including achieving effective cross-modal alignment between textual descriptions and visual data, detecting small and rare lesions, using precise diagnostic terminology, and extracting and utilizing semantic knowledge. To address these issues, we propose a new framework, semantic-guided hierarchical feature extraction and cycle-consistent fusion (SHECoF), for automatic chest X-ray (CXR) report generation, based on supervised and unsupervised learning algorithms. Our model introduces a novel dual-alignment strategy to progressively bridge the modality gap. It first incorporates hierarchical feature extraction and semantic knowledge extraction (SKE) from the report, guiding the model to focus on fine-grained lesion detection during visual feature extraction. Subsequently, a second, deeper alignment is performed by our cycle-consistent cross-attention fusion (C3F) mechanism, which enforces a bidirectional cycle-consistency loss to establish a fine-grained correspondence between image regions and textual descriptions. Comparisons with existing methods indicate improved report quality in terms of clinical accuracy of the description, lesion localization, and contextual consistency, positioning our framework as a robust tool for generating more accurate and reliable medical reports.
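
The abstract does not give the formulation of the C3F loss, so the following is only a minimal sketch of what a bidirectional, cycle-consistent cross-attention objective could look like, not the authors' implementation. It assumes image regions and text tokens are already embedded in a shared feature space of equal dimension, uses plain scaled dot-product attention, and measures the cycle error with mean squared error. All function names and the toy feature values are hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys):
    # Scaled dot-product scores; one softmax-normalised row per query.
    d = len(queries[0])
    return [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys])
        for q in queries
    ]

def cross_attend(queries, keys):
    # Re-express each query as an attention-weighted mixture of the keys.
    weights = attention_weights(queries, keys)
    dim = len(keys[0])
    return [
        [sum(w * k[j] for w, k in zip(row, keys)) for j in range(dim)]
        for row in weights
    ]

def mse(a, b):
    # Mean squared error between two equally shaped lists of vectors.
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

def cycle_loss(source, target):
    # source -> target -> source: translate, map back, compare with the input.
    translated = cross_attend(source, target)
    reconstructed = cross_attend(translated, source)
    return mse(reconstructed, source)

def bidirectional_cycle_loss(image_regions, text_tokens):
    # Assumed form: sum of the image->text->image and text->image->text cycles.
    return cycle_loss(image_regions, text_tokens) + cycle_loss(text_tokens, image_regions)

# Toy 2-D features (hypothetical values): three image regions, two text tokens.
image_regions = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
text_tokens = [[0.9, 0.1], [0.1, 0.9]]
loss = bidirectional_cycle_loss(image_regions, text_tokens)
print(f"bidirectional cycle loss: {loss:.4f}")
```

In this sketch a low loss means each image region can be recovered after being routed through the text tokens and vice versa, which is one way to read the "fine-grained correspondence" described above. A trainable version would apply learned projections before attention and combine this term with the report generation loss.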

Published

2025-10-10

How to Cite

Cheddi, F., Habbani, A., & Nait-Charif, H. (2025). Enhancing Automated Medical Report Generation: A Method Based on Semantic-Guidance and Dual Stage Alignment. International Journal of Online and Biomedical Engineering (iJOE), 21(12), 42–62. https://doi.org/10.3991/ijoe.v21i12.56289

Section

Papers