Enhancing Automated Medical Report Generation
A Method Based on Semantic-Guidance and Dual Stage Alignment
DOI:
https://doi.org/10.3991/ijoe.v21i12.56289

Keywords:
Medical image, Automated report generation, Cross-modal, Deep learning, Contrastive learning, Semantic knowledge

Abstract
The increasing availability of multimodal data in healthcare, particularly for clinical diagnosis, can improve diagnostic accuracy and patient outcomes and support more effective clinical decision-making. However, previous methods face several challenges: achieving effective cross-modal alignment between textual descriptions and visual data, detecting small and rare lesions, producing precise diagnostic terminology, and extracting and exploiting semantic knowledge. To address these issues, we propose a new framework, semantic-guided hierarchical feature extraction and cycle-consistent fusion (SHECoF), for automatic chest X-ray (CXR) report generation, based on supervised and unsupervised learning algorithms. Our model introduces a novel dual-alignment strategy that progressively bridges the modality gap. It first applies hierarchical feature extraction together with a semantic knowledge extraction (SKE) mechanism over the report, guiding the visual encoder to focus on fine-grained lesion detection. A second, deeper alignment is then performed by our cycle-consistent cross-attention fusion (C3F) mechanism, which enforces a bidirectional cycle-consistent loss to establish fine-grained correspondence between image regions and textual descriptions. Validation against existing methods shows consistent gains in report quality in terms of clinical accuracy of the description, lesion localization, and contextual consistency, positioning our framework as a robust tool for generating more accurate and reliable medical reports.
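The bidirectional cycle-consistent loss underlying C3F can be read as a round-trip constraint: text features attend to image regions, the attended result attends back, and the reconstruction should stay close to the original (and symmetrically for the image side). The following minimal numpy sketch illustrates one plausible form of such a loss; all function names, feature shapes, and the MSE penalty are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values, scale):
    # Scaled dot-product cross-attention: each query vector is replaced
    # by a convex combination of the key/value vectors.
    weights = softmax(queries @ keys_values.T / scale, axis=-1)
    return weights @ keys_values

def cycle_consistent_loss(img_feats, txt_feats):
    """Hypothetical bidirectional cycle loss: text -> image -> text and
    image -> text -> image round trips, penalizing reconstruction drift."""
    scale = np.sqrt(img_feats.shape[1])
    # text -> image regions -> back to text
    t2i = cross_attend(txt_feats, img_feats, scale)
    t_cycle = cross_attend(t2i, txt_feats, scale)
    # image regions -> text tokens -> back to image
    i2t = cross_attend(img_feats, txt_feats, scale)
    i_cycle = cross_attend(i2t, img_feats, scale)
    return (np.mean((t_cycle - txt_feats) ** 2)
            + np.mean((i_cycle - img_feats) ** 2))

rng = np.random.default_rng(0)
img = rng.standard_normal((49, 64))   # e.g. 7x7 grid of visual region features
txt = rng.standard_normal((20, 64))   # token features from the report text
loss = cycle_consistent_loss(img, txt)
print(loss >= 0.0)
```

Minimizing such a loss jointly with the generation objective would encourage attention maps in which each textual finding is anchored to recoverable image regions, which is one way to obtain the fine-grained region-description correspondence the abstract describes.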
License
Copyright (c) 2025 Fatima Cheddi, Ahmed Habbani, Hammadi Nait-Charif

This work is licensed under a Creative Commons Attribution 4.0 International License.

