Enhancing Automated Medical Report Generation
A Method Based on Semantic-Guidance and Dual Stage Alignment
DOI:
https://doi.org/10.3991/ijoe.v21i12.56289

Keywords:
Medical image, Automated report generation, Cross-modal, Deep learning, Contrastive learning, Semantic knowledge

Abstract
The increasing availability of multimodal data in healthcare, particularly for clinical diagnosis, can improve diagnostic accuracy and patient outcomes and support more effective clinical decision-making. However, previous methods face several challenges: achieving effective cross-modal alignment between textual descriptions and visual data, detecting small and rare lesions, producing precise diagnostic terminology, and extracting and exploiting semantic knowledge. To address these issues, we propose a new framework, semantic-guided hierarchical feature extraction and cycle-consistent fusion (SHECoF), for automatic chest X-ray (CXR) report generation, based on supervised and unsupervised learning algorithms. Our model introduces a novel dual-alignment strategy that progressively bridges the modality gap. It first applies hierarchical feature extraction together with a semantic knowledge extraction (SKE) mechanism over the report, guiding the visual encoder to focus on fine-grained lesion detection. A second, deeper alignment is then performed by our cycle-consistent cross-attention fusion (C3F) mechanism, which enforces a bidirectional cycle-consistent loss to establish fine-grained correspondence between image regions and textual descriptions. Validation against existing methods shows consistent gains in report quality in terms of clinical accuracy of the description, lesion localization, and contextual consistency, positioning our framework as a robust tool for generating more accurate and reliable medical reports.
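The bidirectional cycle-consistent loss underlying C3F can be read as a round-trip constraint: text features attend to image regions, the attended result attends back, and the reconstruction should stay close to the original (and symmetrically for the image side). The following minimal numpy sketch illustrates one plausible form of such a loss; all function names, feature shapes, and the MSE penalty are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values, scale):
    # Scaled dot-product cross-attention: each query vector is replaced
    # by a convex combination of the key/value vectors.
    weights = softmax(queries @ keys_values.T / scale, axis=-1)
    return weights @ keys_values

def cycle_consistent_loss(img_feats, txt_feats):
    """Hypothetical bidirectional cycle loss: text -> image -> text and
    image -> text -> image round trips, penalizing reconstruction drift."""
    scale = np.sqrt(img_feats.shape[1])
    # text -> image regions -> back to text
    t2i = cross_attend(txt_feats, img_feats, scale)
    t_cycle = cross_attend(t2i, txt_feats, scale)
    # image regions -> text tokens -> back to image
    i2t = cross_attend(img_feats, txt_feats, scale)
    i_cycle = cross_attend(i2t, img_feats, scale)
    return (np.mean((t_cycle - txt_feats) ** 2)
            + np.mean((i_cycle - img_feats) ** 2))

rng = np.random.default_rng(0)
img = rng.standard_normal((49, 64))   # e.g. 7x7 grid of visual region features
txt = rng.standard_normal((20, 64))   # token features from the report text
loss = cycle_consistent_loss(img, txt)
print(loss >= 0.0)
```

Minimizing such a loss jointly with the generation objective would encourage attention maps in which each textual finding is anchored to recoverable image regions, which is one way to obtain the fine-grained region-description correspondence the abstract describes.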
License
Copyright (c) 2025 Fatima Cheddi, Ahmed Habbani, Hammadi Nait-Charif

This work is licensed under a Creative Commons Attribution 4.0 International License.

