Image Captioning for Medical Surveillance in Smart Home Environments Using Vision Transformers
DOI:
https://doi.org/10.3991/ijoe.v21i05.54331
Keywords:
vision transformers, medical surveillance, image captioning, smart healthcare, AI in healthcare
Abstract
Medical surveillance in smart homes represents a transformative approach to patient care, using advances in computer vision to monitor and analyze patient behavior continuously. This study builds upon previous research by fine-tuning vision transformer (ViT) neural networks on a curated dataset that includes diverse scenarios of patients in both normal and abnormal conditions. The proposed model generates descriptive captions from surveillance camera images, capturing contextual information and identifying potential medical indicators. These insights are integrated into an automated notification system designed to alert healthcare providers promptly, enabling timely and informed interventions. To evaluate the approach, the fine-tuned ViT model is compared against a state-of-the-art convolutional neural network (CNN) baseline, and it achieves an accuracy of 87.2%, a BLEU-4 score of 0.351, and a ROUGE-2 score of 0.591. These results highlight the model’s ability to generate accurate and contextually relevant captions, outperforming CNN-LSTM baselines in accuracy, robustness, and contextual understanding. The findings underscore the critical role of artificial intelligence (AI) in detecting changes in patient conditions and providing personalized care through real-time monitoring. This proof of concept demonstrates the feasibility of deploying AI-driven solutions in medical surveillance systems, paving the way for innovative healthcare technologies. By addressing key challenges in patient monitoring, the study establishes ViT as a reliable and scalable tool for enhancing the quality and efficiency of healthcare delivery in smart home environments.
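For readers unfamiliar with the two caption metrics reported above, the following is a minimal sketch of how BLEU-4 and ROUGE-2 can be computed for a single generated caption against a single reference. It is an illustration only: the abstract does not state which tooling, smoothing, tokenization, or ROUGE variant the authors used, so the whitespace tokenization, the absence of smoothing, the F1 form of ROUGE-2, the function names, and the example captions are all assumptions.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return a Counter of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Sentence-level BLEU-4 against one reference, without smoothing (assumed setup)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped matches: a candidate n-gram counts at most as often as it occurs in the reference.
        matched = sum((cand_counts & ref_counts).values())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any empty n-gram order zeroes the score
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the four modified precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / 4)


def rouge2_f1(candidate, reference):
    """ROUGE-2 as the F1 of bigram overlap (the F1 variant is an assumption)."""
    cand_counts = ngrams(candidate.split(), 2)
    ref_counts = ngrams(reference.split(), 2)
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical captions, not taken from the study's dataset.
generated = "the patient is lying on the floor near the bed"
reference = "the patient is lying on the floor beside the bed"
print(f"BLEU-4:  {bleu4(generated, reference):.3f}")
print(f"ROUGE-2: {rouge2_f1(generated, reference):.3f}")
```

Scores reported for a whole test set are normally aggregated over all caption pairs (and often over several references per image), so a per-sentence value like the one printed here is not directly comparable to the 0.351 and 0.591 figures in the abstract.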
License
Copyright (c) 2025 Lamiae Eloutouate, Hicham Gibet Tani, Fatiha Elouaai, Mohammed Bouhorma, Mohamed Walid Hajoub

This work is licensed under a Creative Commons Attribution 4.0 International License.

