Image Captioning for Medical Surveillance in Smart Home Environments Using Vision Transformers

Lamiae Eloutouate; Hicham Gibet Tani; Fatiha Elouaai; Mohammed Bouhorma; Mohamed Walid Hajoub

doi:10.3991/ijoe.v21i05.54331

Authors

Lamiae Eloutouate, Abdelmalek Essaadi University, Tetouan, Morocco, https://orcid.org/0000-0002-8279-9068
Hicham Gibet Tani, Abdelmalek Essaadi University, Tetouan, Morocco, https://orcid.org/0000-0002-6310-8444
Fatiha Elouaai, Abdelmalek Essaadi University, Tetouan, Morocco, https://orcid.org/0000-0002-7139-5682
Mohammed Bouhorma, Abdelmalek Essaadi University, Tetouan, Morocco, https://orcid.org/0000-0002-5687-5231
Mohamed Walid Hajoub, Abdelmalek Essaadi University, Tetouan, Morocco, https://orcid.org/0009-0005-1353-5527

DOI:

https://doi.org/10.3991/ijoe.v21i05.54331

Keywords:

vision transformers, medical surveillance, image captioning, smart healthcare, AI in healthcare

Abstract

Medical surveillance in smart homes represents a transformative approach to patient care, using advances in computer vision to monitor and analyze patient behavior continuously. This study builds upon previous research by fine-tuning a vision transformer (ViT) neural network on a curated dataset that covers diverse scenarios of patients in both normal and abnormal conditions. The proposed model generates descriptive captions from surveillance camera images, capturing contextual information and identifying potential medical indicators. These captions feed an automated notification system designed to alert healthcare providers promptly, enabling timely and informed interventions. To evaluate the approach, the fine-tuned ViT model is compared against a state-of-the-art convolutional neural network (CNN) baseline and achieves an accuracy of 87.2%, a BLEU-4 score of 0.351, and a ROUGE-2 score of 0.591. These results indicate that the model generates accurate and contextually relevant captions, outperforming CNN-LSTM baselines in accuracy, robustness, and contextual understanding. The findings underscore the role of artificial intelligence (AI) in detecting changes in patient conditions and providing personalized care through real-time monitoring. As a proof of concept, the study shows that deploying AI-driven solutions in medical surveillance systems is feasible. By addressing key challenges in patient monitoring, it positions ViT as a reliable and scalable tool for improving the quality and efficiency of healthcare delivery in smart home environments.
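For readers unfamiliar with the BLEU-4 metric reported above, the following is a minimal sketch of how a sentence-level BLEU-4 score can be computed. It is not the authors' evaluation code. It assumes a single reference caption, whitespace tokenization, uniform weights over 1- to 4-grams, and no smoothing. The function names `ngrams` and `bleu4` and the example captions are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Sentence-level BLEU-4 against one reference, without smoothing (illustrative)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped matches: a candidate n-gram counts at most as often as it
        # appears in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the four modified precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / 4)


# Hypothetical captions, for illustration only.
reference = "the patient is lying on the floor"
candidate = "the patient is lying on the bed"
score = bleu4(candidate, reference)
```

Corpus-level scores such as the one reported in the abstract are normally computed by pooling n-gram counts over all test captions, often with several references per image, so this sentence-level sketch only illustrates the underlying idea.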
