Enhanced Alzheimer’s Diagnosis Using Multimodal Data: A Comparative Study of CNN Architectures
DOI:
https://doi.org/10.3991/ijoe.v21i14.57471
Keywords:
Alzheimer's, Classification, CNN Architecture, Multimodal
Abstract
Alzheimer’s disease (AD) is a progressive neurodegenerative disorder that leads to severe cognitive decline, making early and accurate diagnosis essential for effective patient management. Traditional diagnostic approaches often rely on unimodal imaging data such as magnetic resonance imaging (MRI) or positron emission tomography (PET) scans, yet these methods are insufficient to capture the heterogeneous and multimodal characteristics of AD biomarkers. Similarly, conventional convolutional neural network (CNN) models demonstrate strong image recognition capabilities but remain limited in integrating diverse sources of clinical evidence. To address this gap, this study proposes an enhanced multimodal diagnostic framework that combines MRI, PET, and clinical metadata to provide a more holistic representation of disease progression. The framework is evaluated through a comparative analysis of state-of-the-art CNN architectures, including densely connected convolutional networks (DenseNet), residual network (ResNet), InceptionNet, VGG, MobileNet, and EfficientNet. Experiments were conducted on the ADNI dataset, which includes 372 subjects and a total of 63,777 image slices. The results clearly demonstrate that multimodal models outperform unimodal counterparts in classification performance. EfficientNetB1 emerged as the best-performing model, achieving 98.6% accuracy, precision, recall, and F1-score, highlighting the significant contribution of clinical metadata integration. However, this superior accuracy comes with higher computational requirements, as EfficientNetB1 required 22.16 seconds for prediction with a memory load of 31.6 GB. In contrast, lightweight models such as MobileNet offered faster inference speeds but sacrificed accuracy, reaching only about 76%. These findings emphasize the critical trade-off between computational efficiency and diagnostic performance in real-world clinical scenarios. 
Overall, the study provides strong evidence that multimodal CNN-based architectures offer robust and accurate tools for AD detection, while also underscoring the need to balance model complexity with resource constraints in practical healthcare implementation.
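The fusion strategy described in the abstract can be sketched as a two-branch network: an image branch (standing in for a backbone such as EfficientNet) and a clinical-metadata branch, joined by feature concatenation before a classification head. The layer sizes, the metadata dimensionality, and the three-class output (CN / MCI / AD) below are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class MultimodalADClassifier(nn.Module):
    """Toy multimodal classifier: image features + clinical metadata."""
    def __init__(self, n_metadata_features=8, n_classes=3):
        super().__init__()
        # Image branch: a small CNN standing in for a pretrained backbone
        # (e.g., EfficientNetB1 in the study).
        self.image_branch = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool -> (B, 32, 1, 1)
            nn.Flatten(),
        )
        # Metadata branch: dense layer over clinical features
        # (hypothetical inputs such as age or cognitive scores).
        self.meta_branch = nn.Sequential(
            nn.Linear(n_metadata_features, 16), nn.ReLU(),
        )
        # Fusion head: concatenated features -> class logits.
        self.head = nn.Linear(32 + 16, n_classes)

    def forward(self, image, metadata):
        fused = torch.cat(
            [self.image_branch(image), self.meta_branch(metadata)], dim=1
        )
        return self.head(fused)

model = MultimodalADClassifier()
# Forward pass on random stand-in data: 4 single-channel slices + metadata.
logits = model(torch.randn(4, 1, 64, 64), torch.randn(4, 8))
print(logits.shape)  # torch.Size([4, 3])
```

Concatenation-based (late) fusion like this is the simplest way to let clinical metadata influence the decision; the paper's reported gains from metadata integration are consistent with this general design, though its exact fusion point is not specified in the abstract.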
License
Copyright (c) 2025 Lailil Muflikhah, Galih Restu Baihaqi, Shafatyra Shalsadilla, Achmad Ridok, Sri Soenarti

This work is licensed under a Creative Commons Attribution 4.0 International License.

