Multimodal Fusion of Image and Recipe Features for Nutrient Estimation of Complementary Foods

Authors

  • Nani Purwati, Universitas Diponegoro, Semarang, Indonesia; Universitas Bina Sarana Informatika, Yogyakarta, Indonesia (ORCID: https://orcid.org/0009-0008-6968-6393)
  • R. Rizal Isnanto, Universitas Diponegoro, Semarang, Indonesia
  • Martha Irene Kartasurya, Universitas Diponegoro, Semarang, Indonesia (ORCID: https://orcid.org/0000-0002-5177-233X)

DOI:

https://doi.org/10.3991/ijoe.v22i04.59905

Keywords:

Multimodal Learning; Nutrition Prediction; MLP Fusion; Food Analysis; Recipe Features

Abstract


Assessing the nutritional content of complementary foods for infants aged 6–24 months is still largely done manually, a process that is time-consuming and error-prone. This study proposes an automated nutrient-prediction model based on a multimodal approach that integrates food images and recipe texts. Experiments were conducted on the ComFoodID25 dataset, which comprises 2,783 images of complementary foods annotated with ingredients, processing methods, and ten nutrient types. Visual features were extracted with a pre-trained ResNet50 and text features with IndoBERT; the two modalities were then fused through a multilayer perceptron (MLP). Evaluation showed that the multimodal model produced low errors for most nutrients, with a mean absolute error (MAE) below 1 for the majority of nutrients and an overall percentage MAE (PMAE) of 2.55%. High coefficient-of-determination values further indicate strong agreement between predicted and reference values. These findings suggest that the proposed multimodal approach is effective and reliable for automatically estimating the nutritional content of complementary foods and could support AI-based complementary-food monitoring and recommendation systems.
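The late-fusion pipeline described above (concatenating image and text embeddings, then regressing nutrient values with an MLP and scoring with MAE/PMAE) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the 2048-d ResNet50 and 768-d IndoBERT feature sizes are standard defaults for those backbones, but the hidden width, weight initialization, and `FusionMLP` class are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions: ResNet50 pooled features are 2048-d and IndoBERT
# [CLS] embeddings are 768-d (standard for those backbones); the hidden
# width and the ten-nutrient output follow the abstract's description,
# but the exact layer sizes in the paper are not reproduced here.
IMG_DIM, TXT_DIM, HIDDEN, N_NUTRIENTS = 2048, 768, 256, 10

def relu(x):
    return np.maximum(x, 0.0)

class FusionMLP:
    """Late fusion: concatenate image and text features, regress nutrients."""

    def __init__(self):
        scale = 0.01
        self.w1 = rng.normal(0, scale, (IMG_DIM + TXT_DIM, HIDDEN))
        self.b1 = np.zeros(HIDDEN)
        self.w2 = rng.normal(0, scale, (HIDDEN, N_NUTRIENTS))
        self.b2 = np.zeros(N_NUTRIENTS)

    def forward(self, img_feat, txt_feat):
        # Concatenate the two modality embeddings into one fused vector.
        fused = np.concatenate([img_feat, txt_feat], axis=-1)  # (batch, 2816)
        h = relu(fused @ self.w1 + self.b1)
        return h @ self.w2 + self.b2  # (batch, 10) nutrient predictions

def mae(y_true, y_pred):
    # Mean absolute error, computed per nutrient.
    return np.abs(y_true - y_pred).mean(axis=0)

def pmae(y_true, y_pred):
    # Percentage MAE relative to the mean reference value of each nutrient.
    return 100.0 * mae(y_true, y_pred) / np.abs(y_true).mean(axis=0)

# Toy batch: 4 samples with random stand-in features and reference values.
img = rng.normal(size=(4, IMG_DIM))
txt = rng.normal(size=(4, TXT_DIM))
y = rng.uniform(1, 10, size=(4, N_NUTRIENTS))

model = FusionMLP()
pred = model.forward(img, txt)
print(pred.shape)  # one nutrient vector per sample
```

In a real system the random stand-in features would come from the frozen backbones, and the MLP weights would be trained with a regression loss against the reference nutrient labels.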


Published

2026-04-10

How to Cite

Purwati, N., Isnanto, R. R., & Kartasurya, M. I. (2026). Multimodal Fusion of Image and Recipe Features for Nutrient Estimation of Complementary Foods. International Journal of Online and Biomedical Engineering (iJOE), 22(04), pp. 140–154. https://doi.org/10.3991/ijoe.v22i04.59905

Issue

Section

Papers