Design of a Multimodal Product Recommendation System Using Image and Text Features


Date Published : 9 May 2026

Contributors

Ssvr Kumar Addagarla

Author

Dr. Upendra Kumar

Institute of Engineering and Technology, Lucknow, India; Adjunct Research Faculty, Lincoln University College, 47301 Petaling Jaya, Selangor Darul Ehsan, Malaysia

Author

Keywords

Multimodal recommendation; Product clustering; Vision Transformers; Explainable AI; E-commerce.

Proceeding

Track

Engineering and Sciences

License

Copyright (c) 2026 Sustainable Global Societies Initiative

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Abstract

The rapid growth of e-commerce platforms has increased the need for effective recommendation systems that help users discover relevant products in large online catalogs. Traditional recommender systems rely mainly on user interaction data or a single source of information, which limits their ability to capture detailed product characteristics. This paper presents the design of a multimodal recommendation framework that uses both image and text features. In the proposed framework, product images are processed with deep learning-based feature extraction models, while textual information such as product titles and descriptions is encoded with semantic embedding techniques. The extracted features are combined into a unified product representation, which is used to compute similarity between products and generate recommendations. The system architecture, feature extraction process, and recommendation strategy are discussed in detail. The framework is intended as a simple foundation for developing multimodal recommendation systems in e-commerce environments. Future work will focus on implementing the system on real-world datasets and evaluating its performance with standard recommendation metrics.
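
The fusion and similarity steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state which fusion method or similarity measure the framework uses, so the weighted concatenation and cosine similarity below are assumptions, as are the function names, the `image_weight` parameter, and the toy vectors, which stand in for the outputs of real image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse(image_vec, text_vec, image_weight=0.5):
    """Build a unified product vector (assumed method: weighted concatenation).

    Each modality is normalized first so that neither dominates
    purely because of its scale or dimensionality.
    """
    text_weight = 1.0 - image_weight
    img = [image_weight * x for x in l2_normalize(image_vec)]
    txt = [text_weight * x for x in l2_normalize(text_vec)]
    return img + txt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query_id, catalog, k=2):
    """Return the k products most similar to query_id, best first."""
    query = catalog[query_id]
    scored = [
        (pid, cosine(query, vec))
        for pid, vec in catalog.items()
        if pid != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy features: (image features, text features) per product.
raw_features = {
    "red_sneaker": ([0.9, 0.1, 0.0], [0.8, 0.2]),
    "blue_sneaker": ([0.8, 0.2, 0.1], [0.7, 0.3]),
    "coffee_mug": ([0.0, 0.1, 0.9], [0.1, 0.9]),
}

catalog = {pid: fuse(img, txt) for pid, (img, txt) in raw_features.items()}
top = recommend("red_sneaker", catalog, k=2)
for product_id, score in top:
    print(product_id, round(score, 3))
```

In this toy catalog the two sneakers rank closest to each other because both their image and text vectors point in similar directions. Changing `image_weight` shifts how much each modality contributes to the ranking.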

How to Cite

Addagarla, S. K., & Kumar, U. (2026). Design of a Multimodal Product Recommendation System Using Image and Text Features. Sustainable Global Societies Initiative, 1(4). https://vectmag.com/sgsi/paper/view/335