Design of a Multimodal Product Recommendation System Using Image and Text Features
Contributors
Ssvr Kumar Addagarla
Dr. Upendra Kumar
Keywords
Proceeding
Track
Engineering and Sciences
License
Copyright (c) 2026 Sustainable Global Societies Initiative

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
The rapid growth of e-commerce platforms has increased the need for effective recommendation systems that help users discover relevant products in large online catalogs. Traditional recommender systems rely mainly on user interaction data or on a single source of information, which limits their ability to capture detailed product characteristics. This paper presents the design of a multimodal product recommendation framework that uses both image and text features. In the proposed framework, product images are processed using deep learning-based feature extraction models, while textual information such as product titles and descriptions is encoded using semantic embedding techniques. The extracted features are then combined into a unified product representation, which is used to compute similarity among products and generate recommendations. The system architecture, feature extraction process, and recommendation strategy are discussed in detail. The proposed framework provides a simple foundation for developing multimodal recommendation systems in e-commerce environments. Future work will focus on implementing the system on real-world datasets and evaluating its performance using standard recommendation metrics.
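
The fusion and similarity steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the fusion method or the similarity measure, so concatenation of L2-normalized vectors and cosine similarity are assumed here. The feature vectors are toy values standing in for the outputs of an image model and a text embedding model, and the names `fuse`, `cosine`, `recommend`, and the product IDs are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so both modalities contribute equally."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(image_vec, text_vec):
    """Assumed fusion: concatenate the normalized image and text features."""
    return l2_normalize(image_vec) + l2_normalize(text_vec)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def recommend(query_id, catalog, k=2):
    """Return the k products most similar to query_id, best match first."""
    query = catalog[query_id]
    scores = [
        (pid, cosine(query, vec))
        for pid, vec in catalog.items()
        if pid != query_id
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


# Toy features (hypothetical): in a real system these would come from
# a pretrained image model and a text embedding model.
image_features = {
    "shoe_a": [0.9, 0.1, 0.0],
    "shoe_b": [0.8, 0.2, 0.1],
    "mug_c": [0.0, 0.1, 0.9],
}
text_features = {
    "shoe_a": [0.7, 0.3],
    "shoe_b": [0.6, 0.4],
    "mug_c": [0.1, 0.9],
}

# Build the unified representation for every product in the catalog.
catalog = {
    pid: fuse(image_features[pid], text_features[pid])
    for pid in image_features
}

top = recommend("shoe_a", catalog, k=2)
for pid, score in top:
    print(pid, round(score, 3))
```

Normalizing each modality before concatenation keeps one feature type from dominating the similarity score only because its raw values are larger. Other fusion strategies, such as weighted sums or learned projections, would fit the same structure.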