Elsevier

Applied Soft Computing

Volume 109, September 2021, 107552

Contextual recommender system for E-commerce applications

https://doi.org/10.1016/j.asoc.2021.107552

Highlights

  • A hybrid collaborative filtering model is proposed for the recommender system.
  • It is aware of both the context and semantics of user and item textual details.
  • User embeddings are prepared using word2vec (w2v).
  • Item embeddings are generated using a convolutional neural network (CNN).
  • PMF is used as the collaborative filtering technique.
  • The model is primarily proposed for missing-rating prediction.
  • The model can be used as a recommendation model.
  • The proposed model is tested on three real-world datasets.

Abstract

Today’s global village of organizations, social applications, and commercial websites provides huge amounts of information about products, individuals, and activities. This leads to a plethora of content that requires effective handling to obtain the desired information. A recommendation system (RS) suggests relevant items to a user according to his/her preferences. It processes various kinds of information related to users and items. However, RSs suffer from data sparsity. Generally, deep learning techniques are used in RSs for deep analysis of item contents to create precise recommendations. However, the effective handling of user reviews in parallel with item reviews is still an open research domain that can be further explored. In this paper, a hybrid model that handles both user and item metadata concurrently is proposed with the aim of solving the sparsity problem. To demonstrate the viability of the proposed methodology, a series of experiments was performed on three real-world datasets. The results show that the proposed model outperforms other state-of-the-art approaches, to the best of our knowledge.

Keywords

Probabilistic matrix factorization
Hybrid collaborative filtering
Convolutional neural network
Textual embedding


1. Introduction

In the present era of the socioeconomic race, recommendation systems (RSs) are a central component of numerous e-business organizations, movie sites, e-libraries, article and news portals, and music and social forums. RSs are used by firms to infer the likes and dislikes of users, fans, and potential customers. Commercial companies and social website organizations extensively use their own developed RSs. Such organizations generate considerable revenue from user-aligned recommendations; valid recommendations play a vital role and increase their business revenue manifold. RSs have accordingly gained significance with the broad utilization of the Internet, where a plethora of data about a single entity is readily available. An RS focuses on user needs and provides what the user wishes to have or expects to be recommended. RSs are content retrieval processes that suggest the necessary relevant information from a large amount of data; such huge bodies of data are gathered by business organizations with the passage of time [1]. RSs are beneficial for both users and organizations in terms of valid recommendations and ample revenue. They improve customers’ decision-making process for online products and allow them to find relevant information quickly [2]. In this regard, different recommendation methodologies have been proposed in the literature, such as collaborative filtering (CF), content-based filtering (CBF), and hybrid filtering (HF) [1], [3], [4]. Netflix, Amazon, Facebook, Google News, YouTube, Twitter, LinkedIn, and many other organizations have deployed RSs to target their potential customers, aiming to show recommendations on their websites according to each customer’s preferences. RS performance depends on the historical interaction between users and items. Such user-to-item interaction can be mapped in the form of an explicit rating matrix; the same interaction can also be formed implicitly from user reviews and item descriptions.
However, the data are sparse because most users do not give reviews or ratings for every item they purchase. Therefore, such mappings (both implicit and explicit) remain sparse. To handle this sparsity through the concurrent handling of both user and item information, a hybrid model is proposed in this paper. The hybrid model combines the semantic and contextual ingredients of the user and item textual descriptions with the explicit rating data. This hybrid collaborative filtering model aims to solve the data sparsity problem, where the number of items rated by users is very small.
The proposed model is termed the contextual hybrid model for RS; it effectively handles rating matrix data sparsity. The proposed model integrates the generated embeddings of users and items into the corresponding latent factors of probabilistic matrix factorization (PMF). These user and item embeddings are developed concurrently by w2v and a CNN: w2v captures the semantics of the text, and the CNN extracts the contextual details of the textual description. Moreover, feature locality information is captured by dividing each feature vector into sections and then applying the max-pool operation to each section in the max-over pooling layer [5]. PMF is used as the collaborative filtering technique; it performs well on sparse, imbalanced, and large datasets when supplied with auxiliary information. It can predict ratings close to users’ actual ratings, and the closer the predicted ratings are, the more accurate the item recommendations that can be provided to the user. Experimental evidence illustrates that a contemporaneous understanding of the semantics and context of the constituent entities improves the performance of RSs. The major contributions of our paper are summarized as follows:
  • A hybrid contextual model is proposed, which improves the accuracy of rating prediction by merging the semantic and contextual details of both users and items into collaborative filtering.
  • User/item latent models are converged for accurate recommendations based on their corresponding semantic and contextual details. In this regard, a convolutional neural network generates semantically enhanced contextual embeddings of users and items.
  • The feature locality information is captured by dividing each feature vector into sections and then applying the max-pool operation to each section in the max-over pooling layer [5].
  • Experimentation shows effective handling of data sparsity and enhanced rating prediction and item recommendation accuracy compared with other state-of-the-art models, to the best of our knowledge.
The rest of this paper is organized as follows. Section 2 provides a review of the literature related to background knowledge. Our proposed model is presented in Section 3. Experimental findings and comparisons are discussed in Section 4. Finally, we conclude in Section 5.

2. Background and related work

An RS is built upon various collaborative filtering (CF), content-based filtering (CBF), and hybrid filtering (HF) techniques. The workflow of these methodologies depends on how the provided data are processed. Each of these techniques is briefly explained to provide background knowledge. CF, the most widely used RS technique, makes recommendations on the basis of user/item historical data. It exploits the behavior of users and the reviews on items to predict ratings and attempt accurate item recommendations. This technique is built on neighborhood search, where groups of other people (having the same interests, likes, and dislikes) are examined with respect to the current user; subsequently, similar items are recommended. It assimilates implicit ratings built in the form of a sparse matrix wherein missing ratings are to be predicted based on historical ratings. This improves the decisions made by clients based on neighboring similar users [6]. CF is further subdivided into memory-based and model-based filtering. The former is based on the past history of users or items, whereas the latter attempts to discover entities (neighbors) that have similar features. In model-based RSs, learning methods are implemented to build models trained on user preferences or item trends, based on which missing ratings are predicted and recommendations are made [4], [7].
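As a concrete illustration of memory-based CF, the sketch below finds a user's nearest neighbor by cosine similarity over co-rated items; the toy rating matrix and the similarity choice are illustrative assumptions, not taken from this paper.

```python
import numpy as np

# Toy user-item rating matrix (0 = missing rating); values are illustrative.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity computed over co-rated items only."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Nearest neighbor of user 0 among the remaining users.
sims = [cosine_sim(R[0], R[u]) for u in range(1, R.shape[0])]
nearest = 1 + int(np.argmax(sims))   # user index with the highest similarity
```

Items rated highly by the nearest neighbor but unseen by the current user would then be candidate recommendations.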
CBF recommends items even if no ratings for an item are present in its previous record. It is based on item contents and the feedback given by clients through ratings or reviews. Because it builds knowledge about users/items, CBF is also called cognitive filtering. In light of this, client profiles are created from the given data, which are further utilized for making recommendations and rating predictions. As more data are given by the client, more accurate recommendations are made [8]. In the CBF technique, machine learning-based algorithms develop user/item profiles and capture their trends/choices. Accordingly, items similar to the user profile are recommended to the user. To better grasp the item profile, consider news/books/products that have different features, individuals, concepts, contents, properties, etc. On the other hand, the user profile is composed of the user’s biodata, likes/dislikes, age, etc. [9]. CBF recommends items as soon as valid details about the items are available [2], [9], [10]. The CBF methodology is akin to clustering items into two sections: one cluster is relevant to the client’s preferences, and the other is irrelevant. The CBF technique recommends related/relevant items to users according to their preferences. In this regard, user profiles are built on the basis of user document content.
Hybrid filtering is a joint venture of both collaborative and content-based techniques. To synergize the strengths of both CB and CF, hybrid filtering suppresses the limitations of both while incorporating positive aspects. It provides more precise suggestions and performs comparatively better than both CBF and CF. Different hybrid filtering methods have been presented by researchers in the academic literature; however, this is still an open area of research, particularly for data sparsity and cold-start problems [2], [11], [12].
A vast amount of research on RSs using side information has been published in the literature. Earlier RSs utilize either CBF or CF techniques for recommendation tasks. In CF, products are recommended by a model based on client aptitude, e.g., nearest neighbors [13], matrix factorization (MF) [14], nonnegative MF (NMF), and singular value decomposition (SVD) [15]. Various methodologies have been adopted to improve PMF performance by processing auxiliary information, following the proposed Bayesian version of PMF [16], [17], [18]. Another filtering method, named trust awareness, is presented in [8]. It uses a collaborative technique based on the concept that CF can be influenced by user reputation (computed by propagating user trust). In [19], the Eigentaste algorithm is proposed, which uses universal queries to elicit client ratings and applies principal component analysis (PCA) for recommendations.
CBF, another strategy utilized for RSs, suggests products based on the contents (side/auxiliary data) of either clients or items. In [20], RS techniques are presented that utilize the local information of a product and generate distributed representations while neglecting metadata. The technique suggested in [21] is capable of identifying small articles for recommendations related to home improvement; the recommendation is made on the basis of comparing the user profile with the available document contents. Hybrid approaches have been created by researchers to improve RS performance by combining the strengths of CF and CBF [2]. [22] proposed the restricted Boltzmann machine, which finds item similarities and maps them in CF. Cross-domain MF along with the coordinate transfer model was explored by the authors in [23]. The same is studied in further depth in [24] for a use case where no shared client exists across domains; subsequently, a generative model for finding comparative groupings between various domains was proposed. In [25], the multiview deep neural network (MvDNN) is presented; it maps users and items into a shared space. In this approach, user features/profiles are formulated from their past history, and films are suggested on the basis of nearest neighbors with identical profiles who watched the film being recommended. In [12], the hidden factor and topic (HFT) model, which is based on topic modeling techniques, is proposed. The model combines the latent review topics to interpret the rating dimensions of either users or items; the results show comparatively better performance than when reviews or ratings are used separately. CADE (collaborative denoising autoencoder), presented in [26], predicts top-N recommended items; it utilizes only the rating matrix and neglects auxiliary information. In [11], a unified model integrating CBF and CF to solve the cold start problem is proposed; it applies topic modeling to user reviews to improve recommendation performance.
Latent Dirichlet allocation (LDA), a topic modeling technique that captures topic details of given data, was proposed in [27]. It addresses the word2vec (w2v) limitation of representing data only locally, i.e., semantics only [28]. In [29], the authors proposed the topical word embeddings (TWE) model, which relates latent topic models to each word in text. TWE is learned on both the semantics and the corresponding topics; therefore, document vector representations influenced by different topics are obtained. In [30], collaborative deep learning (CDL) was proposed, where the joint functioning of deep content representation learning and collaborative filtering can improve RS performance. CTR [31] is a topic regression model wherein CF and topic modeling are performed simultaneously.
Recently, deep learning (DL) techniques have gained importance in the domains of AI, machine learning, natural language processing (NLP), and speech recognition. DL is widely utilized for implementing RSs using both CF and CBF. In [32], a marginalized denoising autoencoder (mDA) is applied to learn item latent representations while neglecting randomly corrupted features; this also reduces computational complexity and training costs. In [33], hybrid CF and content-based music recommendations are researched; the authors learn music content latent factors by using matrix factorization and employ DL techniques to regenerate them for better song suggestions. The DeepCoNN model in [34] uses both user and item embeddings as latent factors and models their interactions through factorization machines; it is based on a dual neural network that maps latent factors into a shared space for making recommendations. [35] proposed a hybrid collaborative filtering model in which nonnegative matrix factorization is used as the collaborative filtering technique and its latent factors are initialized with user embeddings. [36] presented a deep semantic-based hybrid model built on the integration of lda2vec with PMF. [37] proposed dual-regularized MF with deep neural networks to cope with the sparsity issue of RSs. It adopts a multilayered model, stacking a CNN and a gated RNN to formulate independently distributed representations of user and item contents; the model essentially integrates the CNN with RNN models to better learn word dependencies and the sequential information of words, and an MF algorithm is then applied at the prediction layer to compute the final rating prediction. [38] presents a CF technique named deep MF that integrates DL with MF; it undertakes successive reinforcement of MF with a layered framework that utilizes the knowledge gathered on one layer as input to the following layers.
Some researchers have adopted the RNN model and demonstrated improved performance for recommendation systems. The majority of RNN-based models are applied specifically to formulate session-based recommendations. In [39], [40], the authors proposed a feature-rich session-based method that exploits a parallel RNN architecture to enhance the performance of the system. The proposed approach typically encompasses several RNNs, one for each representation of the item, and the hidden states of the RNN models are integrated to generate the scores for all items; in this method, a specific user’s session can be considered a sequence of clicks. [41] proposed a context-aware RS that extends the CF technique and learns the nonlinear interactions between the latent features of users and items and a sequential latent context representation. In [42], the authors proposed a CNN-based recommender system that obtains the user profile and predicts his/her ratings using an attention-based CNN; moreover, it recommends the top-n online courses to interested students according to their profiles. In [43], an e-business recommendation system is proposed in which the user-to-item rating matrix and user reviews are considered; the technique combines review mining and deep learning with an attention mechanism.
Our proposed model formulates vector representations of item and user textual contents through w2v and a CNN, thus capturing both the semantics and the context of words. It combines the generated representations (having both semantic and contextual details) with a collaborative technique aimed at improving rating prediction accuracy and top-n item suggestions. A comparative analysis of the proposed model with the state-of-the-art models (to the best of our knowledge) is presented in Table 1 based on each model’s strengths and weaknesses.

Table 1. Comparative analysis of models.

Model       | Inputs                              | Task                           | Metric                                  | Feature vector | Semantics   | Context
------------|-------------------------------------|--------------------------------|-----------------------------------------|----------------|-------------|------------
PMF [14]    | Ratings data                        | Ratings prediction             | RMSE                                    | No             | No          | No
CDL [30]    | Ratings data, user documents        | Ratings prediction             | mAP, Recall                             | LDA            | No          | No
CTR [31]    | Ratings data, user documents        | Ratings prediction             | Recall                                  | LDA            | No          | No
ConvMF [44] | Ratings data, user documents        | Ratings prediction             | RMSE                                    | TF–IDF         | No          | User only
CERMF [45]  | Ratings data, user & item documents | Ratings prediction             | RMSE, MAE                               | TF–IDF         | No          | User & item
DRMF [37]   | Ratings data, user & item documents | Ratings prediction & item rec. | RMSE, MAE, Precision, Recall            | GloVe          | No          | User & item
CapsMF [46] | Ratings data, user & item documents | Ratings prediction & item rec. | RMSE, MAE, Precision, Recall            | GloVe          | No          | User & item
CMF-HRS     | Ratings data, user & item documents | Ratings prediction & item rec. | RMSE, MAE, Precision, Recall, F1 score  | w2v            | User & item | User & item

3. Contextual matrix factorization-hybrid recommendation system (CMF-HRS)

In this section, the proposed framework, Contextual Matrix Factorization for Hybrid Recommendation System (CMF-HRS), is explained. In CMF-HRS, users’ and items’ textual documents containing reviews are taken as input to the framework. These textual documents are processed by w2v and the CNN to extract semantic and contextual details, respectively. In this regard, a user’s reviews of items indicate the user’s profile, and an item’s textual description shows the item’s profile. Both of these textual descriptions are known as auxiliary information, as they support the framework in building the user and item profiles in later stages. Here, w2v is applied to generate dense vector representations of the textual descriptions (Fig. 1).
After creating the user and item embeddings, both are concurrently processed by the CNN (used as an NLP technique). At the convolution layer, the CNN extracts the contextual ingredient of the textual description through convolution with a kernel [44]. Contextual details depict each word’s usage within a sentence, e.g., as a noun, verb, or adjective. Moreover, the word’s effect on other words in that sentence (syntax analysis) is also represented by the context in which that word is used. Therefore, user and item documents are converted into embeddings that encompass both their semantic and contextual ingredients.

Fig. 1. Creation of user and item embeddings using w2v.

Subsequently, the enriched embeddings are input to the collaborative filtering technique, i.e., probabilistic matrix factorization (PMF) (Fig. 2). Here, enriched embeddings are embeddings that encapsulate both the semantic and contextual details of the user and item textual descriptions. The user and item constituents extracted by the CNN are merged into the collaborative filtering technique. Different kinds of MF are used as collaborative filtering techniques in RSs. Matrix factorization divides a given matrix into lower-dimensional sub-matrices; in the RS, it decomposes the rating matrix into three sub-matrices that are categorized as the user, item, and data concept latent matrices. Collaborative filtering is the practical implementation of matrix factorization in which the user-to-item historical interaction is identified. The user-to-item rating matrix is input to the matrix factorization, and based on the identified relationships, it is contemplated how the user would rate items in the future; accordingly, better item recommendations are made to the user. PMF is used as the collaborative filtering technique in the proposed model, and both the user and item latent factors are initialized with the embeddings generated by w2v and the CNN. To prove the efficacy of the proposed model, a series of experiments was carried out on three real-world datasets, Amazon Instant Videos (AIV), Apps for Android (AA), and Yelp, where every user’s profile is described with reviews and every item’s profile with a corresponding description. This section elaborates on the generation of embeddings and gives a mathematical illustration of the proposed model. Subsequently, probabilistic matrix factorization, which is used to combine both item and user embeddings to predict missing ratings, is explained. Additionally, n items are recommended based on the predicted ratings.
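To make the decomposition idea concrete, the following sketch factorizes a small dense rating matrix into user and item latent factors via truncated SVD. This is an illustrative stand-in for PMF (detailed later in Section 3.1.2); the matrix values and dimensions are toy assumptions.

```python
import numpy as np

# Low-rank factorization of a small rating matrix: R ≈ P^T Q,
# illustrating how MF maps users and items into a shared latent space.
rng = np.random.default_rng(0)
M, N, D = 4, 5, 2                 # users, items, latent dimension
R = rng.integers(1, 6, size=(M, N)).astype(float)   # ratings in 1..5

# Truncated SVD gives the best rank-D approximation in the least-squares sense.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
P = (U[:, :D] * s[:D]).T          # D x M user latent factors
Q = Vt[:D, :]                     # D x N item latent factors
R_hat = P.T @ Q                   # reconstructed (dense) rating matrix

err = np.linalg.norm(R - R_hat)   # reconstruction error of the rank-D model
```

In PMF the same product structure is kept, but the factors are fit only on the observed entries of a sparse matrix, which is what makes missing-rating prediction possible.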

Fig. 2. Contextual matrix factorization for hybrid recommendation system.

3.1. Mathematical illustration of the proposed model

Let $P = \{1,2,3,\dots,M\}$ be the set of $M$ users and $Q = \{1,2,3,\dots,N\}$ the set of $N$ items. Suppose the users’ ratings of the items are represented in a 2D rating matrix $R_{mn} \in \{1,2,3,4,5\}^{M \times N}$. A user may or may not rate every item he/she has purchased; thus, some rating values are missing in the user-to-item rating matrix $R$. The role of a recommender system is to predict such missing ratings, and items are then recommended to users on the basis of the predicted ratings. The recommender system predicts these missing values from the known rating values together with the available user and item auxiliary information.
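A minimal sketch of this setting, assuming 0 marks a missing rating; the matrix values are illustrative:

```python
import numpy as np

# Illustrative M x N rating matrix; 0 marks a missing (unrated) entry.
R = np.array([
    [5, 0, 0, 3, 0],
    [0, 4, 0, 0, 1],
    [2, 0, 0, 0, 0],
])
O = (R > 0).astype(int)           # indicator: 1 where the user rated the item
sparsity = 1.0 - O.sum() / O.size # fraction of missing ratings
```

Here `O` plays the role of the binary indicator used later in the PMF objective: the model is fit only where `O` is 1, and the remaining entries are what the recommender must predict.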

3.1.1. Embeddings generation

The proposed RS utilizes w2v, as shown in Fig. 1, and a CNN to generate user and item embeddings. CNNs are dominant in the fields of image processing and computer vision as prominent solutions for classification, labeling, and anomaly detection in image frames, image segmentation, and edge detection. However, their utilization in natural language processing is also on the rise and is being explored by academic researchers. As an NLP technique, a CNN captures the contextual meaning of the documents’ vector representation, which is input to its embedding layer. The CNN has four consecutive layers: embedding, convolution, max-pool, and projection. User documents (UD) and item documents (ID) are processed by the w2v model for dense vector representation. Subsequently, the embedding layer of the CNN is initialized with these word vectors concatenated in the form of a meaningful matrix. The matrix encapsulates the semantics of the words present in the document. The detail of the captured semantics is proportional to the dimension of the word vectors, which is set to $d$; $d$ is kept the same for both w2v and the embedding layer of the CNN so that the two can be linked. User and item auxiliary documents are composed of word sequences, each word denoted by a $d$-dimensional vector generated by w2v. Let ID contain $n_1$ words for item 1, $n_2$ words for item 2, and $n$ words for the $N$th item. Similarly, UD denotes the user auxiliary documents with $m_1$ words for user 1, $m_2$ words for user 2, and $m$ words for the $M$th user. The item documents ID and user documents UD can be represented as follows:
$$\mathrm{ID} = \mathrm{ID}_1 \oplus \mathrm{ID}_2 \oplus \cdots \oplus \mathrm{ID}_N = [w_1, w_2, \dots, w_{n_1}] \oplus [w_1, w_2, \dots, w_{n_2}] \oplus \cdots \oplus [w_1, w_2, \dots, w_{N\mathrm{th}}]$$
Similarly:
$$\mathrm{UD} = \mathrm{UD}_1 \oplus \mathrm{UD}_2 \oplus \cdots \oplus \mathrm{UD}_M = [w_1, w_2, \dots, w_{m_1}] \oplus [w_1, w_2, \dots, w_{m_2}] \oplus \cdots \oplus [w_1, w_2, \dots, w_{M\mathrm{th}}]$$
where $w_{N\mathrm{th}}$ and $w_{M\mathrm{th}}$ represent the last words in the item and user documents, respectively, and $\oplus$ shows the concatenation of each item’s and user’s description. Subsequently, preprocessed corpora $\mathrm{ID}_c$ and $\mathrm{UD}_c$ are generated by text cleaning, removing stop words, deleting unwanted symbols, and applying tokenization and various other subtasks to improve the results of the following processing. Assume that $l_i$ and $l_u$ words in total are present in the processed item and user corpora, respectively. Then $\mathrm{ID}_c$ and $\mathrm{UD}_c$ can be denoted as follows:
$$\mathrm{ID}_c \xleftarrow{\text{cleaned corpus}} \mathrm{ID}_1, \mathrm{ID}_2, \mathrm{ID}_3, \dots, \mathrm{ID}_N \qquad \mathrm{UD}_c \xleftarrow{\text{cleaned corpus}} \mathrm{UD}_1, \mathrm{UD}_2, \mathrm{UD}_3, \dots, \mathrm{UD}_M$$
where
  • IDc: Item cleaned Corpus
  • UDc: User cleaned Corpus
  • N : Total number of Items
  • M : Total number of Users
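The corpus-cleaning step (lowercasing, punctuation/symbol removal, tokenization, stop-word removal) might be sketched as follows; the stop-word list and the sample review are illustrative assumptions:

```python
import re
import string

# Illustrative stop-word subset; a real pipeline would use a fuller list.
STOP_WORDS = {"the", "a", "an", "is", "and", "of", "to"}

def clean(doc: str) -> list[str]:
    """Lowercase, strip punctuation/symbols, tokenize, and drop stop words."""
    doc = doc.lower()
    doc = doc.translate(str.maketrans("", "", string.punctuation))
    tokens = re.findall(r"[a-z0-9]+", doc)
    return [t for t in tokens if t not in STOP_WORDS]

UD = ["The camera is great, and the battery lasts!"]   # one toy user review
UD_c = [clean(d) for d in UD]
# → [['camera', 'great', 'battery', 'lasts']]
```

The cleaned token lists then form the corpora fed to w2v in the next step.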
To extract semantics and achieve a numerical representation, the w2v model is applied to the cleaned corpora. Accordingly, dense vector representations are obtained by the probabilistic model defined in Eq. (1), which is optimized with the objective function in Eq. (2). Such dense vectors are semantically well learned and densely populated. The cost function is the average negative log-likelihood; minimizing it maximizes predictive accuracy. It is applied to each word position $i=1$ to $i=w_n$ (or $i=w_m$) in the text of the respective item and user documents. Sentence representations are then computed by the model as the length-normalized average of the word vectors, Eq. (3) for items and Eq. (4) for users.
$$L(\theta) = \prod_{i=1}^{w_n} \prod_{-x < j < x,\ j \ne 0} P(w_{i+j} \mid w_i; \theta) \tag{1}$$
$$J(\theta) = -\frac{1}{w_n}\log L(\theta) = -\frac{1}{w_n}\sum_{i=1}^{w_n} \sum_{-x < j < x,\ j \ne 0} \log P(w_{i+j} \mid w_i; \theta) \tag{2}$$
$$W_{vi} = \frac{1}{w_n}\sum_{i=1}^{w_n} \frac{w_i}{\lVert w_i \rVert} \tag{3}$$
$$W_{vu} = \frac{1}{w_m}\sum_{i=1}^{w_m} \frac{w_i}{\lVert w_i \rVert} \tag{4}$$
where $W_{vi}$ and $W_{vu}$ represent item and user document sentences in the form of dense vectors. Here, $w_1, w_2, w_3, \dots, w_n$ are the vector embeddings of the words in each document, and $\lVert w_i \rVert$ is the L2 norm of the vector $w_i$. The dimension of each sentence vector representation is set to $d$ for both users and items. Accordingly, from the concatenated sentences in the preprocessed corpora $\mathrm{ID}_c$ and $\mathrm{UD}_c$, w2v develops the following matrix representations for items and users:
$$\mathrm{IE} = \begin{bmatrix} \mathrm{ID}_1 \\ \mathrm{ID}_2 \\ \vdots \\ \mathrm{ID}_N \end{bmatrix} = \begin{bmatrix} v_{10} & v_{11} & \cdots & v_{1d} \\ v_{20} & v_{21} & \cdots & v_{2d} \\ \vdots & & & \vdots \\ v_{l_i 0} & v_{l_i 1} & \cdots & v_{l_i d} \end{bmatrix}_{l_i \times d} \qquad \mathrm{UE} = \begin{bmatrix} \mathrm{UD}_1 \\ \mathrm{UD}_2 \\ \vdots \\ \mathrm{UD}_M \end{bmatrix} = \begin{bmatrix} v_{10} & v_{11} & \cdots & v_{1d} \\ v_{20} & v_{21} & \cdots & v_{2d} \\ \vdots & & & \vdots \\ v_{l_u 0} & v_{l_u 1} & \cdots & v_{l_u d} \end{bmatrix}_{l_u \times d}$$
Here $v_{l_i d}$ denotes the $d$th component of the $l_i$th vector in the item embedding matrix, and $v_{l_u d}$ the corresponding entry in the user embedding matrix. In the user and item embeddings, word vectors are sequentially connected and thus hold semantics only.
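The length-normalized averaging of Eqs. (3) and (4) can be sketched as follows; the word vectors are illustrative toy values:

```python
import numpy as np

def sentence_vector(word_vectors: np.ndarray) -> np.ndarray:
    """Length-normalized average of word vectors (cf. Eqs. (3)-(4)):
    each word vector w_i is divided by its L2 norm before averaging."""
    norms = np.linalg.norm(word_vectors, axis=1, keepdims=True)
    return (word_vectors / norms).mean(axis=0)

# Illustrative 3-word sentence with d = 4 dimensional embeddings.
W = np.array([[1., 0., 0., 0.],
              [0., 2., 0., 0.],
              [0., 0., 3., 0.]])
v = sentence_vector(W)
# Each row is unit-normalized first, so v = [1/3, 1/3, 1/3, 0].
```

Normalizing before averaging keeps unusually long word vectors from dominating the sentence representation.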
Subsequently, the embedding layer of the CNN is initialized by these generated embedding matrices. The CNN undertakes the lexical and syntax analysis of item and user documents to capture their fine-grained contextual details. Contextual details capture how the order of words matters in a sentence. For example, in the sentence “the movie Gladiator is an English language movie and has English actors”, the word English is used in two different contexts: in the first case it corresponds to the movie, and in the second it belongs to the actors in the movie. The CNN captures such contextual details of both user and item documents. Let $v_i \in \mathrm{IE}$ be the $i$th word vector of dimension $d$, and $v_u \in \mathrm{UE}$ likewise; an item sentence comprising $x$ words and a user sentence comprising $y$ words can then be represented as
$$v_{1:x} = v_1 \oplus v_2 \oplus \cdots \oplus v_x \qquad v_{1:y} = v_1 \oplus v_2 \oplus \cdots \oplus v_y$$
where $\oplus$ denotes concatenation. Generally, $v_{x:x+k}$ denotes the concatenation of the word vectors $v_x, v_{x+1}, v_{x+2}, \dots, v_{x+k}$.
In the convolutional layer, contextual information about item textual documents and user reviews is acquired through the convolution kernel applied over the concatenated word vectors. The size/length of the convolutional kernel defines the number of words considered in one convolution operation, and various window sizes are used to obtain content from the item and user documents. The convolution kernel $K \in \mathbb{R}^{d \times W}$ is passed over $W$ words that are convolved together, with their dimensions remaining fixed. Let $c_i$ and $c_u$, created from the convolution of the kernel with the item and user word vectors $v_{i:i+W-1}$ and $v_{u:u+W-1}$, be represented by
$$c_i = F(K_W * \mathrm{IE}(:, v_{i:i+W-1}) + bias_i) \qquad c_u = F(K_W * \mathrm{UE}(:, v_{u:u+W-1}) + bias_u)$$
where $*$ denotes the convolution operation, $K_W$ is the convolution kernel of size $W$, $bias_i \in \mathbb{R}$ and $bias_u \in \mathbb{R}$ are bias terms, and $F$ is a nonlinear function such as tanh, sigmoid, or ReLU. The convolutional kernel is applied to every possible window of word vectors in a sentence, generating the contextual feature maps
$$\theta_i = [c_1, c_2, c_3, \dots, c_{l_i - W + 1}] \qquad \theta_u = [c_1, c_2, c_3, \dots, c_{l_u - W + 1}]$$
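A minimal sketch of this convolution step, assuming ReLU for F and random toy embeddings; the document length, embedding dimension, and kernel width are illustrative:

```python
import numpy as np

def conv_feature_map(E: np.ndarray, K: np.ndarray, bias: float) -> np.ndarray:
    """Slide a d x W kernel K over an l x d embedding matrix E and apply
    ReLU, producing a feature map of length l - W + 1."""
    l, d = E.shape
    W = K.shape[1]
    theta = np.empty(l - W + 1)
    for i in range(l - W + 1):
        window = E[i:i + W].T                      # d x W slice of word vectors
        theta[i] = np.maximum(0.0, np.sum(K * window) + bias)
    return theta

rng = np.random.default_rng(1)
E = rng.standard_normal((10, 8))                   # 10 words, d = 8
K = rng.standard_normal((8, 3))                    # kernel spanning W = 3 words
theta = conv_feature_map(E, K, bias=0.1)           # feature map of length 8
```

Each entry of `theta` summarizes one W-word window, which is how the kernel picks up local word-order (contextual) information.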
In the max-over pooling layer, the operations $\max(\theta_i)$ and $\max(\theta_u)$ are usually applied to capture the most important contextual feature, i.e., the highest value of $\theta_i$ and $\theta_u$ in the feature map. Moreover, variable sentence length is addressed by padding to construct fixed-length feature vectors for both item and user documents. However, feature locality information can be lost by applying a direct max operation, and the frequency information of a feature that appears often may also be neglected. Therefore, to capture this information, each feature vector is divided into sections and the max operation is applied to each section to obtain the maximum contextual details [5]. The length of each section of the feature map is given by Eq. (5), where $l_u = l_i = l$ is assumed for ease of understanding and the entire feature map is divided into $\zeta$ sections:
$$Sec = \left\lfloor \frac{l - W + 1}{\zeta} \right\rfloor + 1 \tag{5}$$
$$\Theta_i = [\max(\theta_1, \dots, \theta_{Sec}),\ \max(\theta_{Sec+1}, \dots, \theta_{2\,Sec}),\ \dots,\ \max(\theta_{l_i - W + 1 - Sec}, \dots, \theta_{l_i - W + 1})] \tag{6}$$
$$\Theta_u = [\max(\theta_1, \dots, \theta_{Sec}),\ \max(\theta_{Sec+1}, \dots, \theta_{2\,Sec}),\ \dots,\ \max(\theta_{l_u - W + 1 - Sec}, \dots, \theta_{l_u - W + 1})] \tag{7}$$
In the output layer, the features captured in the max-pooling layer are converted into the form required to perform the subsequent task: here, initializing the latent factors of the collaborative filtering technique (PMF). The high-level features $\Theta_i$ and $\Theta_u$ are converted to the $d$-dimensional vector space by a nonlinear projection:
$$V = \tanh(M_2\{\tanh(M_1 \Theta_i + b_1)\} + b_2) \tag{8}$$
$$U = \tanh(M_2\{\tanh(M_1 \Theta_u + b_1)\} + b_2) \tag{9}$$
where $M_1$ and $M_2$ are projection matrices and $b_1, b_2$ are the bias vectors for $M_1, M_2$. Consequently, after going through the above methodology, the CNN and w2v jointly become functions ($f_{ni}$ for items and $f_{nu}$ for users) that take raw item and user documents as input and generate the latent models $V$ and $U$ while manipulating the internal weights $W$. The obtained $V$ and $U$ are linked to the PMF item and user latent models, respectively:
$$V_n = f_{ni}(W, \mathrm{ID}_n) \tag{10}$$
$$U_m = f_{nu}(W, \mathrm{UD}_m) \tag{11}$$
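The sectioned max-pooling of Eqs. (6)-(7) might be sketched as follows, using numpy's `array_split` to form the ζ sections; the feature-map values are illustrative:

```python
import numpy as np

def section_max_pool(theta: np.ndarray, zeta: int) -> np.ndarray:
    """Split the feature map into `zeta` contiguous sections and take the
    max of each, preserving coarse locality (cf. Eqs. (5)-(7))."""
    sections = np.array_split(theta, zeta)
    return np.array([s.max() for s in sections])

theta = np.array([0.2, 0.9, 0.1, 0.4, 0.7, 0.3])   # toy feature map
pooled = section_max_pool(theta, zeta=3)
# → array([0.9, 0.4, 0.7])
```

Unlike a single global max (which would keep only 0.9), the pooled vector records where in the sentence the strong activations occur.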

3.1.2. Probabilistic matrix factorization

Probabilistic matrix factorization (PMF), a collaborative filtering technique, represents a large matrix in a shared low-dimensional latent space as the product of smaller matrices with reduced dimensions. In the same spirit, PMF decomposes a large rating matrix into three smaller matrices that are interpretive and intuitive for a human to relate back to the parent rating matrix. The rating matrix R_mn ∈ [1, 2, 3, 4, 5]^{M×N} is decomposed by PMF into three matrices, P_m = P_{M×D}, Q_n = Q_{D×N}, and δ_D = δ_{D×D}, where
  • P_m = P_{M×D}: user latent factor
  • Q_n = Q_{D×N}: item latent factor
  • δ_D = δ_{D×D}: concept latent factor
The proposed RS technique integrates a modified CNN into PMF to enrich the user latent factor (P) and item latent factor (Q) with both the semantics and contextual details of items and users. The proposed model predicts a rating matrix R̂_mn for user m ∈ P_{M×D} who is expected to rate an item n ∈ Q_{D×N}. The predictions are based on the known ratings in the given rating matrix R_mn, in conjunction with the preprocessed auxiliary information ID_c and UD_c of items and users. Thus,

(12) R̂_mn = P_m^T · δ · Q_n

where δ reveals the presence of concepts in R_mn. To obtain the predicted rating matrix R̂_mn, it is necessary to determine the user matrix P_m and item matrix Q_n such that their product returns the predicted rating matrix. Both latent factors are calculated through an iterative process of minimizing a probabilistic cost function:

(13) ℒ = Σ_{m∈M} Σ_{n∈N} O_mn (R_mn − P_m^T Q_n)² + δ_m Σ_{m∈M} ||P_m||² + δ_n Σ_{n∈N} ||Q_n||²

where O is a binary function that is set to 1 if the user has rated the item and 0 otherwise. In Eq. (13), δ_m is the user regularization term and δ_n is the item regularization term. The performance of the model over this objective is monitored through the root mean square error (RMSE) and the mean absolute error (MAE). Probabilistic linearity over a Gaussian noise distribution is adopted for model convergence, wherein the conditional distribution over the observed ratings can be represented as follows:

(14) p(R_mn | P, Q, δ²) = Π_{m∈M} Π_{n∈N} ζ(R_mn | P_m^T Q_n, δ²)^{O_mn}
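As a toy illustration of the three-factor form of Eq. (12) (random factors and an identity concept matrix, not the trained model):

```python
import numpy as np

# An M x N rating matrix approximated by the product of
# P (M x D, user factors), delta (D x D, concept factors),
# and Q (D x N, item factors).
M, N, D = 4, 5, 2
rng = np.random.default_rng(1)
P = rng.random((M, D))          # user latent factor
delta = np.eye(D)               # concept latent factor (identity here)
Q = rng.random((D, N))          # item latent factor
R_hat = P @ delta @ Q           # predicted rating matrix, shape (M, N)
```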
In Eq. (14), ζ(R_mn | P_m^T Q_n, δ²) is a Gaussian distribution with mean P_m^T Q_n and variance δ². As the item and user latent factors are generated by the CNN from their corresponding textual documents, three CNN variables were considered: (i) the CNN hidden-layer weights W; (ii) the nth item document ID_n and the mth user document representation UD_m; and (iii) the Gaussian-noise variables α_n and α_m. Considering these variables leads to more accurate user and item probabilistic latent models:

V_n = fn_i(W, ID_n) + α_n,  s.t. α_n ∼ G(0, δ_v² O)
U_m = fn_u(W, UD_m) + α_m,  s.t. α_m ∼ G(0, δ_u² O)

(15) p(P | W, UD_m, δ_u²) = Π_{m∈M} ζ(P_m | fn_u(W, UD_m), δ_u² O)

(16) p(Q | W, ID_n, δ_v²) = Π_{n∈N} ζ(Q_n | fn_i(W, ID_n), δ_v² O)

The trainable variables, including the internal layer weights of the CNN, can be combined to approximate the proposed model as follows:

(17) p(P, Q, W | R_mn, ID_n, UD_m, δ², δ_v², δ_u²) ∝ p(R_mn | P, Q, δ²) · p(P | W, UD_m, δ_u²) · p(Q | W, ID_n, δ_v²)

3.2. Optimization

To converge the proposed model and to optimize both latent factors maximally, a posterior estimation is considered prudent:

(18) max_{P,Q} p(P, Q, W | R_mn, ID_n, UD_m, δ², δ_v², δ_u²) = max_{P,Q} [ p(R_mn | P, Q, δ²) · p(P | W, UD_m, δ_u²) · p(Q | W, ID_n, δ_v²) ]
Applying a negative logarithmic function to Eq. (18) gives:

(19) ℒ(P, Q) = Σ_{n=1}^{N} Σ_{m=1}^{M} (O_mn/2)(R_mn − P_m^T Q_n)² + (α_m/2) Σ_{m=1}^{M} ||P_m − fn_u(W, UD_m)||² + (α_n/2) Σ_{n=1}^{N} ||Q_n − fn_i(W, ID_n)||²

where α_m = δ²/δ_u², α_n = δ²/δ_v², and ||·||² is the Frobenius norm. Repetitive optimization was adopted for the user and item latent models (P and Q), fixing the remaining variables in turn. Taking the derivative of Eq. (19) with respect to P_m and Q_n yields the closed-form local minima:

(20) P_m = (Q O_m Q^T + α_m O_d)^{−1} (Q R_m + α_m fn_u(W, UD_m))

(21) Q_n = (P O_n P^T + α_n O_d)^{−1} (P R_n + α_n fn_i(W, ID_n))

where O_m and O_n are diagonal matrices with the indicator entries for m = 1, …, M and n = 1, …, N as elements, and O_d is the D×D identity matrix. Moreover, R_m and R_n are vectors with (R_mn)_{n=1}^{N} for user m and (R_mn)_{m=1}^{M} for item n. The overall abstraction of the mathematical flow of the model, together with its step-by-step implementation, is shown in Fig. 3.
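The closed-form user update of Eq. (20) can be sketched as below, assuming toy data and treating the CNN/w2v embeddings fn_u(W, UD_m) as a fixed input matrix (the item update of Eq. (21) is symmetric):

```python
import numpy as np

def update_user_factors(R, O, Q, cnn_user, alpha_m):
    """Closed-form user update of Eq. (20) with all other variables fixed.
    R: M x N ratings, O: M x N rated-indicator mask,
    Q: D x N item factors, cnn_user: M x D embeddings fn_u(W, UD_m)."""
    M, D = cnn_user.shape
    P = np.zeros((M, D))
    for m in range(M):
        Om = np.diag(O[m])                                # diagonal O_m
        A = Q @ Om @ Q.T + alpha_m * np.eye(D)            # Q O_m Q^T + a_m I
        b = Q @ (O[m] * R[m]) + alpha_m * cnn_user[m]     # Q R_m + a_m fn_u
        P[m] = np.linalg.solve(A, b)
    return P

# With a very large alpha_m the regularizer dominates, so the solved
# factors should stay close to the CNN/w2v embeddings.
rng = np.random.default_rng(2)
R, O = rng.random((3, 4)) * 5, np.ones((3, 4))
Q, emb = rng.random((2, 4)), rng.random((3, 2))
P = update_user_factors(R, O, Q, emb, alpha_m=1e8)
```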

4. Experimentation and results

This section elaborates on the performance evaluation of the proposed model based on real-world datasets. These datasets are primarily used in the field of RSs. It explains dataset preparation, i.e., preprocessing, followed by adjusting experimental parameters and results presented in tabular/graphical forms. Moreover, a comparative analysis of the proposed model with other state-of-the-art RS models is also provided in this section.

4.1. Dataset preprocessing

Three real-world datasets obtained from two different platforms were used to demonstrate the performance of the proposed model: AIV and AA, downloaded from1, and the Yelp dataset, obtained from2. The Amazon data bank comprises 22 subcategories of user reviews and product metadata and is widely used by researchers for RS experimentation, while the Yelp dataset is used for both sentiment analysis and RS research. We selected the two Amazon datasets, AIV and AA, to check the performance of the proposed model. In these datasets, item opinions are given by numerous users with different profiles, and a threshold is set on the reviews of a user to build his or her profile. The datasets were selected on the basis of sparsity and the availability of auxiliary information in the form of item plots and user reviews. Table 2 shows the statistics of the datasets.
The datasets were divided into training, validation, and testing sets at 80%, 10%, and 10%, respectively. User reviews and item textual descriptions were preprocessed by removing stop words, followed by part-of-speech (PoS) tokenization. The TF-IDF threshold was set to 0.5 to exclude less frequent words from the documents, i.e., words that appeared rarely and carried no meaningful clues in the text. The vocabulary size was fixed at 8000, and two thresholds were introduced: tA to limit the associated review size and tB to control the size of the corresponding document. A suitable combination of tA and tB may differ between users and items; however, it was kept constant (tA = 0.8 and tB = 0.5) for simplicity. Items with no ratings were removed from the datasets to obtain precise and accurate results. Subsequently, the preprocessed text was passed to the w2v word-embedding model to produce dense vector representations of user reviews and item plots. In this regard, the min-count was set to zero so that the semantics of every word in a sentence are captured and reflected in the vectors. The dimension of the vectors was set to 200 so that they could be integrated with the embedding layer of the CNN, whose dimension was likewise set to 200 and which was initialized with the w2v vector notation of the text. Moreover, different window sizes [3, 4, 5] were used by the CNN model, with a dropout ratio to avoid overfitting; the various windows capture the context of 3, 4, and 5 consecutive words, respectively. After the user and item contextual information was obtained through the CNN, the PMF latent factors were initialized with it.
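The 80/10/10 split described above can be sketched as follows (a stdlib-only illustration with hypothetical rating triples, not the authors' pipeline):

```python
import random

def split_ratings(ratings, seed=42):
    """Shuffle rating triples (user, item, rating) and split them
    into 80% train, 10% validation, and 10% test."""
    data = list(ratings)
    random.Random(seed).shuffle(data)   # deterministic shuffle
    n = len(data)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])

# 100 toy triples -> 80 / 10 / 10
triples = [(u, i, 5) for u in range(10) for i in range(10)]
train, val, test = split_ratings(triples)
```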

Table 2. Datasets statistical data.

| Datasets | Users | Items | Ratings | Range | Density |
| --- | --- | --- | --- | --- | --- |
| AIV | 5130 | 1685 | 37 126 | [1 – 5] | 0.429% |
| AA | 87 271 | 13 209 | 752 937 | [1 – 5] | 0.065% |
| Yelp | 12 146 | 27 774 | 408 410 | [1 – 5] | 0.121% |

4.2. Evaluation metrics

To evaluate the performance of any devised model, different types of evaluation techniques are used by researchers [47]. Evaluation metrics can be divided into two categories: 1. prediction metrics and 2. classification metrics. The proposed model has been evaluated on the basis of RMSE (root mean square error) and MAE (mean absolute error), which are used as prediction metrics. Both are widely used to measure the error between actual and predicted values, and their joint convergence indicates improving model performance. Their implementation is based on the cost function for rating prediction:

(22) RMSE = sqrt( (1/|MN|) Σ_{m,n}^{M,N} (R_mn − R̂_mn)² )

(23) MAE = (1/|MN|) Σ_{m,n}^{M,N} |R_mn − R̂_mn|

where R_mn is the actual rating, R̂_mn is the predicted rating, |MN| is the total number of ratings, m indexes the current user, and n indexes the current item. Decreasing values of both RMSE and MAE with each epoch indicate improving model performance, and a lower minimum RMSE for a model depicts its better performance.
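A minimal sketch of Eqs. (22)–(23), computed only over observed ratings via a binary mask (toy matrices, not the experimental data):

```python
import numpy as np

def rmse_mae(R, R_hat, O):
    """RMSE and MAE over observed ratings only; O is the binary
    observed-entry mask corresponding to O_mn in the text."""
    err = (R - R_hat)[O == 1]
    return np.sqrt(np.mean(err ** 2)), np.mean(np.abs(err))

R     = np.array([[5.0, 3.0], [0.0, 4.0]])
R_hat = np.array([[4.0, 3.0], [2.0, 2.0]])
O     = np.array([[1, 1], [0, 1]])            # entry (2,1) is unobserved
rmse, mae = rmse_mae(R, R_hat, O)
# observed errors are 1, 0, 2 -> RMSE = sqrt(5/3), MAE = 1
```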
In addition to rating prediction, the proposed model also recommends the top-n items. Therefore, in addition to RMSE and MAE, precision and recall were adopted to evaluate the proposed model with respect to item recommendation. To define precision and recall, consider a list of the top-n items recommended to a user: precision denotes the fraction of the top-n recommended items that are relevant to the user, while recall denotes the fraction of all relevant items that appear in the top-n list. Moreover, to evaluate the overall performance of the proposed model with a single value, the F1-score is also calculated as the harmonic mean of precision and recall:

(24) Precision = |[Rel Items] ∩ [top N items]| / #(top N items)

(25) Recall = |[Rel Items] ∩ [top N items]| / #(Rel Items)

(26) F1-score = 2 / (1/Precision + 1/Recall)

where
[Rel Items] denotes the items relevant to the user and
[top N items] represents the top N items in the list of recommended items.
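The three metrics above can be sketched as follows (hypothetical item IDs; the relevant set stands for the items the user actually liked):

```python
def precision_recall_f1(recommended, relevant, n):
    """Precision@n, recall@n, and F1 in the sense of Eqs. (24)-(26)."""
    top_n = recommended[:n]
    hits = len(set(top_n) & set(relevant))
    precision = hits / n
    recall = hits / len(relevant)
    f1 = 0.0 if hits == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# 2 of the 5 recommendations are relevant; 2 of the 3 relevant items retrieved
p, r, f1 = precision_recall_f1([3, 7, 1, 9, 4], relevant=[7, 9, 8], n=5)
# p = 0.4, r = 2/3, f1 = 0.5
```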
The overall performance of the proposed model has been compared with the following state-of-the-art models:
  • PMF [14] is a basic collaborative filtering technique in the field of RS that initializes both user and item latent factors randomly. It uses only the explicit user-to-item rating matrix as input to the collaborative filtering technique.
  • ConvMF [44] exploited movie plots and formulated item contextual embeddings with a CNN for integration into PMF. It takes into account both the explicit rating matrix and the implicit movie plots (item content). This approach improved RS performance; however, the user profile was not made part of the model.
  • CERMF [45] deployed two convolutional neural networks with randomly initialized embedding layers for both users and items. A probabilistic collaborative filtering technique was used by the model for rating prediction.
  • DRMF [37] used a dual-regularized deep neural network to generate both user and item latent factors simultaneously. The DNN was employed for a deep understanding of the auxiliary details and extracted meaningful outcomes.
  • CapsMF (capsule matrix factorization) [46] recently proposed the use of a capsule network for document representation, in which bidirectional RNNs provide a robust representation of user and item textual data. CapsMF extracts sequential patterns of the words present in sentences.
  • CMF-HRS is the proposed model, which learns the semantics and contextual details of users and items through the integrated functioning of w2v and a convolutional neural network. The learned embeddings are then fed to the latent factors of the collaborative filtering technique for rating prediction. Moreover, the binary form of the ratings matrix is used by the model for top-n item recommendation.

4.3. Results and analysis

Our experimental environment is based on a platform with an AMD Ryzen 7 3700X 8-core processor, 16 GB RAM, and a GeForce RTX 2080/PCIe/SSE2 NVIDIA GPU. All implementations were performed on 64-bit Ubuntu with Python 3.6 and the Keras library. The proposed model was subjected to a series of experiments on the real-world datasets (Amazon and Yelp) to study the impact of different parameters.

4.4. Effect of CNN embedding layer dimension d

‘d’ denotes the dimension of the word vectors generated by the w2v model. It also represents the embedding size of the convolution layer, as the two are identical for integration. The effect of ‘d’ on rating prediction was analyzed by varying it from 100 to 300, which showed the influence of the semantics and contextual contents of users and items on the predicted ratings. Table 3 and Table 4 depict the effect of different values of ‘d’ on rating prediction. It can be observed that changes in the dimension of the CNN embedding layer have a visible effect: the model performed optimally when ‘d’ was set to 200, and deviating to either side of ‘d’ = 200 increases the RMSE and MAE values. This confirms that the best semantics are captured by the model at an embedding dimension of 200. Therefore, it can be concluded that the CNN embedding layer captured the contextual details of the semantics presented to it by the w2v model in the form of dense vector representations. It is emphasized that the dimensions of the w2v model and the CNN embedding layer must be kept the same so that the vectors generated by w2v can be integrated into the CNN embedding layer.

Table 3. RMSE of CMF-HRS for ‘d’ = 100, ‘d’ = 200, and ‘d’ = 300.

| Model | Dataset | @d = 100 | @d = 200 | @d = 300 |
| --- | --- | --- | --- | --- |
| CMF-HRS | AIV | 0.712 | 0.651 | 0.719 |
| CMF-HRS | AA | 1.113 | 0.943 | 1.092 |
| CMF-HRS | Yelp | 0.976 | 0.957 | 0.967 |

Table 4. MAE of CMF-HRS for ‘d’ = 100, ‘d’ = 200, and ‘d’ = 300.

| Model | Dataset | @d = 100 | @d = 200 | @d = 300 |
| --- | --- | --- | --- | --- |
| CMF-HRS | AIV | 0.709 | 0.685 | 0.694 |
| CMF-HRS | AA | 0.916 | 0.863 | 0.893 |
| CMF-HRS | Yelp | 0.793 | 0.725 | 0.768 |

4.5. Effect of user and item latent factors dimension D

Capital D represents the dimension of the user and item latent factors P_{M×D} and Q_{D×N}. The effect of D on the predicted ratings was analyzed by varying the size of the user and item latent factors from 25 to 75 to check the sensitivity of the proposed model. Table 5 and Table 6 show the RMSE and MAE of the proposed model for D = 25, D = 50, and D = 75 on both Amazon datasets and Yelp. It can be concluded that the proposed model performs best at D = 50, where RMSE and MAE converge to their minimum values. At D = 75, an increase in both evaluation metrics was observed, indicating degraded performance. This implies that the distribution of item and user concepts in the corresponding latent factors also affects RS performance; precisely captured user/item latent factors therefore improve the performance of the proposed model.

Table 5. RMSE of CMF-HRS for D = 25, D = 50, and D = 75.

| Model | Dataset | @D = 25 | @D = 50 | @D = 75 |
| --- | --- | --- | --- | --- |
| CMF-HRS | AIV | 0.696 | 0.685 | 0.692 |
| CMF-HRS | AA | 1.102 | 0.9368 | 1.115 |
| CMF-HRS | Yelp | 1.058 | 0.9870 | 1.019 |

Table 6. MAE of CMF-HRS for D = 25, D = 50, and D = 75.

| Model | Dataset | @D = 25 | @D = 50 | @D = 75 |
| --- | --- | --- | --- | --- |
| CMF-HRS | AIV | 0.7051 | 0.6851 | 0.6913 |
| CMF-HRS | AA | 0.9045 | 0.8623 | 0.9168 |
| CMF-HRS | Yelp | 0.7685 | 0.7251 | 0.7831 |

4.6. Item’s recommendation performance

To evaluate the model for top-n item recommendation to a user, the rating matrix R_mn is converted into binary form. A user rating for an item ranges from 1 to 5, where 5 is the highest rating (a highly liked item) and 1 is the lowest (a disliked item). We considered ratings from 3 to 5 as ‘1’, denoting a relevant item, and ratings of 1 to 2 as ‘0’, denoting an irrelevant item. Thus, a cut-off threshold Th is set, and the matrix is converted to binary form based on the defined Th [48]: an item rated at or below Th is considered irrelevant, and one rated above Th is considered relevant. The optimum performance of the model was noted with regularization terms (δm = 100, δn = 10), D = 50 (dimension of the user and item latent factors), and d = 200 (embedding dimension). Moreover, precision, recall, and F1-score were computed to monitor the efficacy of the proposed model, whose improved performance is represented by a gradual rise in these evaluation metrics. Because the datasets are very sparse, very few items are relevant to each user, so differences in top-n recommendation performance are hardly detectable; in this case, it is more meaningful to examine the recall rate than the precision. Moreover, model performance cannot be monitored precisely for low values of n; therefore, n was varied from 50 to 300 for the proposed model. Table 7, Table 8, and Table 9 depict the gradual increase in precision, recall rate, and F1-score. It can therefore be concluded that as the number of top-n items increases, the performance of the model improves: if the user rates an increasing number of items, the subsequent recommendations will be more relevant and precise for his or her profile.
Moreover, varying the cut-off rating threshold (Th) affects the model performance. During the experiments, the threshold was set to Th = 2 and Th = 3 to observe this effect. In addition to Th, data sparsity also affects the recall rate of the model. Fig. 4 shows the model performance with n varying from 50 to 300 when the rating threshold was set to 2 and 3.
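The binarization step with the cut-off threshold Th can be sketched as follows (toy matrix; Th = 2 reproduces the 3–5 → ‘1’, 1–2 → ‘0’ mapping described above):

```python
import numpy as np

def binarize(R, th):
    """Mark items rated above th as relevant (1), otherwise irrelevant (0);
    unrated entries (0) also remain 0."""
    return (R > th).astype(int)

R = np.array([[5, 2, 0],
              [3, 1, 4]])
B = binarize(R, th=2)
# B -> [[1, 0, 0],
#       [1, 0, 1]]
```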

Fig. 4. Performance of CMF-HRS (a): Dataset AIV, (b): Dataset AA, (c): Dataset Yelp.

Table 7. Precision values of CMF-HRS for top n-recommendation.

| Precision | AIV Th 3 | AIV Th 2 | AA Th 3 | AA Th 2 | Yelp Th 3 | Yelp Th 2 |
| --- | --- | --- | --- | --- | --- | --- |
| @50 | 0.00308 | 0.00327 | 0.00163 | 0.00236 | 0.01023 | 0.01036 |
| @100 | 0.00337 | 0.00356 | 0.00201 | 0.00319 | 0.01032 | 0.01047 |
| @150 | 0.00377 | 0.00382 | 0.00293 | 0.00383 | 0.01042 | 0.01054 |
| @200 | 0.00413 | 0.00429 | 0.00364 | 0.00476 | 0.01056 | 0.01066 |
| @250 | 0.00441 | 0.00462 | 0.00502 | 0.00589 | 0.01065 | 0.01071 |
| @300 | 0.00493 | 0.00510 | 0.00593 | 0.00603 | 0.01069 | 0.01079 |

Table 8. Recall values of CMF-HRS for top n-recommendation.

| Recall | AIV Th 3 | AIV Th 2 | AA Th 3 | AA Th 2 | Yelp Th 3 | Yelp Th 2 |
| --- | --- | --- | --- | --- | --- | --- |
| @50 | 0.399 | 0.427 | 0.379 | 0.415 | 0.228 | 0.329 |
| @100 | 0.421 | 0.463 | 0.395 | 0.427 | 0.309 | 0.331 |
| @150 | 0.493 | 0.539 | 0.430 | 0.486 | 0.328 | 0.353 |
| @200 | 0.521 | 0.616 | 0.461 | 0.513 | 0.365 | 0.384 |
| @250 | 0.639 | 0.708 | 0.492 | 0.533 | 0.393 | 0.406 |
| @300 | 0.727 | 0.802 | 0.539 | 0.565 | 0.410 | 0.423 |

Table 9. F1-score of CMF-HRS for top n-recommendation.

| F1-score | AIV Th 3 | AIV Th 2 | AA Th 3 | AA Th 2 | Yelp Th 3 | Yelp Th 2 |
| --- | --- | --- | --- | --- | --- | --- |
| @50 | 0.0061 | 0.0064 | 0.0032 | 0.0046 | 0.0195 | 0.0200 |
| @100 | 0.0066 | 0.0070 | 0.0039 | 0.0063 | 0.0199 | 0.0202 |
| @150 | 0.0074 | 0.0075 | 0.0058 | 0.0076 | 0.0201 | 0.0205 |
| @200 | 0.0082 | 0.0085 | 0.0072 | 0.0094 | 0.0203 | 0.0207 |
| @250 | 0.0087 | 0.0091 | 0.0099 | 0.0115 | 0.0206 | 0.0209 |
| @300 | 0.0096 | 0.0097 | 0.0113 | 0.0117 | 0.0208 | 0.0210 |

4.7. Comparative analysis

The proposed model performs two different tasks, rating prediction and top-n item recommendation, on the basis of user and item profiles. These profiles are formulated by the model from the auxiliary information available in the form of textual documents. The model's parameters were tuned with a heuristic approach, and its performance was checked on the tuned parameters. Accordingly, the regularization terms were adjusted to (δm = 100, δn = 10) for optimum performance. The embedding dimension was set to 200 so that the maximum semantics and contextual details of the auxiliary information are captured by the w2v and CNN responsible for generating the embeddings. Moreover, the size of the user and item latent factors was fixed at 50 for an optimum distribution of the concepts present in the user-to-item rating matrix. The proposed model was evaluated using the RMSE and MAE metrics for the rating prediction task, and its top-n item recommendation performance was evaluated in terms of precision and recall. The evaluation was performed on the tuned parameters, and the results are shown in Table 10 and Table 11.

4.7.1. Model performance on rating prediction

It can be ascertained from Table 10 that the RMSE of PMF [14] is on the higher side because no auxiliary details are incorporated into the user and item latent factors; both are initialized randomly, and only the explicit rating data (rating matrix) are used. ConvMF [44] performs better than PMF in terms of RMSE and MAE, showing that combining an item's contextual details, learned by the CNN from the item's textual description, with the rating data improves RS performance. In addition to items, CERMF [45] also learned user textual descriptions in parallel with item descriptions by employing a dual convolutional neural network, which led it to perform comparatively better than ConvMF [44]. It can be argued that a deep understanding of both user and item auxiliary information improves RS performance. The same is confirmed by DRMF [37], which performed better than ConvMF [44] and its predecessors: in DRMF, user and item embeddings are generated by the combined functioning of convolutional and gated recurrent neural networks, and the learned document representation is then fed to collaborative filtering to predict ratings. Thus, considering the side information of both entities (users and items) improves recommendation accuracy and rating prediction. ConvMF [44], CERMF [45], and DRMF [37] utilize explicit rating data together with implicit details extracted from the auxiliary information of users and items. CapsMF [46] aptly improved RS performance by using a bidirectional GRU along with a capsule network, integrating the embedding layer of the CNN with the bidirectional GRU to learn the semantics of the user and item textual documents.
Moreover, when either the item or the user content alone is insufficient, incorporating the content information of both can be beneficial and significantly improve performance. This effect can be anticipated by comparing CERMF [45] with ConvMF [44], and DRMF [37] with DRMF-user [37] and DRMF-item [37]. The proposed model combines both the semantics and the contextual details of users in a single numerical matrix, and a similar matrix is concurrently obtained for items. Moreover, feature-locality information is captured by dividing each feature vector into sections and applying the max-pool operation to each section in the max-over pooling layer [5]. The user and item embeddings, carrying semantics and contextual details in the form of numerical matrices, are then used to regularize the user and item latent factors in collaborative filtering. It can be observed from Table 10 that the proposed model performs better than the state-of-the-art models, to the best of our knowledge. Figs. 5, 6, and 7 indicate the convergence of the proposed model in terms of decreasing RMSE and MAE. Thus, incorporating the semantic and contextual ingredients of both the user and the item improves RS performance.

Table 10. Ratings prediction comparison.

| Models | AIV RMSE | AIV MAE | AA RMSE | AA MAE | Yelp RMSE | Yelp MAE |
| --- | --- | --- | --- | --- | --- | --- |
| PMF | 1.2080 | 0.9493 | 1.4087 | 1.1805 | 1.2194 | 0.9697 |
| ConvMF | 1.0057 | 0.7508 | 1.2461 | 0.9718 | 1.0038 | 0.7863 |
| DRMF-Items | 0.9868 | 0.7259 | 1.2187 | 0.9275 | 0.9988 | 0.7755 |
| DRMF-User | 0.9722 | 0.7184 | 1.1863 | 0.9088 | 1.0022 | 0.7864 |
| CERMF | 0.9621 | 0.7353 | 1.2091 | 0.9101 | 0.9843 | 0.7697 |
| DRMF | 0.9426 | 0.6982 | 1.1789 | 0.9000 | 0.9865 | 0.7615 |
| CapsMF | 0.9593 | 0.7064 | 1.1570 | 0.8878 | – | – |
| CMF-HRS | 0.7010 | 0.6850 | 0.8631 | 0.8632 | 0.9532 | 0.7257 |

Fig. 5. Performance of CMF-HRS on AIV Dataset.


Fig. 6. Performance of CMF-HRS on AA Dataset.


Fig. 7. Performance of CMF-HRS on Yelp Dataset.

4.7.2. Model performance on top-n recommendations

The proposed model was evaluated for the task of top-n item recommendation with the help of precision and recall. For this task, the rating matrix was converted to binary form, and the model was compared with PMF [14], ConvMF [44], CERMF [45], DRMF [37], and CapsMF [46]. The rating matrix was converted to 0's and 1's, where R_mn = 1 if the user has rated the item and R_mn = 0 otherwise; for comparison with the other models, this conversion differs from the one described in Section 4.6, where it is based on the cut-off rating threshold Th. After the conversion, the regularization terms δm and δn, D (dimension of the user and item latent factors), and d (embedding dimension) were chosen. As the datasets are sparse, fewer relevant items are available per user, making differences in precision between the proposed model and the other models difficult to observe; under these conditions, it is more appropriate to compare the recall of the proposed model with that of the other state-of-the-art models. This is indicated by the small differences in the precision and recall values in Table 11, where n is set to 300 for simplicity.
It should be noted that the proposed model provides improved rating prediction and top-n item recommendation accuracy. The improved RMSE and MAE of the proposed model depict better rating prediction, whereas the gradual rise in precision and recall illustrates its optimum performance in top-n item recommendation. This performance results from incorporating the semantics and contextual details of the textual documents into the collaborative filtering technique (PMF); utilizing the contents of both user and item further improved RS efficiency.

Table 11. Top-n item recommendation performance comparison.

| Models | AIV Precision | AIV Recall | AA Precision | AA Recall | Yelp Precision | Yelp Recall |
| --- | --- | --- | --- | --- | --- | --- |
| PMF | 0.00506 | 0.772 | 0.00583 | 0.549 | 0.01067 | 0.421 |
| ConvMF | 0.00505 | 0.771 | 0.00583 | 0.546 | 0.01063 | 0.423 |
| DRMF-Items | 0.00510 | 0.780 | 0.00591 | 0.551 | 0.01084 | 0.429 |
| DRMF-User | 0.00522 | 0.810 | 0.00611 | 0.589 | 0.01064 | 0.432 |
| CERMF | 0.00518 | 0.801 | 0.00588 | 0.550 | 0.01025 | 0.410 |
| DRMF | 0.00519 | 0.802 | 0.00604 | 0.583 | 0.01084 | 0.430 |
| CapsMF | 0.00515 | 0.810 | 0.00451 | 0.520 | – | – |
| CMF-HRS | 0.00527 | 0.819 | 0.00615 | 0.586 | 0.01087 | 0.433 |

5. Conclusion and future work

Recommendation systems are used by organizations to recommend requisite items to potential users in accordance with their preferences; however, RSs suffer from the data sparsity problem. An RS performs either of two tasks: rating prediction and item recommendation. In this paper, a hybrid collaborative filtering model is proposed in which a convolutional neural network, along with w2v, is integrated with matrix factorization. CMF-HRS incorporates word2vec (semantics) and a CNN (context) into PMF to capture the content information of both item and user documents. Both user and item contents are exploited by w2v and the CNN to build optimum dense vector representations for rating prediction and item recommendation. The model learns the latent factors of both entities from client reviews along with item textual details: the CNN and w2v are responsible for extracting user/item content features, including feature-locality information, while the PMF is mandated with predicting missing ratings for the user and recommending the top-n items. This technique can effectively handle both the semantics and the context of user and item documents to prepare the corresponding user and item latent factors, and the suggested model can be applied to any dataset enriched with both implicit and explicit feedback from users and items. A series of experiments was performed to evaluate the proposed model in terms of rating prediction and item recommendation on three benchmark datasets: Amazon (Amazon Instant Videos and Apps for Android) and Yelp. The experimental findings showed better performance of the model compared with the other models considered. In future work, the temporal effect on latent factor generation can be explored. Moreover, the noise issue in recommender systems can also be researched, as it surely affects their performance. Additionally, embedding generation through a bidirectional RNN can be explored for subsequent integration into collaborative filtering techniques.

CRediT authorship contribution statement

Zafran Khan: Conceptualization, Methodology, Experimentation, Result compilation, Writing - original draft. Muhammad Ishfaq Hussain: Conceptualization, Methodology, Experimentation, Result compilation, Writing - original draft. Naima Iltaf: Investigation, Writing - review & editing. Joonmo Kim: Investigation, Writing - review & editing. Moongu Jeon: Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP), South Korea grant funded by the Korea government (MSIT) (No.2014-3-00077, AI National Strategy Project & No.2019-0-01842, Artificial Intelligence Graduate School Program (GIST)) and Ministry of Culture, Sports and Tourism and Korea Creative Content Agency(Project Number: R2020070004).

References
