Exploring Content-based Artwork Recommendation with Metadata and Visual Features
Compared to other domains, artwork recommendation has received little attention, despite the continuous growth of the artwork market. Previous research has relied on ratings and metadata to make artwork recommendations, as well as on visual features extracted with deep neural networks (DNNs). However, DNN features lack a direct interpretation in terms of explicit visual attributes (e.g., brightness, texture), which might hinder explainability and user acceptance.
In this work, we study the impact of artwork metadata as well as visual features (DNN-based and attractiveness-based) for physical artwork recommendation, using images and transaction data from the UGallery online artwork store. Our results indicate that: (i) visual features perform better than manually curated data, (ii) DNN-based visual features perform better than attractiveness-based ones, and (iii) a hybrid approach improves the performance further. Our research can inform the development of new artwork recommenders relying on diverse content data.
In contrast to markets affected by the 2008 financial crisis, online artwork sales are booming, driven by social media and the new consumption behavior of millennials. Online art sales reached $3.27 billion in 2015 and, at the current growth rate, are expected to reach $9.58 billion by 2020 (Esman, 2012). Notably, although many online businesses use recommender systems to boost their revenue, online artwork recommendation has received little attention compared to other areas such as movies (Amatriain, 2013) or music (Celma, 2010). Previous research has shown the potential of personalized recommendation in the arts domain, such as the CHIP project (Aroyo et al., 2007), which implemented a personalized recommender system for the Rijksmuseum. More recently, He et al. (He et al., 2016) used pre-trained deep neural networks (DNNs) to recommend digital art, obtaining good results. Unfortunately, their method is not applicable to physical artworks, as it assumes that the same item can be bought repeatedly. Hence, their approach only works under the collaborative filtering assumption, and they investigated neither explicit visual features nor metadata.
Objective. In this paper, we investigate the impact of different features for recommending physical artworks. In particular, we reveal the utility of artwork metadata, latent (DNN) and explicit visual features extracted from images. We address the problem of artwork recommendation with positive-only feedback (user transactions) over one-of-a-kind items, i.e., only one instance of each artwork (paintings) is available in the dataset.
Research Questions. Our work was driven by the following research questions: RQ1. How do manually-curated metadata perform compared to visual features?, RQ2. How do latent visual features from pre-trained DNNs and explicit visual features perform and compare to each other?, and RQ3. Do feature combinations provide the best recommendation performance?
Contributions. Our work contributes to the unexplored problem of recommending physical artworks. We run simulated experiments with real-world transaction data provided by UGallery (http://www.UGallery.com/), a popular online artwork store based in the USA. We also introduce a hybrid artwork recommender that exploits all features at the same time. Our results indicate that visual features perform better than manually-curated metadata. In addition, we show that DNN features work better than explicit attractiveness-based visual features.
2. Problem Description
The online web store UGallery supports young and emerging artists by helping them sell their artworks through its online platform. To help users explore the vast number of artworks more efficiently, UGallery is currently investigating with us top-n content-based recommendation methods for the platform that exploit features such as artwork metadata and implicit and explicit visual features.
UGallery provided us with an anonymized dataset of users, items, and purchases (transactions) of paintings, where all users have made at least one transaction. On average, each user has bought two to three items in recent years (our collaborators at UGallery requested us not to disclose the exact dates when the data was collected).
Metadata. Artworks in the UGallery dataset were manually curated by experts. In total, there are five attributes: color (e.g. red, blue), subject (e.g. sports, travel), style (e.g. abstract, surrealism), medium (e.g. oil, acrylic), and mood (e.g. energetic, warm).
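To use such metadata in a content-based recommender, each attribute can be multi-hot encoded into a segment of an item feature vector. The sketch below is illustrative: the attribute vocabularies are made up for the example and are not UGallery's actual ones.

```python
# Hypothetical multi-hot encoding of UGallery-style metadata attributes.
# The value lists below are illustrative, not the store's real vocabulary.
ATTRIBUTES = {
    "color":   ["red", "blue", "green"],
    "subject": ["sports", "travel", "nature"],
    "style":   ["abstract", "surrealism", "impressionism"],
    "medium":  ["oil", "acrylic", "watercolor"],
    "mood":    ["energetic", "warm", "calm"],
}

def encode_metadata(artwork: dict) -> list[int]:
    """Concatenate one multi-hot segment per attribute, in a fixed order."""
    vector = []
    for attr, values in ATTRIBUTES.items():
        tagged = set(artwork.get(attr, []))
        vector.extend(1 if v in tagged else 0 for v in values)
    return vector

painting = {"color": ["blue"], "style": ["abstract"], "medium": ["oil"]}
vec = encode_metadata(painting)  # 15-dimensional multi-hot vector
```

Items can then be compared by any vector similarity over these encodings.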
Visual Features. For each image representing a painting in the dataset, we obtain features from an AlexNet DNN (Krizhevsky et al., 2012), which outputs a vector of 4,096 dimensions. We also obtain a vector of explicit attractiveness-based visual features, following San Pedro et al. (San Pedro and Siersdorfer, 2009): brightness, saturation, sharpness, entropy, RGB contrast, colorfulness, and naturalness.
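Three of these explicit features (brightness, saturation, and gray-level entropy) can be sketched in a few lines of NumPy. The exact formulas in San Pedro and Siersdorfer (2009) differ in detail, so treat this as an illustrative approximation, not their implementation.

```python
import numpy as np

LUMA = np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 luma weights

def brightness(img: np.ndarray) -> float:
    # Mean luma over an HxWx3 image with values in [0, 1].
    return float(np.mean(img @ LUMA))

def saturation(img: np.ndarray) -> float:
    # Mean HSV-style saturation: (max - min) / max per pixel.
    mx, mn = img.max(axis=-1), img.min(axis=-1)
    return float(np.mean(np.where(mx > 0, (mx - mn) / np.maximum(mx, 1e-8), 0.0)))

def entropy(img: np.ndarray, bins: int = 256) -> float:
    # Shannon entropy of the gray-level histogram.
    hist, _ = np.histogram(img @ LUMA, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))  # stand-in for a painting image
features = [brightness(img), saturation(img), entropy(img)]
```

Sharpness, contrast, colorfulness, and naturalness follow the same pattern of per-pixel statistics aggregated into one scalar per image.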
4. Experimental Setup & Results
Recommendation Methods. We compare five methods based on the features used: (1) Metadata: features based on the metadata of the items previously bought by the user, (2) DNN: features extracted from images using the AlexNet DNN (Krizhevsky et al., 2012), (3) EVF: explicit visual features based on the attractiveness of the images (San Pedro and Siersdorfer, 2009), (4) Hyb (DNN + EVF): a hybrid model using DNN and EVF features, and (5) Hyb (DNN + EVF + Metadata): a hybrid model using DNN, EVF, and metadata features. For the hybrid recommendations, we combine scores from the different sources using the BPR framework (Rendle et al., 2009). Figure 1 shows, for instance, a user profile on the left, alongside the image embedding based on AlexNet DNN features, and the recommendations obtained by three different methods.
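The scoring step shared by these methods can be sketched as cosine similarity between a user profile (here taken as the mean feature vector of previously bought artworks, an assumption of this sketch) and each candidate artwork, with hybrid scores formed as a weighted combination. Note that the paper learns such combination weights with BPR; they are fixed constants below purely for illustration.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity, with a small epsilon to guard against zero vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def recommend(user_items: np.ndarray, candidates: dict, k: int = 3) -> list:
    """Rank candidate artworks by similarity to the user's mean profile."""
    profile = user_items.mean(axis=0)
    scores = {item: cosine(profile, vec) for item, vec in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

def hybrid_score(sims: list, weights: list) -> float:
    # Weighted sum of per-feature-type similarities (e.g. DNN, EVF, metadata).
    # The paper learns these weights via BPR; they are fixed here.
    return sum(w * s for w, s in zip(weights, sims))

bought = np.array([[1.0, 0.0], [1.0, 0.1]])              # toy 2-D item vectors
pool = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0])}
top = recommend(bought, pool, k=1)                        # -> ["a"]
```

Since each artwork is one-of-a-kind, previously bought items are simply excluded from the candidate pool at recommendation time.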
Evaluation. Our protocol is based on the one introduced by Macedo et al. (Macedo et al., 2015) to evaluate recommender systems accurately in a temporal manner. We attempt to predict the items purchased in every transaction, where the training set contains all the artworks bought by a user before the transaction to be predicted. Users who purchased exactly one artwork were removed, as no training instance would be available for them. Metrics. As suggested by Cremonesi et al. (Cremonesi et al., 2010) for top-n recommendation, we used precision and recall, as well as nDCG (Manning et al., 2008).
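These ranking metrics have short standard definitions; the helpers below are a generic sketch, not the authors' evaluation code.

```python
import math

def precision_recall_at_n(ranked: list, relevant: set, n: int):
    # Fraction of the top-n that is relevant, and fraction of the
    # relevant items that appears in the top-n.
    hits = len(set(ranked[:n]) & set(relevant))
    return hits / n, hits / len(relevant)

def ndcg_at_n(ranked: list, relevant: set, n: int) -> float:
    # Binary-relevance DCG, normalized by the ideal DCG.
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:n]) if item in relevant)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), n)))
    return dcg / idcg if idcg else 0.0
```

In the temporal protocol above, these are computed per predicted transaction and then averaged over all transactions.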
Results. Table 1 presents the results, which can be summarized as follows: (1) visual features outperform metadata features, a quite positive finding since manually crafted metadata costs time and money; (2) visual features obtained from the AlexNet DNN perform better than the explicit visual features. Although this result shows that DNNs once again do a remarkable job in this domain, it is not entirely welcome: features obtained from a DNN such as AlexNet are latent, i.e., we cannot interpret them directly, nor can we use them to explain the recommendations made (Verbert et al., 2013). Finally, (3) our experiments reveal that the hybrid method performs best of all.
In this work we introduced content-based recommendation for physical artworks, comparing manually-curated metadata, AlexNet DNN features, and attractiveness-based visual features. We showed that the DNN features outperform both the explicit visual features and the metadata. In practice, this has two implications: first, there is no need to exploit metadata, as visual features work better; second, it will be difficult to provide explanations to users, since explicit features work significantly worse than latent features obtained via DNNs. It would nevertheless be interesting to investigate whether this gap can be closed in a real-world experiment. The current investigation is based only on simulations and neglects the user factor, though it gives a hint of the models' performance when no explanations are given.
- Amatriain (2013) Xavier Amatriain. 2013. Mining large streams of user data for personalized recommendations. ACM SIGKDD Explorations Newsletter 14, 2 (2013), 37–48.
- Aroyo et al. (2007) LM Aroyo, Y Wang, R Brussee, Peter Gorgels, LW Rutledge, and N Stash. 2007. Personalized museum experience: The Rijksmuseum use case. In Proceedings of Museums and the Web.
- Celma (2010) Oscar Celma. 2010. Music recommendation. In Music Recommendation and Discovery. Springer, 43–85.
- Cremonesi et al. (2010) Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance of Recommender Algorithms on Top-n Recommendation Tasks. In Proceedings of the Fourth ACM Conference on Recommender Systems (RecSys ’10). ACM, New York, NY, USA, 39–46.
- Esman (2012) Abigail R. Esman. 2012. The World’s Strongest Economy? The Global Art Market. https://www.forbes.com/sites/abigailesman/2012/02/29/the-worlds-strongest-economy-the-global-art-market/. (2012). [Online; accessed 21-March-2017].
- He et al. (2016) Ruining He, Chen Fang, Zhaowen Wang, and Julian McAuley. 2016. Vista: A Visually, Socially, and Temporally-aware Model for Artistic Recommendation. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys ’16). ACM, New York, NY, USA, 309–316.
- Krizhevsky et al. (2012) Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097–1105.
- Macedo et al. (2015) Augusto Q Macedo, Leandro B Marinho, and Rodrygo LT Santos. 2015. Context-aware event recommendation in event-based social networks. In Proceedings of the 9th ACM Conference on Recommender Systems. ACM, 123–130.
- Manning et al. (2008) Christopher D Manning, Prabhakar Raghavan, Hinrich Schütze, et al. 2008. Introduction to information retrieval. Vol. 1. Cambridge university press Cambridge.
- Rendle et al. (2009) Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence. AUAI Press, 452–461.
- San Pedro and Siersdorfer (2009) Jose San Pedro and Stefan Siersdorfer. 2009. Ranking and Classifying Attractiveness of Photos in Folksonomies. In Proceedings of the 18th International Conference on World Wide Web (WWW ’09). ACM, New York, NY, USA, 771–780.
- Verbert et al. (2013) Katrien Verbert, Denis Parra, Peter Brusilovsky, and Erik Duval. 2013. Visualizing recommendations to support exploration, transparency and controllability. In Proceedings of the 2013 international conference on Intelligent user interfaces. ACM, 351–362.