Demographic Influences on Contemporary Art with Unsupervised Style Embeddings
Computational art analysis has, through its reliance on classification tasks, prioritised historical datasets in which the artworks are already well sorted with the necessary annotations. Art produced today, on the other hand, is abundant and easily accessible through the internet and the social networks that professional and amateur artists alike use to display their work. Although this art, as yet unsorted in terms of style and genre, is less suited for supervised analysis, the data sources come with novel information that may help frame the visual content in equally novel ways. As a first step in this direction, we present contempArt, a multi-modal dataset of exclusively contemporary artworks. contempArt is a collection of paintings and drawings, a detailed graph network based on social connections on Instagram, and additional socio-demographic information, all attached to 442 artists at the beginning of their careers. We evaluate three methods suited for generating unsupervised style embeddings of images and correlate them with the remaining data. We find no connections between visual style on the one hand and social proximity, gender, or nationality on the other.
Keywords: unsupervised analysis, contemporary art, social networks
The methodological melting pot that is the interdisciplinary field of digital art history has, in recent years, been shaped more by exhaustive technical novelty than by theoretical guidance [5, 2]. Alongside proponents of the technological transformation, who see its vast databases as an opportunity to investigate large-scale patterns of various kinds [13, 23], there has been criticism that its pure empiricism hinders any true discovery. Computer vision (CV), which has also found its way into the field by providing state-of-the-art methods and assembling large multi-modal datasets [35, 4, 17], has not been exempt from this criticism: extracting and connecting only the high-level semantics of paintings ignores the real-world context in which art is produced and belongs to an outdated form of comparative art history [38, 1].
Furthermore, recent progress, both visual [54, 25] and numerical, has not changed the fact that CV's potential to effectively engage the digital humanities is bounded by one recurrent factor: labels. Labels are an obvious necessity for aligning input data with a relevant supervisory signal in general learning tasks, and a less obvious one in creating image tuples for texture synthesis or generative models. As classification tasks have become omnipresent throughout the field, so have labels. At first glance, giving centre stage to typology seems to be in line with it being one of art historians' main research interests.
However, in supervised learning, the annotations serve as the research's means and not its end, foreclosing any possibility of expanding upon that same annotation. This becomes problematic due to the absence of perfect knowledge in art history, as opposed to more common classification tasks such as object recognition, where the classes are flawless and the image labels non-negotiable [33, 51]. Contrary to images of objects, paintings and their historical contextualisation are very much an open-ended and contested subject. By ignoring the uncertainty attached to the labels of art datasets, CV on the one hand handicaps its potential to investigate art in a novel way and, on the other hand, perpetuates a misleadingly homogeneous image of the art-historical canon.
Overcoming these limitations and advancing into interdisciplinary depths requires CV to turn away from existing labels and instead embrace two other central research interests of classical art history: a) the visual representation of art and b) the social context of its production. In this work, we present two contributions in line with these two themes:
For extracting visual representations in an unsupervised manner, we evaluate and utilise existing techniques from different domains.
For studying the social world surrounding art, we introduce contempArt, the first dataset on contemporary painters with images, socio-demographic information and social media data representing social relationships.
Aligning the information on demographics and social relationships with the attained style-embeddings allows us to investigate tangible connections beyond the visual realm. However, we find no evidence that social closeness entails similarity in style or that demographic factors correlate with visual content.
2 Related Work
2.1 Unsupervised Art Analysis
Analysis of embeddings.
Compared to the substantial amount of work on art classification [42, 37, 35, 47, 4], image representations themselves have only rarely been at the centre of computational analysis. One of the earliest such works is the seminal study in which the fractal dimension of Jackson Pollock's drip paintings is measured and compared over the course of his career. In a similar vein, aggregate image-level statistics based on colour and brightness are used in [44, 28, 31] to visualise the evolution of artworks. Elsewhere, object-recognition features and time annotations are combined to assign creativity scores representing the visual novelty of artworks at the time of their creation. Due to the success of convolutional neural networks (CNNs) in image classification, handcrafted image representations have largely been replaced in art analysis by deep feature vectors extracted from these CNNs. Of note is work in which deep features from a CNN trained for object recognition outperform older handcrafted features in classifying art styles. Features provided by multiple CNNs have also been used to investigate connections between artwork and human perception, while an analysis of variance statistics between the layers of an object-recognition CNN finds that these values can discern art from non-art.
Analysis of Clusters.
Other work has focused on applying complex unsupervised learning techniques to both handcrafted and deep image features [46, 43, 21, 7]. Most notable amongst these clustering studies is one in which artistic style is captured by computing statistics at different layers of a pre-trained object-recognition CNN, a methodology created for texture synthesis, and the resulting features are additionally clustered with archetypal analysis.
2.2 Social Context
Expanding art-based deep learning techniques to include information beyond the visual was premiered in work where multi-modal relationships between various artistic metadata are used to increase the performance of image classification and image retrieval. To allow these broader analyses, the SemArt dataset was introduced, in which images are paired with artistic comments describing them.
Older datasets built on deviantArt, a social network for user-generated art, pair artworks with social relationships between their creators. Among the more commonly used art datasets, such as Wikiart, no comparable social context is available.
3 contempArt Dataset
Due to the manual and time-consuming nature of the data collection process described below, only art students in Germany were included in the analysis. To create the contempArt dataset, we first gather information on student enrolment in all fine arts programs related to painting or drawing at German art schools. This information is not publicly available until students join a specific painting or drawing class associated with one professor; these painting classes often have an online presence on which the names of current and former students are provided.
| # | Art school | Artists | Images |
| --- | --- | --- | --- |
| 1 | Alanus University of Arts and Social Sciences | 25 | 677 |
| 2 | Weißensee Academy of Art Berlin | 8 | 144 |
| 3 | Berlin University of the Arts | 24 | 601 |
| 4 | Braunschweig University of Art | 39 | 1,122 |
| 5 | University of the Arts Bremen | 29 | 991 |
| 6 | Dresden Academy of Fine Arts | 44 | 1,743 |
| 7 | Burg Giebichenstein University of Art and Design | 18 | 777 |
| 8 | Hochschule für Grafik und Buchkunst Leipzig | 68 | 2,623 |
| 9 | Mainz Academy of Arts | 19 | 427 |
| 10 | Academy of Fine Arts München | 44 | 1,227 |
| 12 | Academy of Fine Arts Nürnberg | 37 | 555 |
| 13 | Hochschule für Gestaltung Offenbach am Main | 11 | 191 |
| 14 | Hochschule der Bildenden Künste Saar | 25 | 553 |
| 15 | State Academy of Fine Arts Stuttgart | 18 | 690 |
Less than half of the original list of 1,177 enrolled students had any findable online presence. From the final set of 442 artists, 14,559 images could be collected, with a median of 20 images per artist. The data sources were Instagram accounts and personal webpages: 37.78% of students had only the former, 17.19% only the latter, and 45.02% had both. Each data source contributed different metadata to the dataset. Dedicated homepages, the source of 62.37% of all images, generally contain self-reported information on the artist's nationality and gender. The image data from Instagram, on the other hand, was of lower quality.
Instagram network graphs.
The sets of Instagram accounts following and being followed by each artist, available for 82.35% of the sample, were collected to construct social network graphs between the artists.
4 Unsupervised Style Embeddings
In order to compute image embeddings that are closely related to artistic style, we follow three established, unsupervised approaches that are all based on the VGG network for image classification. Although newer and deeper CNNs, such as ResNet, have since been proposed that outperform the VGG network on its original task, VGG has become a widely used tool in both art classification tasks and texture synthesis [18, 25]. After presenting the different methods, we examine their visual and numerical connection to the labels and images of a commonly used fine art dataset.
Raw VGG embeddings.
We use the deepest network variant of VGG, with 19 stacked convolutional layers and three fully connected layers on top. The network is pre-trained on the ImageNet database, and the activations of the second-to-last layer are used as the style embedding for any image. Similar deep features, derived from CNNs trained on ImageNet rather than on art in particular, have been shown to contain salient information about the latter.
Texture-based VGG embeddings.
Seminal work on style transfer showed that deep CNNs, and VGG in particular, can be leveraged to perform arbitrary artistic style transfer between images. Specifically, the correlations inside the convolutional feature maps of certain network layers capture positionless information on the texture, or rather the style, of images. This so-called Gram-based style representation has been widely used in texture synthesis and art classification [35, 10]. Contrary to its original use as part of an optimisation procedure aligning the texture of two images, we utilise it only as a further embedding of style. The extraction process is as follows:
Consider the activations $F^l \in \mathbb{R}^{C_l \times W_l \times H_l}$ of image $i$ at layer $l$ of the VGG network described in the previous subsection, where $C_l$ is the number of channels and $W_l$ and $H_l$ represent the width and height of the feature map. Let $f^l_p \in \mathbb{R}^{C_l}$ denote the column vector that holds the feature-map activations at pixel position $p$. Following the proposed normalisation procedure, the Gram matrix of the centred feature maps at layer $l$, given by

$$G^l = \frac{1}{W_l H_l} \sum_{p=1}^{W_l H_l} \left(f^l_p - \mu^l\right)\left(f^l_p - \mu^l\right)^\top,$$

and the means themselves, given by

$$\mu^l = \frac{1}{W_l H_l} \sum_{p=1}^{W_l H_l} f^l_p,$$

are concatenated into a high-dimensional collection $\{(G^l, \mu^l)\}_l$ over the selected layers, with further layer-wise normalisation in line with prior work. The Gram matrix is symmetric, so values below the diagonal are omitted. The collection is vectorised to a $d$-dimensional texture descriptor $v_i$, which can be computed for any image $i$. However, because $v_i$ is very high-dimensional, it is common practice to apply a secondary dimensionality reduction to the joint texture descriptors of the present image dataset [19, 50, 36]. To do so, we aggregate $v_i$ for all $N$ images in the given dataset and concatenate them into a matrix $V \in \mathbb{R}^{N \times d}$. We apply singular value decomposition to this matrix, extracting 4,096-dimensional features as our second style embedding $t_i$ for image $i$.
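The per-layer statistics and the secondary reduction can be sketched in NumPy as follows; the set of layers and the target dimensionality passed in are assumptions of the sketch, since the text fixes only the final 4,096-dimensional output:

```python
# Sketch of the Gram-based texture descriptor and its SVD reduction.
import numpy as np

def layer_statistics(F: np.ndarray) -> np.ndarray:
    """F: feature maps of one layer, shape (C, W, H).
    Returns the upper triangle of the centred Gram matrix plus the means."""
    C = F.shape[0]
    X = F.reshape(C, -1)                  # (C, W*H); columns = pixel positions
    mu = X.mean(axis=1)                   # channel means
    Xc = X - mu[:, None]
    G = Xc @ Xc.T / X.shape[1]            # centred Gram matrix, (C, C)
    iu = np.triu_indices(C)               # Gram is symmetric: keep upper half
    return np.concatenate([G[iu], mu])

def texture_descriptor(feature_maps: list) -> np.ndarray:
    """Concatenate statistics over the chosen layers into one vector v_i."""
    return np.concatenate([layer_statistics(F) for F in feature_maps])

def reduce_dim(V: np.ndarray, k: int) -> np.ndarray:
    """Secondary reduction of the stacked descriptors V, shape (N, d)."""
    Vc = V - V.mean(axis=0)
    U, S, _ = np.linalg.svd(Vc, full_matrices=False)
    return U[:, :k] * S[:k]               # (N, k) reduced embeddings
```
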
Wynen et al. use the previously described Gramian texture descriptor and a classical unsupervised learning method called archetypal analysis to compute and visualise a set of art archetypes. Given the $N$ texture descriptors $t_1, \dots, t_N$, archetypal analysis approximates each observation by a convex mixture of $K$ archetypes $z_1, \dots, z_K$,

$$t_i \approx \sum_{k=1}^{K} \alpha_{ki} z_k, \qquad \alpha_{ki} \ge 0, \quad \sum_{k=1}^{K} \alpha_{ki} = 1,$$

whereas the archetypes are themselves convex mixtures of the samples:

$$z_k = \sum_{i=1}^{N} \beta_{ik} t_i, \qquad \beta_{ik} \ge 0, \quad \sum_{i=1}^{N} \beta_{ik} = 1.$$

For ease of notation, let $A \in \mathbb{R}^{K \times N}$ and $B \in \mathbb{R}^{N \times K}$ be the matrices that contain the $\alpha$'s and $\beta$'s, respectively, and let $T \in \mathbb{R}^{d \times N}$ hold the stacked texture descriptors. Then the optimal weights of $A$ and $B$ can be found by minimising the residual sum of squares

$$\min_{A, B} \; \lVert T - TBA \rVert_F^2$$

subject to the above constraints, with efficient solvers. The number of archetypes $K$ can be predefined or adjusted by visually comparing the residual error at different $K$-values as in the original work. We apply archetypal analysis on the matrix of stacked texture descriptors for the given dataset of $N$ images. The estimated archetype-to-image and image-to-archetype mixture weights $\alpha_i$ and $\beta_i$ of each image are then concatenated into the final style embedding $a_i$.
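The constrained minimisation can be illustrated with a toy alternating projected-gradient solver. This is a deliberately simple stand-in for the efficient solvers cited in the text, and the step size and iteration count are arbitrary assumptions:

```python
# Toy archetypal analysis: alternating projected gradient on A and B,
# with each column projected back onto the probability simplex.
import numpy as np

def project_simplex(Y: np.ndarray) -> np.ndarray:
    """Project each column of Y onto the probability simplex."""
    n = Y.shape[0]
    sorted_Y = -np.sort(-Y, axis=0)                  # sort descending
    csum = np.cumsum(sorted_Y, axis=0) - 1
    idx = np.arange(1, n + 1)[:, None]
    cond = sorted_Y - csum / idx > 0
    rho = cond.cumsum(axis=0).argmax(axis=0)         # last index where cond holds
    theta = csum[rho, np.arange(Y.shape[1])] / (rho + 1)
    return np.maximum(Y - theta, 0)

def archetypal_analysis(T: np.ndarray, K: int, iters: int = 200,
                        lr: float = 1e-2, seed: int = 0):
    """T: (d, N) stacked descriptors. Minimises ||T - T B A||_F^2.
    Returns A (K, N) and B (N, K) with simplex-constrained columns."""
    rng = np.random.default_rng(seed)
    d, N = T.shape
    A = project_simplex(rng.random((K, N)))
    B = project_simplex(rng.random((N, K)))
    for _ in range(iters):
        Z = T @ B                        # current archetypes, (d, K)
        R = T - Z @ A                    # residual
        A = project_simplex(A + lr * (Z.T @ R))      # gradient step in A
        Z = T @ B
        R = T - Z @ A
        B = project_simplex(B + lr * (T.T @ R @ A.T))  # gradient step in B
    return A, B
```
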
5 Comparative Evaluation of Style Embeddings
The unsupervised nature of the described embeddings (subsequently called VGG, Texture and Archetype) requires an evaluation of their connection to artistic style and the differences therein. Due to the visual nature of artworks, evaluations of their unsupervised feature spaces often rely only on visual comparisons, as in the case of the archetypal style embeddings or texture synthesis [53, 25, 30]. In the following, we investigate both the visual differences between the embeddings and their relation to existing style labels.
We download a balanced subset of Wikiart, sampling 1,000 random images from each of the 20 most frequent style labels, after excluding non-painting artworks classified as photography, architecture, etc.
For a range of values of $k$, we partition the three embeddings into clusters with the k-means algorithm. The informational overlap between the resulting cluster memberships and the existing style annotations is calculated with the adjusted mutual information (AMI) score, a normalised and chance-adjusted metric that quantifies the reduction in class entropy when the cluster labels are known. A value of 1 indicates that cluster and class memberships match up perfectly, whereas values around 0 signify a random clustering. To provide a more transparent, yet unnormalised, evaluation measure, we additionally report the purity score, for which the images in each cluster are assigned to the cluster's most frequent style label and the average fraction of thereby correctly assigned images is calculated. The results in Figure 4 show that the VGG embeddings have the highest AMI and purity scores for all values of $k$. The Archetype and Texture embeddings have similar results, even though the dimensionality of the former is 50 times smaller. Even the highest AMI score can still be considered closer to a random clustering than an informative one, leading to the conclusion that none of the embeddings correspond closely to commonly used labels of artistic style. However, style annotations in fine art datasets are known to be very broad and, in Wikiart's case, noisy, allowing for some margin of error and calling for a further, visual inspection of the embeddings.
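The numerical evaluation can be reproduced in outline with scikit-learn; the AMI implementation is sklearn's own, while the purity score is the simple majority-label measure described above:

```python
# Sketch: comparing k-means cluster memberships with style labels
# via adjusted mutual information (AMI) and purity.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_mutual_info_score

def purity(labels_true: np.ndarray, labels_pred: np.ndarray) -> float:
    """Fraction of images matching their cluster's most frequent style label."""
    correct = 0
    for c in np.unique(labels_pred):
        members = labels_true[labels_pred == c]
        correct += np.bincount(members).max()
    return correct / len(labels_true)

def evaluate_embedding(X: np.ndarray, labels_true: np.ndarray, k: int):
    """Cluster embeddings X (N, d) into k clusters; return (AMI, purity)."""
    pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    return adjusted_mutual_info_score(labels_true, pred), purity(labels_true, pred)
```
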
We visualise a small set of randomly chosen images with their five closest neighbours for each of the style embeddings, where closeness is measured by cosine similarity. The comparison in Figure 5 gives insights into the difference between style annotations and stylistic similarity. The Archetype embedding again performs worst, being unable to cluster the visually unique Ukiyo-e genre and failing to align even the general colour space. Archetypal analysis, while allowing a high degree of interpretability and aesthetic visualisations [50, 9] by encoding images as convex mixtures of themselves, has to be evaluated more rigorously to validate its usefulness for art analysis. VGG and Texture are each able to match up images in terms of a general visual style. However, both are inconsistent in doing the same for labelled style, echoing the results of the numerical evaluation.
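Nearest-neighbour retrieval under cosine similarity, as used for this comparison, amounts to a few lines:

```python
# Sketch: the k most similar images to a query under cosine similarity.
import numpy as np

def top_k_neighbours(X: np.ndarray, i: int, k: int = 5) -> np.ndarray:
    """X: (N, d) embeddings. Returns indices of the k nearest images to i."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalise rows
    sims = Xn @ Xn[i]                                  # cosine similarities
    sims[i] = -np.inf                                  # exclude the query itself
    return np.argsort(-sims)[:k]
```
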
The overlap between the evaluated embeddings and regular style annotations was shown to be minimal, but two of the three still contain valid information on artistic visual similarity. Texture, although exceptional at transferring itself, does not capture style in the art-historical sense. Conversely, that same style cannot be described by visual content alone, validating context-based approaches to art classification tasks.
6 Analysis of contempArt
The VGG embeddings of the contempArt images, partially visualised in Figure 6, exhibit a reasonable connection to visual style: they separate broad patterns, such as colourful paintings opposite black-and-white sketches, as well as smaller ones, such as the unique styles of single artists. In order to correlate these embeddings with the collected socio-demographic information, we must aggregate them to the artist level. Consider the set of artists $\mathcal{A} = \{a_1, \dots, a_M\}$, where $M$ is the number of artists, and let each artist $a_j$ have a set of image embeddings $\{e^j_1, \dots, e^j_{n_j}\}$, where $n_j$ is the number of paintings of the $j$-th artist. For all further analysis we compute each artist's centroid style embedding

$$\bar{e}^j = \frac{1}{n_j} \sum_{i=1}^{n_j} e^j_i.$$

Only few artists have a singular repetitive style, which is especially true for art students, for whom experimentation is essential. To be able to judge this variance of style, we also compute the average intra-artist distance to each centroid embedding under the cosine distance $d_{\cos}$,

$$d^j = \frac{1}{n_j} \sum_{i=1}^{n_j} d_{\cos}\!\left(e^j_i, \bar{e}^j\right).$$

To have a comparable measure of variation, we further compute the average centroid distance for all $N$ images in the dataset,

$$d = \frac{1}{N} \sum_{i=1}^{N} d_{\cos}\!\left(e_i, \bar{e}\right),$$

where $\bar{e}$ is the average of all image embeddings. The results are shown in Table 3 for all three style embeddings. The Texture embeddings have the smallest amount of variation, both globally and across artists. For the Archetype embedding, the number of archetypes was chosen through visual inspection of the reduction in the residual error, as in the original work.
| Embedding | Avg. intra-artist distance | Avg. centroid distance |
| --- | --- | --- |
| VGG | .283 ± .080 | .435 ± .101 |
| Texture | .137 ± .049 | .211 ± .094 |
| Archetype | .195 ± .121 | .323 ± .326 |
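The artist-level aggregation and the two distance statistics defined above can be sketched as:

```python
# Sketch: centroid embeddings, average intra-artist cosine distance,
# and the global average centroid distance.
import numpy as np

def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def artist_statistics(embeddings_per_artist: list):
    """embeddings_per_artist: list of (n_j, d) arrays, one per artist.
    Returns (centroids, intra-artist distances, global centroid distance)."""
    centroids, intra = [], []
    for E in embeddings_per_artist:
        c = E.mean(axis=0)                              # centroid embedding
        centroids.append(c)
        intra.append(np.mean([cosine_distance(e, c) for e in E]))
    all_images = np.vstack(embeddings_per_artist)
    global_c = all_images.mean(axis=0)                  # dataset-wide centroid
    global_d = np.mean([cosine_distance(e, global_c) for e in all_images])
    return np.array(centroids), np.array(intra), global_d
```
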
6.1 Social Networks and Style
We use the node2vec algorithm on both network graphs to project their relational data into a low-dimensional feature space. node2vec is a random-walk-based graph embedding technique that preserves network neighbourhoods and, contrary to most other methods, the structural similarity of nodes. This additional capability is especially useful for the larger network, in which the homophily captured by a pure social network is augmented by detailed and vast information on taste. We compute 128 node2vec features for each of the graphs and use cosine distance to generate a matrix of artist-level social network distances. Similarly, we generate pairwise style distances with the centroid embeddings for all three style embeddings.
Spearman's rank correlation coefficient is used to compute the correlation between the flattened upper-triangular parts of the described distance matrices. The results in Table 3 show only very small correlations between stylistic and social distance. Even though the two graphs share only a minor similarity, neither network contains information that relates to inter-artist differences in style. The clear overlap between school affiliation and the smaller network graph, as seen in Figure 3, allows the further conclusion that art schools, too, have no bearing on artistic style.
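The correlation itself reduces to a few lines, assuming SciPy: `spearmanr` applied to the flattened upper triangles of the two symmetric distance matrices.

```python
# Sketch: rank correlation between social and stylistic pairwise distances.
import numpy as np
from scipy.stats import spearmanr

def distance_correlation(D_social: np.ndarray, D_style: np.ndarray) -> float:
    """Spearman correlation over the strict upper triangles of two
    symmetric (M, M) distance matrices."""
    iu = np.triu_indices_from(D_social, k=1)   # skip diagonal and lower half
    rho, _ = spearmanr(D_social[iu], D_style[iu])
    return rho
```
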
6.2 Socio-demographic Factors and Style
We investigate possible connections between the style embeddings and the collected data on the artists by jointly visualising them. Specifically, we extract a two-dimensional feature space from the VGG embeddings with t-SNE , both per image and per artist with the previously described aggregation. There were no visible patterns for any of the available variables, including Instagram-specific measures such as likes, comments or the number of followers and general ones such as nationality, gender or art school affiliation. We show two exemplary results in Figure 7, in which the independence of style from these factors is apparent. This is not a surprising result as the non-visual factors are primarily attached to the individual artist and not their work. Even painting-specific reactions on Instagram depend more on the activity and reach of their creators than the artworks themselves.
This work presented the first combined analysis of contemporary fine art and its social context by assembling a unique dataset on German art students and using unsupervised methods to extract artwork embeddings and correlate them with that context. The collected data consist of images, social network graphs and socio-demographic information on the artists. Three established methods for obtaining style embeddings from images of paintings were briefly evaluated, outside the usual framework of supervision, with respect to their connection to common style annotations and general visual similarity. These embeddings of artistic style were shown to be entirely independent of the non-visual data. Further work will go into increasing the dataset size, to reduce the effect of noise induced by the high heterogeneity of art produced by artists early in their careers, and into contrasting the contemporary artworks with historical ones.
Acknowledgement This work was supported by JSPS KAKENHI No. 20K19822.
- Example: http://www.klasse-orosz.de/
- Images available on artists' dedicated webpages are generally of high resolution and depict only their work. Instagram, by contrast, limits the image resolution by default, and the images uploaded there by the artists were often noisy, e.g. taken from a larger distance or showing artwork surrounded by other objects. Cropping away unnecessary content further reduced the image size.
- Two Instagram accounts were deleted or renamed during the data collection process so only their image data is available.
- Definition of archetype: the original pattern or model of which all things of the same type are representations or copies.
- Included styles: Abstract Art, Abstract Expressionism, Art Informel, Art Nouveau (Modern), Baroque, Cubism, Early Renaissance, Expressionism, High Renaissance, Impressionism, Naïve Art (Primitivism), Neoclassicism, Northern Renaissance, Post-Impressionism, Realism, Rococo, Romanticism, Surrealism, Symbolism, Ukiyo-e.
- (2017) Introduction: some stakes of comparison. In Comparativism in Art History, pp. 1–15. Cited by: §1.
- (2018) Can we teach computers to understand art? Domain adaptation for enhancing deep networks capacity to de-abstract art. Image and Vision Computing 77, pp. 21–32. Cited by: §1.
- (2019) Multitask painting categorization by deep multibranch neural network. Expert Systems with Applications 135, pp. 90–101. Cited by: §1, §1, §2.1.
- (2018) Against digital art history. International Journal for Digital Art History (3). Cited by: §1.
- (2017) Using CNN features to better understand what makes visual artworks special. Frontiers in psychology 8, pp. 830. Cited by: §2.1.
- (2020) Deep convolutional embedding for digitized painting clustering. arXiv preprint arXiv:2003.08597. Cited by: §2.1.
- (2019) A deep learning perspective on beauty, sentiment, and remembrance of art. IEEE Access 7, pp. 73694–73710. Cited by: §2.1.
- (2014) Fast and robust archetypal analysis for representation learning. In Proc. CVPR, pp. 1478–1485. Cited by: §4, §5.
- (2018) Image style classification based on learnt deep correlation features. IEEE Transactions on Multimedia 20 (9), pp. 2491–2502. Cited by: §4.
- (1994) Archetypal analysis. Technometrics 36 (4), pp. 338–347. Cited by: §2.1, §4, §6.
- (2009) ImageNet: a large-scale hierarchical image database. In Proc. CVPR, pp. 248–255. Cited by: §4.
- (2013) Is there a 'digital' art history?. Visual Resources 29 (1-2), pp. 5–13. Cited by: §1.
- (2018) The shape of art history in the eyes of the machine. In Proc. AAAI, Cited by: §5.
- (2015) Quantifying creativity in art networks. In Proc. ICCC, pp. 39. Cited by: §2.1.
- (2019) Context-aware embeddings for automatic art analysis. In Proc. ICMR, pp. 25–33. Cited by: §2.2, §5.
- (2018) How to read paintings: semantic art understanding with multi-modal retrieval. In Proc. ECCV workshops, Cited by: §1, §2.2.
- (2016) Image style transfer using convolutional neural networks. In Proc, CVPR, pp. 2414–2423. Cited by: §2.1, §4, §4.
- (2015) Texture synthesis using convolutional neural networks. In Proc. NeurIPS, pp. 262–270. Cited by: §4.
- (2016) Node2vec: scalable feature learning for networks. In Proc. SIGKDD, pp. 855–864. Cited by: §6.1.
- (2018) Predicting and grouping digitized paintings by style using unsupervised feature learning. Journal of Cultural Heritage 31, pp. 13–23. Cited by: §2.1.
- (2016) Deep residual learning for image recognition. In Proc. CVPR, pp. 770–778. Cited by: §4.
- (2019) Digital art history as the social history of art: towards the disciplinary relevance of digital methods. Visual Resources 35 (1-2), pp. 21–33. Cited by: §1, §1.
- (2020) Digital methods and the historiography of art. The Routledge Companion to Digital Humanities and Art History. Cited by: §1, §1.
- (2019) Neural style transfer: a review. Trans. Visualization and Computer Graphics. Cited by: §1, §4, §4, §5.
- (2013) Recognizing image style. In Proc. BMVC, Cited by: §1, §2.1, §4.
- (2014) Painting-91: a large scale database for computational painting categorization. Machine Vision and Applications 25 (6), pp. 1385–1397. Cited by: §2.2.
- (2014) Large-scale quantitative analysis of painting arts. Scientific reports 4, pp. 7370. Cited by: §2.1.
- (2017) Creative community demystified: a statistical overview of behance. arXiv preprint arXiv:1703.00800. Cited by: §2.2.
- (2019) Content and style disentanglement for artistic style transfer. In Proc. ICCV, pp. 4422–4431. Cited by: §5.
- (2018) Heterogeneity in chromatic distance in images and characterization of massive painting data set. PloS one 13 (9). Cited by: §2.1.
- (2017) Diversified texture synthesis with feed-forward networks. In Proc. CVPR, pp. 3920–3928. Cited by: §4.
- (2014) Microsoft COCO: common objects in context. In Proc. ECCV, pp. 740–755. Cited by: §1.
- (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9 (Nov), pp. 2579–2605. Cited by: Figure 6, Figure 7, §6.2.
- (2017) DeepArt: learning joint representations of visual arts. In Proc. ACMMM, pp. 1183–1191. Cited by: §1, §2.1, §4, §4.
- (2016) CNN-based style vector for style image retrieval. In Proc. ICMR, pp. 309–312. Cited by: §4.
- (2014) The Rijksmuseum challenge: museum-centered visual recognition. In Proc. ICMR, pp. 451–454. Cited by: §2.1.
- (2019) Digital art history and the computational imagination. International Journal for Digital Art History: Issue 3, 2018: Digital Space and Architecture 3, pp. 141. Cited by: §1.
- (2009) Merriam-Webster Online Dictionary. Cited by: footnote 8.
- (2013) Combining cultural analytics and networks analysis: studying a social network site with user-generated content. Journal of Broadcasting & Electronic Media 57 (3), pp. 409–426. Cited by: §2.2.
- (2012) DeviantArt in spotlight: a network of artists. Leonardo 45 (5), pp. 486–487. Cited by: §2.2.
- (2010) Impressionism, expressionism, surrealism: automated recognition of painters and schools of art. Trans. on Applied Perception 7 (2), pp. 1–17. Cited by: §2.1.
- (2012) Computer analysis of art. Journal on Computing and Cultural Heritage 5 (2), pp. 1–11. Cited by: §2.1.
- (2018) History of art paintings through the lens of entropy and complexity. Proc. National Academy of Sciences 115 (37), pp. E8585–E8594. Cited by: §2.1.
- (2015) Very deep convolutional networks for large-scale image recognition. In Proc. ICLR, Cited by: §4.
- (2009) Image statistics for clustering paintings according to their visual appearance. In Eurographics Workshop on Computational Aesthetics in Graphics, Visualization and Imaging, pp. 57–64. Cited by: §2.1.
- (2018) OmniArt: a large-scale artistic benchmark. TOMM 14 (4), pp. 1–21. Cited by: §2.1.
- (1999) Fractal analysis of pollock’s drip paintings. Nature 399 (6735), pp. 422–422. Cited by: §2.1.
- (2010) Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. Journal of Machine Learning Research 11, pp. 2837–2854. Cited by: §5.
- (2018) Unsupervised learning of artistic styles with archetypal style analysis. In Proc. NeurIPS, pp. 6584–6593. Cited by: §2.1, §4, §4, §5, §5.
- (2017) Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747. Cited by: §1.
- (2017) Quantifying the development of user-generated art during 2001–2010. PloS one 12 (8). Cited by: §2.2.
- (2020) Improving style transfer with calibrated metrics. In Proc. WACV, pp. 3160–3168. Cited by: §5.
- (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proc. ICCV, pp. 2223–2232. Cited by: §1.