Visual Indeterminacy in Generative Neural Art

Aaron Hertzmann
Adobe Research
601 Townsend St.
San Francisco, CA 94103
hertzman@dgp.toronto.edu

1 Introduction

Generative Adversarial Networks (GANs) have become fertile tools for artistic exploration. Artists such as Refik Anadol, Robbie Barrat, Sofia Crespo, Mario Klingemann, Jason Salavon, Helena Sarin, and Mike Tyka generate fascinating imagery with models trained on natural images. These artists work in very different ways, but much of their work shares a common GAN aesthetic: realistic, but unrecognizable.

Why are GANs such powerful tools for making art? This essay argues that GAN art often exhibits visual indeterminacy, a term coined by Pepperell (2006). GANs cause visual indeterminacy by creating plausible compositions and textures that nonetheless defy coherent explanation, and these are the GAN images often used in recent artworks. Because visual indeterminacy can be understood as a perceptual process Muth and Carbon (2016), GANs provide a potential tool both for art and for neuroscience experiments based on modeling perceptual uncertainty.

2 Visual Indeterminacy and Aesthetic Experience

Often, the initial appearance of an image invites the viewer to investigate further, but the image confounds explanation. For some images, this investigation leads to an “Aha!” moment, where the viewer understands the structure of the image Muth et al. (2016), e.g., they see a vivid 3D object where there had been abstract 2D shapes. This moment is pleasurable because the posterior distribution collapses: some understanding has been gained. But the image may also become less interesting as a result. “Visual indeterminacy” describes images where the “Aha!” moment never happens, and the image continues to invite investigation.
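One rough way to make the idea of posterior collapse concrete is to track the Shannon entropy of a viewer’s posterior over candidate interpretations: before the “Aha!” moment the distribution is spread out, and afterwards it concentrates on a single reading. The sketch below is purely illustrative; the interpretations and probabilities are invented numbers, not measurements.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy (in bits) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Hypothetical posterior over candidate interpretations of an image
# ("vase", "two faces", "abstract shapes", ...) before the viewer has
# settled on a reading: probability mass is spread out.
before_aha = [0.30, 0.28, 0.22, 0.20]

# After the "Aha!" moment, one interpretation dominates and the
# posterior collapses.
after_aha = [0.94, 0.03, 0.02, 0.01]

print(f"entropy before: {entropy_bits(before_aha):.2f} bits")  # ~1.98 bits
print(f"entropy after:  {entropy_bits(after_aha):.2f} bits")   # ~0.41 bits
```

On this view, an indeterminate image is one whose posterior never collapses: its entropy stays high however long the viewer studies it.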

In short, visual indeterminacy occurs in a “seemingly meaningful visual stimulus that denies easy or immediate identification” Pepperell (2011); it is the “lack but promise of semantic stability” Muth and Carbon (2016). Ambiguity has been present in art since cave painting, but became particularly valued in the Modern art era Gamboni (2004). Many recent studies connect these ideas to perceptual theory and neuroscience; see Muth and Carbon (2016); Van de Cruys and Wagemans (2011) for reviews, and Hertzmann (2010) for a discussion of visual ambiguity in terms of perceptual posterior probability.

3 Generating Visual Indeterminacy

Some GAN images are naturalistic; some look like unusual but realistic scenes, e.g., portions of Refik Anadol’s “Machine Hallucinations,” Mike Tyka’s “EONS,” and Figure 1 (left). Some images of humans or animals seem both real and grotesque, like portions of Mario Klingemann’s “Memories of Passersby I.” But a dominant mode of GAN art is visual indeterminacy. It looks like there is a real scene being depicted in photorealistic detail, but what is it?

Figure 1: Images from BigGAN Brock et al. (2019) created with Artbreeder Simon (2019). The leftmost image is, basically, naturalistic. The other images are visually indeterminate: they appear realistic at first glance, and suggest various associations, but they do not yield coherent realistic interpretations on closer study. Image credits in the Appendix.

GANs seem predisposed to indeterminate, intriguing imagery. This can be seen by experimenting with Artbreeder Simon (2019) (Figure 1), formerly Ganbreeder. Images drawn from Ganbreeder have been exhibited as art Bailey (2019).
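A minimal sketch of the kind of latent-space “breeding” such a tool performs is shown below. It assumes a class-conditional generator in the style of BigGAN; the `generator` call, the interpolation scheme, and the class indices are illustrative stand-ins, not Artbreeder’s actual implementation.

```python
import numpy as np

def slerp(a, b, t):
    """Spherical interpolation between two noise vectors."""
    a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(a_n, b_n), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return a
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

def crossbreed(z_a, z_b, c_a, c_b, t=0.5):
    """Blend two (noise, class-vector) pairs into a 'child' sample."""
    z_child = slerp(z_a, z_b, t)        # blend noise in latent space
    c_child = (1 - t) * c_a + t * c_b   # blend class-conditioning vectors
    return z_child, c_child

rng = np.random.default_rng(0)
z_a, z_b = rng.standard_normal(128), rng.standard_normal(128)
c_a, c_b = np.eye(1000)[207], np.eye(1000)[980]  # two ImageNet classes, chosen arbitrarily
z_child, c_child = crossbreed(z_a, z_b, c_a, c_b, t=0.5)
# image = generator(z_child, c_child)  # any class-conditional GAN, e.g., BigGAN
```

Blended class vectors like `c_child` condition the generator on mixtures of categories it never saw during training, which is one plausible source of the indeterminate imagery discussed next.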

Why do GANs create indeterminate images so often? Recent results by Bau et al. (2019) suggest that early GAN layers correspond to large-scale objects, and later layers capture fine-scale details and textures. This suggests that GANs construct scenes in pictorial space, first arranging “objects” into compositions, and then placing appropriate textures and details on those objects. Sometimes a GAN arranges realistic parts of familiar object types into unfamiliar combinations, and this produces visual indeterminacy. The images are most intriguing when they place familiar-looking elements into evocative but indeterminate configurations that, seemingly, no one would have created without these tools.
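As a concrete illustration of this layer-wise picture, the sketch below performs the kind of unit-ablation intervention used in GAN dissection Bau et al. (2019), but on a toy, untrained generator; the architecture and the specific ablations are assumptions for illustration only. With a trained model, zeroing a unit in an early layer tends to remove or relocate whole objects, while zeroing a unit in a late layer tends to alter fine texture and detail.

```python
import torch
import torch.nn as nn

class ToyGenerator(nn.Module):
    """Tiny DCGAN-style generator (untrained) for illustrating layer roles."""
    def __init__(self, z_dim=64):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.ConvTranspose2d(z_dim, 128, 4, 1, 0),  # early layer: coarse layout
            nn.ConvTranspose2d(128, 64, 4, 2, 1),
            nn.ConvTranspose2d(64, 32, 4, 2, 1),
            nn.ConvTranspose2d(32, 3, 4, 2, 1),       # late layer: fine detail
        ])

    def forward(self, z, ablate_layer=None, ablate_channel=None):
        x = z.view(z.size(0), -1, 1, 1)
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i < len(self.layers) - 1:
                x = torch.relu(x)
            if i == ablate_layer:
                x[:, ablate_channel] = 0.0  # zero out one unit's feature map
        return torch.tanh(x)

g = ToyGenerator()
z = torch.randn(1, 64)
baseline = g(z)                                       # 1 x 3 x 32 x 32 image
early_edit = g(z, ablate_layer=0, ablate_channel=5)   # in a trained GAN: tends to change composition
late_edit = g(z, ablate_layer=2, ablate_channel=5)    # in a trained GAN: tends to change texture/detail
```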

As a neural explanation for visual indeterminacy, Muth and Carbon (2016) suggest that multiple local neural predictions fail to converge to a coherent interpretation. They quote Gombrich, who wrote about Cubism, “each hypothesis we assume will be knocked out by a contradiction elsewhere” Gombrich (1960).

Intriguingly, some of the paintings that Pepperell created years ago in search of visual indeterminacy Pepperell (2011) appear quite similar to GAN images. As a very unscientific experiment, I showed two of these paintings (Figure 2) to some colleagues, and asked “Can you guess how these images were made?” Four of the five responses hypothesized some kind of neural networks.

Eventually, generative networks may get so good that they rarely, if ever, produce unrealistic images. Perhaps there is an “Uncanny Ridge,” along which generators are only just good enough to produce a diverse set of intriguingly indeterminate images, and past which the outputs are less and less surreal. Once generators pass this Uncanny Ridge, artists will need to find new ways to “break” the models, to coerce them into making new “errors.”

4 Unifying Perception and Aesthetics

Natural image models, vision neuroscience, and image synthesis have long been tightly-coupled fields. Discoveries about the visual cortex Hubel and Wiesel (1968) led to analyses of natural image statistics Olshausen and Field (1996); Simoncelli and Olshausen (2001), which led to texture synthesis algorithms Efros and Leung (1999); Portilla and Simoncelli (2000), which led to style transfer algorithms Efros and Freeman (2001); Hertzmann et al. (2001). At the same time, cortical modeling also led to deep convolutional networks Fukushima (1980), which led to GANs and trained discriminative networks, which, in turn, have led to improved neuroscience models Yamins and DiCarlo (2016). Can aspects of aesthetic experience be understood with the same models?

Ideally, a generator would accurately represent a distribution over natural images; a recognition model would be its inverse, providing a posterior distribution over interpretations that approximates human perceptual uncertainty. Optimization against this model would give artists more precise control over the type of perceptual uncertainty present in images, for example, to produce images with specific types of visual indeterminacy.
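As a toy sketch of what such optimization might look like, assuming untrained stand-in networks in place of a real generator and recognition model, the code below adjusts a latent code so that the recognition model’s posterior over categories has maximal entropy, i.e., so that no single interpretation dominates. Other objectives could target more specific kinds of perceptual uncertainty.

```python
import torch
import torch.nn as nn

# Stand-ins: any differentiable generator and recognition model would do.
# Both are untrained here; with real models, the same loop applies.
generator = nn.Sequential(nn.Linear(64, 3 * 32 * 32), nn.Tanh())
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
for p in list(generator.parameters()) + list(classifier.parameters()):
    p.requires_grad_(False)  # only the latent code is optimized

z = torch.randn(1, 64, requires_grad=True)
optimizer = torch.optim.Adam([z], lr=0.05)

for step in range(200):
    image = generator(z).view(1, 3, 32, 32)
    probs = torch.softmax(classifier(image), dim=1)
    entropy = -(probs * torch.log(probs + 1e-8)).sum()
    loss = -entropy  # maximize posterior entropy: no interpretation should dominate
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

With a trained generator and a recognition model calibrated to human uncertainty, the same loop would steer samples toward (or away from) visual indeterminacy, giving the artist a dial rather than a lottery.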

This would give artists higher-level controls for artistic exploration. It could also provide a richer testbed for developing perceptual theories of aesthetic experience than the hand-crafted artworks used in Muth et al. (2016); Pepperell (2011).

A more fine-grained neural model of indeterminacy is needed to account for the temporal evolution from first impressions Oliva and Torralba (2006), to the movement of attention, to the possibility of the “Aha!” moment. Moreover, the categorization of perceptual ambiguity in art is still preliminary, and much work remains to expand and refine it.

References

  • [1] J. Bailey (2019-03) Why is AI art copyright so complicated? Note: https://www.artnome.com/news/2019/3/27/why-is-ai-art-copyright-so-complicated Cited by: §3.
  • [2] D. Bau, J. Zhu, H. Strobelt, B. Zhou, J. B. Tenenbaum, W. T. Freeman, and A. Torralba (2019) GAN dissection: visualizing and understanding generative adversarial networks. In Proc. ICLR, Cited by: §3.
  • [3] A. Brock, J. Donahue, and K. Simonyan (2019) Large scale GAN training for high fidelity natural image synthesis. In Proc. ICLR, Cited by: Figure 1.
  • [4] A. A. Efros and W. T. Freeman (2001) Image Quilting for Texture Synthesis and Transfer. In Proc. SIGGRAPH, Cited by: §4.
  • [5] A. A. Efros and T. Leung (1999) Texture synthesis by non-parametric sampling. In Proc. International Conference on Computer Vision, Cited by: §4.
  • [6] K. Fukushima (1980-04-01) Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics 36 (4). Cited by: §4.
  • [7] D. Gamboni (2004) Potential images: ambiguity and indeterminacy in modern art. Reaktion Books. Cited by: §2.
  • [8] E. H. Gombrich (1960/2002) Art and illusion: a study in the psychology of pictorial representation. 5th edition, Phaidon Press. Cited by: §3.
  • [9] A. Hertzmann, C. E. Jacobs, N. Oliver, B. Curless, and D. H. Salesin (2001) Image Analogies. In Proc. SIGGRAPH, Cited by: §4.
  • [10] A. Hertzmann (2010) Non-Photorealistic Rendering and the Science of Art. In Proc. NPAR, Cited by: §2.
  • [11] D. H. Hubel and T. N. Wiesel (1968) Receptive fields and functional architecture of monkey striate cortex. The Journal of Physiology 195 (1). Cited by: §4.
  • [12] C. Muth and C. Carbon (2016) SeIns: semantic instability in art. Art & Perception 4 (1–2), pp. 145–184. Cited by: §1, §2, §3.
  • [13] C. Muth, M. H. Raab, and C. Carbon (2016) Semantic stability is more pleasurable in unstable episodic contexts: on the relevance of perceptual challenge in art appreciation. Frontiers in Human Neuroscience 10. Cited by: §2, §4.
  • [14] A. Oliva and A. Torralba (2006) Building the gist of a scene: the role of global image features in recognition. In Visual Perception: Fundamentals of Awareness: Multi-Sensory Integration and High-Order Perception, Cited by: §4.
  • [15] B. A. Olshausen and D. J. Field (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381. Cited by: §4.
  • [16] R. Pepperell (2006) Seeing without objects: visual indeterminacy and art. Leonardo 39. Cited by: §1.
  • [17] R. Pepperell (2011) Connecting art and the brain: an artist’s perspective on visual indeterminacy. Frontiers in Human Neuroscience 5. Cited by: Figure 2, §2, §3, §4.
  • [18] J. Portilla and E. P. Simoncelli (2000-10-01) A parametric texture model based on joint statistics of complex wavelet coefficients. International Journal of Computer Vision 40 (1). Cited by: §4.
  • [19] J. Simon (2019) Artbreeder (website). Note: artbreeder.com Cited by: Figure 1, §3.
  • [20] E. P. Simoncelli and B. Olshausen (2001) Natural image statistics and neural representation. Annu. Rev. Neurosci. 24. Cited by: §4.
  • [21] S. Van de Cruys and J. Wagemans (2011) Putting reward in art: a tentative prediction error account of visual art. i-Perception 2 (9). Cited by: §2.
  • [22] D. L. K. Yamins and J. J. DiCarlo (2016) Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience 19. Cited by: §4.

Appendix A Image Credits

Figure 1 images are from Artbreeder, CC-BY-A. Sources and authors (left to right in Figure):

Appendix B Supplemental Figure

Figure 2: Paintings by Robert Pepperell, created with the goal of visual indeterminacy, taken from Pepperell (2011): Succulus (2005) and The Flight (2007).