XXX.3 May + June 2023
Page: 34

Seeing Like a Dataset: Notes on AI Photography

Eryk Salvaggio


The camera began to train photographers in 1816. Photography developed a set of rules, and a photographer may follow those rules when scouring the landscape for images, or else work with the camera to produce new ways of recording the world. Through repetition, the practices become instincts or habits. The camera, as a tool to capture what we see, changes how we see. As the philosopher Vilém Flusser writes, "Photographers have power over those who look at their photographs, they program their actions; and the camera has power over the photographers, it programs their acts" [1].

Insights

Photographing the world for artificial intelligence transforms the eye of the photographer.
In building image datasets for GANs, photographers become biased toward patterns.

As an AI artist training generative adversarial networks (GANs) on my own datasets, I often take photographs of natural patterns found on walks along beaches or forest trails. These go toward building a dataset of that outing, training a model, and then generating an extended, simulated wandering. I take a few hundred of these photographs at a time, the minimum required to train a GAN, or rather to extend the training of StyleGAN2 to produce images based on my own photographs. GAN photography is the practice of going into the world with a camera, collecting 500 to 5,000 images for a dataset, cropping those images, creating variations (mirroring, rotating, etc.), and training for a few thousand epochs to generate still more extensions of those 500 to 5,000 images. In turn, the GAN makes a study of the pixel arrangements of those natural patterns, assigns them coordinates and weights, and then reconstructs these clusters and patterns into new, unseen compositions.
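The collect, crop, and vary steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the author's actual pipeline; the function name, directory layout, and 512-pixel target (a common StyleGAN2 training resolution) are my own assumptions.

```python
from pathlib import Path
from PIL import Image, ImageOps

def prepare_dataset(src_dir: str, out_dir: str, size: int = 512) -> int:
    """Crop each photograph to a centered square, resize it, and save
    mirrored and rotated variants, multiplying a few hundred walk
    photographs toward the thousands a GAN training run expects."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(src_dir).glob("*.jpg")):
        img = Image.open(path).convert("RGB")
        w, h = img.size
        s = min(w, h)  # center-crop to a square before resizing
        img = img.crop(((w - s) // 2, (h - s) // 2,
                        (w + s) // 2, (h + s) // 2)).resize((size, size))
        variants = [img, ImageOps.mirror(img), img.rotate(90), img.rotate(270)]
        for i, variant in enumerate(variants):
            variant.save(out / f"{path.stem}_{i}.png")
            count += 1
    return count
```

Each source photograph yields four training images here; a real run might add more rotations or crops, but the principle is the same: redundancy by design.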

The aesthetic value of this output as photography is questionable. These GANs exist to produce endless variations of the dataset—in other words, to produce even more of those 5,000 images. The practice is instead a form of artistic research, a way of bringing the affordances of GANs into embodiment through my motions: scanning the landscapes, moving my body toward whatever dataset I am trying to construct. It is the act of constructing this dataset through a blend of artistic eye and mechanical detachment that forms the true body of the work.

As a result of this practice, my vision as a photographer has shifted. The rules of photographic composition are pointless to an AI eye. Just as the camera and norms of photography shaped how and what I saw, the AI—and what the AI needs—shapes it too. One learns to think like a dataset. I do not compose one image; I compose 500—ideally 5,000. With too much variation in the data, the patterns won't make sense; the results will be blurred and abstracted. Too little variation, and I overfit the model: lots of copies of the same thing.
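That balance between too much and too little variation can be sense-checked before training. The heuristic below is my own illustration, not part of the author's practice: mean pairwise pixel distance over tiny grayscale thumbnails as a rough proxy for how much a dataset varies.

```python
from itertools import combinations
from pathlib import Path
from PIL import Image

def mean_pairwise_distance(image_dir: str, thumb: int = 16) -> float:
    """Average per-pixel difference across all pairs of images, computed
    on small grayscale thumbnails. Near zero suggests an overfit-prone
    dataset (copies of the same thing); a very high value suggests
    patterns that may blur into abstraction."""
    vecs = []
    for path in sorted(Path(image_dir).glob("*.png")):
        img = Image.open(path).convert("L").resize((thumb, thumb))
        vecs.append(list(img.getdata()))
    pairs = list(combinations(vecs, 2))
    return sum(
        sum(abs(a - b) for a, b in zip(u, v)) / len(u) for u, v in pairs
    ) / len(pairs)
```

There is no universal threshold; what counts as "too blurred" or "too overfit" has to be calibrated against actual training runs.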

An image taken by the author of the forest floor in Western New York.

So I must seek continuities of pattern from shot to shot, with slight variations in composition, keeping similar proportions of the elements within the frame. I aim to balance the splashes of apple-red maple leaves, patches of grass, and bursts of purple wildflowers without introducing particulars, such as a discarded beer can or a mushroom.

This is an inversion of my photographic instincts, as well as my mushroom forager's instinct. Mushroom foraging has changed the way I see the forest: once a birdwatcher, I now find my interest in discovering fungi drawing my attention toward the soil instead of the skyline. Both the photographer and the mushroom hunter typically look for breaks in patterns. If I stumble across a mushroom, the instinct might be to capture it, on film or in a wicker basket. By contrast, the AI photographer looks away. The mushroom disrupts the patterns of the soil; it is an outlier in need of removal. The AI photographer wants the mud, grass, and leaves. We want clusters and patterns. We don't focus on one image; we focus on the patterns across a sequence of images. We want to give the system something predictable. We seek to bias the dataset toward desirable outcomes: the production of images.

Extending the Forest

Prediction, whether intended to generate photographs or inform a policy decision, is a matter of time and scale. With enough photographs of mushrooms one can start generating images of mushrooms. Likewise, a biased data scientist could select data points to support any conclusion. The underlying principle is the same: The AI photographer looks away from the unique subjects of the world, declares them off-limits, and looks instead to the patterns surrounding eruptions of variation.

This bias is revealed in the images that artists make with GANs. On the one hand, we might view it as flattening the world. On the other hand, it heightens my awareness of the subtleties of the dull. The singular is beautiful: birds, mushrooms, the person I love. Yet the world behind them, the world we lose to our cognitive frames, is compelling in its own way.

Much of this background world is lost to us through schematic processing [2]. We acknowledge that the soil is muddy and covered in leaves, and so we do not need to individually recognize every fallen leaf. Arrive at something novel in your environment, however, and you pause: What bird is that? Is that a mushroom rising from that log? What kind?

Left: An image taken by the author of the forest floor in Western New York. Right: An image taken by the author of the forest floor in Western New York (top), extended by OpenAI's diffusion-based image generation model, DALL-E 2 (bottom).

The benefit of schemata is also the problem with schemata. The world gets lost, until we consciously reactivate our attention. AI pioneer Marvin Minsky used schemata to organize computational processes. Whatever the machine sensed could be placed into the category of ignorable or interruptive:

When one encounters a new situation (or makes a substantial change to one's view of a problem), one selects from memory a structure called a frame. This is a remembered framework to be adapted to fit reality by changing details as necessary. A frame is a data-structure for representing a stereotyped situation like being in a certain kind of living room or going to a child's birthday party. Attached to each frame are several kinds of information. Some of this information is about how to use the frame. Some is about what one can expect to happen next. Some is about what to do if these expectations are not confirmed [3].

Minsky took a metaphor, meant to describe how our brains work, and codified it into a computational system. It is a model of a model of a brain. As an indirect result of brain metaphors being applied as instruction manuals for building complex neural networks, GANs behave in ways that align with and reflect these human schemata. Schemata are not always accurate, and information that works against existing schemata is often distorted to fit. We may not "see" a mushroom when we expect to see only leaves and weeds.
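Minsky's frame can itself be sketched as a data structure. The slot names here are invented for illustration; the point is only that defaults fill in whatever observation leaves unspecified, which is exactly how a schema can fail to "see" a mushroom.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """A stereotyped situation in Minsky's sense: slots carry
    default expectations; observed details override them."""
    name: str
    defaults: dict = field(default_factory=dict)

    def interpret(self, observation: dict) -> dict:
        # Defaults stand in for everything not actually observed --
        # the frame "fills in" the scene from expectation.
        return {**self.defaults, **observation}

# A frame for walking a forest trail: we expect leaves, not mushrooms.
forest_floor = Frame("forest floor",
                     defaults={"ground": "leaves", "mushroom": "absent"})
scene = forest_floor.interpret({"color": "red"})
# The mushroom slot stays "absent" unless something interrupts the frame.
```

Minsky's third kind of attached information, what to do when expectations fail, is the interesting part: the mismatch is what forces us to select another frame, or to actually look.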

An AI photographer is, therefore, a human mind looking for patterns for machines through the concept of human schemata adapted to that machine. As an AI photographer, I am looking for the visual noise I typically don't see, the patterns lost to my inattention. While the traditional photographer looks for the interruptive exception, the AI photographer looks for the ignorable. If the AI photographer behaves as a traditional photographer, GANs will create distorted images in the presence of any interruptive information: If one mushroom exists within 500 photographs, its traces may appear in generated images, but they will be incomplete, warped to reconcile with whatever data is more abundant.

A careful photographer can learn to play with these biases—call it the art of picking cherries. The dataset can be skewed, and mushrooms may weave their way in. We may start calculating how many mushrooms we need to ensure they are legible to the algorithms but remain ambiguous. A practiced AI photographer can steer the biases of these models in idiosyncratic ways.

Discussion

GAN photography is about series, permutation, and redundancy. It is designed to create predictable outputs from predictable inputs. It exists only because digital technology has made digital images an abundant resource. It is relatively simple today to take 5,000 images, and this abundance is a precondition for creating 50,000 more. As a result, the "value" of AI photography is low.

The images in this article were not generated by GANs at all. Rather, images I collected for a GAN were uploaded to DALL-E 2 (a diffusion-based model) and extended through outpainting. The center of each image is real; the edges are not. I can type "image of the forest floor" to produce as many as I could want to see. These models are removed from my own direct experience in a way distinct from my GAN dataset; the outcomes of diffusion models are mediated through unknown sources, photographers, and locations. It is possible that GANs will become obsolete, overtaken by ready-made, pretrained models such as DALL-E 2 that generate images from broad, disparate datasets. For me as an artist, though, GAN photography is personal. The process is much more captivating for the artist than the eventual result is for any audience.

Left: An image taken by the author of the forest floor in Western New York. Right: An image taken by the author of the forest floor in Western New York (top), extended by OpenAI's diffusion-based image generation model, DALL-E 2 (bottom).

GAN photography is a strangely contemplative and reflective practice. Like Zen meditation techniques, where the student is instructed to constantly redirect attention to the air flowing in through their nostrils, the GAN photographer is constantly returning attention to the details we are inclined to drift away from. The particular and exceptional is often beautiful, but it's not the only form of beauty. In search of one mushroom, we might neglect a hundred thousand maple leaves.

On the other hand, beneath the images produced by GANs and diffusion models alike is the convergence of information and calculation, reduction and exclusion, that flattens the world into a generative abstraction. It is one thing to produce images that acknowledge the ignorable, but another thing to live in a world where these patterns are enforced at the expense of any exception. Beyond (and perhaps even within) the photographic frame, this world is unimaginative. It is the result of a process crafted to reduce difference, to ignore exceptions, dismiss outliers, and mistrust novelty. It is bleak, if not dangerous: a world of uniformity, a world without diversity, a world of only observable and repeatable patterns.

The GAN photographer develops a curious vision, steered by these technologies: a way of seeing aligned with the information flows that curate our lives. The GAN photographer learns to see like a dataset, to internalize its rules. Through practice, the rules become instincts or habits. The data, meant to capture what we see, changes how we see. For the artist, the results of this sustained practice are counterintuitive and dull. The GAN produces endless variations of the same thing. By excluding outliers and selecting data for the model, we exclude vast swathes of the world's growth and emergence, lost to their unfitness for statistical presence.

There is a lesson here for data practices of all kinds. The practice of GAN photography encourages artists and data practitioners alike to shift focus between the abstraction of schemata and the observation of concrete details. We can scan the leaves for patterns and for the rare bright-red cap of an Amanita muscaria, against all odds. But once inside the model, the world is fitted to the comforting reassurance of prediction from previous patterns, rather than the constant emergence and change that truly describes the world on the forest floor.

References

1. Flusser, V. Towards a Philosophy of Photography. Reaktion Books, London, 1984, 30.

2. von Hippel, W. et al. Inhibitory effect of schematic processing on perceptual encoding. Journal of Personality and Social Psychology 64, 6 (1993), 921–935.

3. Minsky, M. Minsky's frame system theory. Proc. of the 1975 Workshop on Theoretical Issues in Natural Language Processing. 1975, 104–116; https://doi.org/10.3115/980190.980222

Author

Eryk Salvaggio is an independent researcher, using art as a tool for researching artificial intelligence, nonhuman intelligence, and design. [email protected]


Copyright held by author. Publication rights licensed to ACM.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2023 ACM, Inc.
