AI photo editing, “truth”, and Visual Anthropology

It has always been recognized that photography and video, while having a special weight as evidence of “what actually happened”, have a creative aspect that can generate a distorted representation of the world. Some would emphasize that distortion is inevitable. The photographer selects the subjects to photograph and, out of a stream of photos, selects a specific photo based on their personal agenda. This agenda is partially aesthetic and partially (at least in photojournalism, documentary, or visual anthropology) about the story the photographer wants to tell about what actually happened.

Some would take from this a sense of futility in trying to “capture or convey objective reality” and from there assert the pointlessness of any critique based on verity or objectivity. However, if we accept that there is an objective reality, then we can also see that the bias or distortion of the photographer can be greater or lesser, and that the specific choices made in taking and presenting a photograph convey a more or less accurate picture. For example:

  • Selecting a photo where the subject randomly glances downward may falsely represent the subject as sad, depending on the associated text.
  • Selecting photos that capture a specific moment of action, while perhaps representing only a small sliver of time in the task, may better represent the purpose or function of the task, the salience of the action.
  • Similarly, cropping a photo to focus in on a specific detail of interaction may more accurately convey salient details of a relationship.
  • Playing with lighting and angle may give a distorted image of a person, for example with “better” or “worse” posture or complexion than is characteristic of them.
  • Taking a series of photos completely at random, while in some ways less “biased”, may give a false impression of disorganization of action, particularly when viewers expect the photographer to highlight salient moments of action. (As viewers, we anticipate the psychology of the photographer and correct for some biases, such that an “objectively” unbiased set of photos may generate bias in reception due to this bias correction.)
  • Cropping or angling a shot to exclude some people gives a false impression of their absence, for example a less socially chaotic event or (especially on social media) a pristine, solitary experience of a natural site that is in fact constantly inundated with people.

Our choices as photographers, made in relation to the anticipated expectations and norms of photograph presentation, may result in a reception of the photograph that is closer to or farther from objective reality. Also, because our senses filter details differently in person than in a photo, even a randomly taken, representative photo may give a false impression of how we would experience that place or event in person.

AI technologies for photo editing, and the norms around them, present new possibilities and new choices that we face as photographers. In this particular project in the summer of 2023, working with divers in Japan and Tonga, I got to sit with these decisions in relation to what I feel as a strong duty toward conveying objective reality as a visual anthropologist. I was using the new Topaz AI suite for photography to “improve” some of the images, most of which were taken in highly dynamic circumstances that were suboptimal for photography, to put it mildly. (As an aside, I was quite impressed with what it could do, though it does take some attention to how the AI is applied to avoid strange artifacts.)

Something as simple as “sharpening” or “denoising” an image can distort the perception of the moment. With sharpening, the AI model takes the pixels of a low-resolution or out-of-focus image and either corrects pixels based on what the image is most likely to be or, in the case of a low-resolution image, adds intermediate pixels based on what the model “thinks” should be there. Similarly with denoising, the AI guesses at what is noise and what is image and replaces or modifies the noise based on what it “thinks” the image should be. Or at least that is the two-sentence version of what it is doing. A radical perspective might be that this is fabrication and no longer simply a presentation of the data gathered. A more nuanced perspective would be to consider what is salient and what is distracting detail, and to prioritize the salient details in how the image is perceived and constructed by the viewer. From this perspective, an AI-sharpened image may more accurately generate the visual experience a person would have had in the moment than the original low-resolution or motion-blurred image. Then again, it can also take a moment that was intrinsically blurry (as when looking through murky water) and generate an unrealistically sharp image or, in worse scenarios, create real-looking artifacts that were not in fact there.
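For those curious about the mechanics, here is a minimal sketch of the general idea using classical OpenCV routines (non-local means denoising plus unsharp masking) rather than the learned models Topaz actually uses; the filenames and parameter values are placeholders, not anything from my actual workflow.

```python
# Classical illustration of "denoise then sharpen": output pixels are estimates of
# what the underlying image "should" be, not the raw capture. Topaz and similar
# tools use trained neural networks instead of these hand-built filters, but the
# basic move (replacing recorded pixels with a model's best guess) is the same.
import cv2

img = cv2.imread("ama_ceremony.jpg")  # placeholder filename

# Non-local means denoising: each pixel is rewritten as a weighted average of
# similar-looking patches elsewhere in the image, i.e. an estimate of its "clean" value.
denoised = cv2.fastNlMeansDenoisingColored(img, None, h=7, hColor=7,
                                           templateWindowSize=7, searchWindowSize=21)

# Unsharp masking: subtract a blurred copy to exaggerate edges. Detail is amplified,
# not recovered, so the result can look crisper than the scene ever was.
blurred = cv2.GaussianBlur(denoised, (0, 0), sigmaX=3)
sharpened = cv2.addWeighted(denoised, 1.5, blurred, -0.5, 0)

cv2.imwrite("ama_ceremony_enhanced.jpg", sharpened)
```

Even in this simple version, the pixels that come out are estimates rather than raw sensor data, which is where the question of fabrication begins.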

This becomes even more of an issue with AI-based object removal. With this technology, the user (or even the AI model itself) selects an object in the image, and the AI model removes those pixels and fills the region in with its “best guess” of what would otherwise be there. (The technology is improving very quickly, as it becomes more common to take short micro-videos and use the information from all the frames to create an ideal, AI-corrected/mixed/generated image.)
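As a rough analogy (and not Topaz’s actual method), classical inpainting in OpenCV does the same kind of fill-in from surrounding pixels, just without a generative model to invent plausible detail; the filename and mask coordinates below are placeholders.

```python
# Classical stand-in for AI object removal: mark the unwanted region with a mask and
# let an inpainting algorithm fill it from the surrounding pixels. Generative tools
# go further and can hallucinate convincing detail, but in both cases the filled-in
# region is a guess, not something the sensor recorded.
import cv2
import numpy as np

img = cv2.imread("ceremony_with_head.jpg")  # placeholder filename

# Binary mask of the pixels to remove (a hand-drawn rectangle over the intruding head;
# in a real editor the selection would come from the user or the model).
mask = np.zeros(img.shape[:2], dtype=np.uint8)
mask[620:900, 150:420] = 255  # placeholder coordinates

# Fill the masked region from its surroundings.
filled = cv2.inpaint(img, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)

cv2.imwrite("ceremony_head_removed.jpg", filled)
```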

The main social application of this is to remove people from images to give an “undistracted” scene, creating the impression of a private experience of a place that is actually bustling with fellow tourists, for example.

I have one photo in my current exhibition at MPI where I used object removal. I think for most people this will immediately be a non-issue, but since I do believe first and foremost in trying to accurately give an impression of the moment, I did feel compelled to think about whether it is “ethical” to use object removal in this context.

Below are two versions of the image, with and without object removal. In this photo, a young ama (woman diver) in Sugashima, Japan is performing part of a Shinto ceremony, making an offering of the pair of abalone she had just caught. This was part of the season-opening ritual competition among the ama to see who could most quickly find a male and female pair of abalone and so have the honor of making the offering for safety and a good harvest that year.

While taking the photo, the fellow sitting in front of me leaned to the left and I caught his head in the frame … aesthetically not exactly the photo I wanted, but at the same time, the image is what the camera actually caught in that moment.

I decided to explore the object removal function here. I still feel ambivalent about it as a visual anthropologist, so I present the before and after here and write down my incomplete thinking on it.

My thinking about why it might be “justified” is twofold. First, it does create a different image which, at least in most moments, I like better. In the moments I prefer the object-removed image, it feels like the head makes the image a bit messier, and the composition has a better balance without it. My eye is then drawn more to the line of focus from the ama toward the abalone on the table in front of her. Secondly, and perhaps more importantly, the head is not salient to what is going on. Yes, it is accurate to say that my field of view in that moment included a head, but in reality it was just that moment, and my experience did not saliently feature that head in the way that the photo forces its salience. Also, the fabric where the head used to be is a fiction generated by the AI. It is a very good guess, but it is generated out of the training data of the AI and is not “really” what was there. The specifics of the fabric in that part of the image, however, are not very salient to the socially important details of the image. Yes, it is important to note that it was simple white fabric (which the AI successfully “guessed”), but the details of the folds of the fabric are not important. Considering the mechanics of photo reception, one could argue that by removing the object from the image, the AI and I created a photograph that more accurately evokes the experience of being there at that ceremony.

Then again, maybe I am just making excuses for being aesthetically bothered by the head in the image. And in yet another moment, I actually like the head being there, the resulting messiness of the image, and the active inclusion of fellow audience members in the ceremony.

Certainly one can imagine instances where object removal creates more of a false representation of the visual experience of the moment. The question I meditate over is whether, in the case where one does hold an ethic of verity in photography, such use of AI object removal is always unethical, or whether the issue becomes murkier and more complex when the resulting image may arguably better evoke the visual experience one would have had in the moment. Does the context of the photo matter? It seems that the balance between the value of preserving machine-acquired data and considerations of reception may differ depending on whether the context is journalism, visual anthropology, legal cases, or documentary. Or maybe not.

I also have to consider that I was actively trying to get images unobstructed by heads while heads were actively bobbing around in front of me. In a way, the AI just gives another opportunity to bias the image, as my own image curation already does.

Below are the three images (original, cropped, object removal) to meditate over with regard to this perhaps niche ethical issue.

PS. As a note, I tried a less aggressive crop with more of the head remaining (so I could keep more context in the photo) and then tried to remove the head, but the AI was lousy at guessing more complicated missing pixels that included an object like a foot. It seems to do a good job on a very simple thing like fabric.

PPS. The abalone was later cut up into small pieces and offered to all those who attended the ceremony. It was delicious!


original image

square cropping to minimize head

Topaz AI object removal to remove head

Abalone from the season-opening ceremony on Sugashima 2023 being served