When I first saw ImageNet Roulette, I didn’t know what it was – only that an artist I follow on Twitter had posted an image of his face framed by a thin neon-green square, accompanied, in the top-left corner, by the words: ‘swot, grind, nerd, wonk, dweeb’. He then posted another photo, in which he had taken off his glasses; this time, the text in the frame read: ‘rape suspect’. And again, after having rearranged his hair, smiling: ‘first offender’.
Scrolling down, I started seeing those same green frames everywhere, tagged with words like ‘beard’, ‘mezzo-soprano, mezzo’, ‘cog’ and ‘weirdo’. It felt familiar: I’ve seen these kinds of tags before. Generated by ImageNet Roulette, a project by researcher Kate Crawford and artist Trevor Paglen, the labels replicate how machine-learning systems analyze images. Using an open-source, deep-learning framework, ImageNet Roulette matches uploaded images with categories that already exist in a training set called ImageNet. Visual-recognition tests were among the very first AI assignments, which is why numerous datasets of this kind exist. In their essay about the project, ‘Excavating AI: The Politics of Images in Machine Learning Training Sets’ (2019), Crawford and Paglen observe that they chose ImageNet – which was devised by computer scientists at Princeton and Stanford Universities between 2006 and 2009 – because it is the ‘canonical’ training set.
ImageNet Roulette allows users to upload a selfie to the site, where it is analyzed as it would be by AI. Although it does have a ‘people’ category, ImageNet is actually an object-oriented dataset that was never intended for facial-recognition training, and its tags for humans can be quite disturbing. ImageNet categorizes a human as ‘person, individual, someone, somebody, mortal, soul’ and then assigns to them one of thousands of labels, ranging from ‘journalist’ to ‘rape suspect’.
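How a classifier ‘decides’ on one of those thousands of labels can be sketched in miniature. The toy Python below is an illustration only, not ImageNet Roulette’s actual code: the category ‘prototypes’ and the feature vector are invented, and real systems learn thousands of high-dimensional categories. It simply matches an image’s feature vector to the nearest category by cosine similarity – a simplified stand-in for how a trained network maps an embedding to a label.

```python
# Illustrative sketch: assigning a label by finding the category prototype
# closest to an image's feature vector. All numbers here are invented.
import math

# Hypothetical category prototypes (a real model learns thousands of these).
PROTOTYPES = {
    "newsreader, news reader": [0.9, 0.1, 0.0],
    "psycholinguist":          [0.2, 0.8, 0.1],
    "nondriver":               [0.1, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def label(features):
    """Return the category whose prototype is most similar to the features."""
    return max(PROTOTYPES, key=lambda name: cosine(features, PROTOTYPES[name]))

print(label([0.15, 0.75, 0.2]))  # closest to the 'psycholinguist' prototype
```

The point of the sketch is that the system can only ever answer with a category it was given in advance: whatever face is uploaded, the output is drawn from the labels already baked into the training set.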
Crawford uploaded to Twitter a photo of herself and Paglen, both wearing suits and looking very professional, in which she was tagged ‘newsreader, news reader’ and Paglen ‘microeconomist, microeconomic expert’. I tried it twice: initially, I was designated ‘nondriver’; then – following a slight change of scenery and position – ‘psycholinguist’, which appears to be a very common label. Another very common tag? ‘Face’. Some women posted on social media that they were flattered when ImageNet recognized them as ‘temptress’, ‘femme fatale’, ‘enchantress’ or ‘siren’. (Influencer Laura Lux uploaded a selfie with the comment ‘is ImageNet Roulette tryna fuck’ when her – surprisingly, not risqué – selfie got tagged ‘smasher, stunner, knockout, beauty, ravisher’.) One Twitter user uploaded the album cover of The Beatles’ Sgt. Pepper’s Lonely Hearts Club Band (1967), claiming that it ‘continues to be a great test of face neural networks’. ImageNet reads John, Paul and George (Ringo wasn’t detected) as ‘Fauve, fauvist’; there’s also a ‘cog’ and a ‘grinner’; someone in the corner is labelled ‘eager beaver, busy bee, live wire’; next to him is ‘policyholder’ and, close by, yet another ‘psycholinguist’.
The popularity of ImageNet Roulette on social media echoes that of FaceApp, which allowed users to change how old they looked in uploaded photos, and the Google Arts and Culture app, which paired users’ selfies with historical works of art. In the past couple of years, both of these apps went viral, with users sharing the results across multiple social-media platforms. ImageNet Roulette’s aesthetic, with its distinct green frame and text, makes it even more appealing to share because it’s so instantly recognizable. Yet, while the three platforms operate similarly, they differ in intention. FaceApp and the Google Arts and Culture app collect data on users: two worrisome examples of how viral participation distracts people who may otherwise be very conscious of their right to privacy. ImageNet Roulette does exactly the opposite: it draws users’ attention to a systemic problem – in this case, the bias of a dataset. That is exactly the kind of partiality exemplified by the Google Arts and Culture app, which was widely criticized for racism due to the inherent Eurocentrism of its dataset: drawn largely from Western museums, it contains few representations of people of colour and, when they do appear, they are largely historical documents of exploitation.
The language of ImageNet Roulette’s tags is so weird, and so removed from our experience (‘someone, somebody, mortal, soul’), that it’s hard to resist the impulse to share the results: they are so alienated from reality that they become funny. (Or, like Alex Goldman, co-host of the technology podcast Reply All, simply to have someone explain what the label ‘grass widower’ actually means.) The project reaffirms our notion – our hope – that machine thinking is still clunky and incapable. That computers are not superintelligent, uncontrollable beings; they may have billions of data points, but they still can’t figure out language. We rejoice in the weirdness because, in the very moment that the machine strips us of our humanity by reading us as part of a dataset, it simultaneously reaffirms that humanity: the machine cannot see that its language is unresolved, but we can – and we can laugh about it.
ImageNet Roulette reveals another issue with AI: judgement. In 2016, a group of scientists at Shanghai Jiao Tong University in China taught an AI to recognize criminals by creating a training set with 1,000 faces of non-criminals and around 800 photographs of convicted criminals. The scientists discussed micro-expressions and countered those who claimed the idea was close to phrenology by asserting that machines are neutral. Yet, as exemplified by Crawford and Paglen’s project – which forms part of their exhibition ‘Training Humans’ at Fondazione Prada in Milan – machines are only as unbiased as the training sets they are given to work with. Some of the ImageNet tags are so blatantly racist, misogynist and hateful that they are genuinely shocking, even in 2019. The telling-it-like-it-is mentality of the project feels offensive, but Crawford and Paglen explain in ‘Excavating AI’ that seeing the foundation on which AI systems are trained is a ‘forensic method to understand how they work’, and that ‘this has serious consequences’.
Users’ delighting in the weirdness of being seen by a machine does not take away from their understanding of the ramifications of this vision. In a few days, or weeks, the internet will have forgotten about ImageNet Roulette, just as no one posts images from FaceApp anymore. Digital culture moves fast. Another, similar app will go viral soon, since users – humans, that is – are always curious about how they are seen. In an essay for the catalogue that accompanied ‘Astro Noise’, Laura Poitras’s 2016 exhibition at the Whitney Museum of American Art in New York, Crawford begins with a reference to the Temple of Apollo at Delphi – home of the Oracle – on which, legend has it, were once inscribed the words: ‘Know thyself.’
Main image: courtesy ImageNet Roulette