Nowadays, AI characters are everywhere; we see them as influencers on YouTube or as movie actors. And a couple of days ago, a manufacturer/vendor was kicked off TDF because they somehow tampered with their promotional images using AI. So I guess it's time to take a look at how this technology works, what can be accomplished with it, and especially how it can be detected. In a nutshell: acquiring some media competence, like identifying image manipulation done with classic methods such as retouching, or detecting fake journalism. These are things everybody should pay attention to these days, at least to some degree, unless they like getting manipulated (or mindfucked, respectively).
I guess I am not the only one curious about AI imaging, so please feel free to share your own thoughts and experiences or your (own) AI-generated images, as long as they are doll/beauty/modeling related and comply with the TDF RoC and ToS.
Since this isn't going to be a tutorial, let's just start with a (more or less random) showcase image to illustrate what AI imagery is:
The (beautiful) picture above was created by 'paradox7525' and is used in the showcase section for Midjourney, one of many publicly available AI image generators. This image comes with the description:
a beautiful woman draped in silks and floating surface of water, art nouveau, in the style of Alphonse Mucha

The above description is also a so-called 'prompt', meaning that, in theory, this set of terms should suffice to generate this or a similar image.
So theoretically, we can instruct an AI image generator to create images of humans, animals, things, landscapes and the like just by verbally describing them. The AI image generator can also mimic styles, from cinematic or photorealistic through anime/manga and cartoon styles to artistic styles. For this, the AI uses a data set called the 'model', and the model is created through training. The AI cannot generate an image of a penguin if it was never trained on images of a penguin.
For me that was a bit confusing, since in my understanding a real intelligence should be able to deduce things from abstract descriptions, e.g. by looking them up in a reference book, where it could read that a penguin is a bird, so it might have feathers and not scales like a fish or white dots like a fly agaric. When I tried my first AI prompt, I learnt what this restriction meant:
This image was generated with A1111, an open-source web interface for Stable Diffusion. The prompt I used was simple:
Family of 10 different nude sex dolls in the living room

And yes, I also used a couple of negative prompt parameters like "morbid, ugly, asymmetrical, mutated malformed, mutilated". The 'negative prompt' lists things we do not want to see in the generated image.
Now, about the picture above - where to start… This particular AI had trouble counting: there are definitely more than 10 dolls. The AI also believes for some reason that dolls must have detachable limbs, as most of the dolls have detachable arms. Not all dolls are nude, and there are numerous errors which definitely do not honor the negative prompt parameters. There is nothing that looks like a living room. And, obviously, the image is unusable garbage.
Why is the AI struggling so much with such a simple prompt? My guess is:
- There are components in the prompt most AIs are instructed not to process ("nude"); and
- the training set of the model might not have included any actual sex doll.
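The first guess above can be illustrated with a toy sketch. Many hosted generators presumably run the prompt through a term filter before the model ever sees it; the blocked-term list and the function below are invented purely for illustration, not the actual mechanism of any particular service:

```python
# Toy illustration of a prompt term filter. The blocked-term list is
# made up for this example; real services use far more sophisticated
# (and undisclosed) moderation pipelines.
BLOCKED_TERMS = {"nude"}

def filter_prompt(prompt: str) -> str:
    """Drop blocked terms from a prompt, keeping the rest intact."""
    kept = [word for word in prompt.split() if word.lower() not in BLOCKED_TERMS]
    return " ".join(kept)

print(filter_prompt("Family of 10 different nude sex dolls in the living room"))
# The word "nude" is silently dropped before generation - which would
# explain why only some of the dolls in the result ended up nude.
```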
For my second attempt I used another AI image generator called Fooocus. Same prompt, but a much more elaborate set of negative prompt parameters. Two images were generated in this batch:
By the way, all these images are unedited, unretouched and unaltered (except for the "AI generated image" reminder).
Here is the complete negative prompt for the two images above:
(worst quality, low quality, normal quality, lowres, low details, oversaturated, undersaturated, overexposed, underexposed, grayscale, bw, bad photo, bad photography, bad art:1.4), (watermark, signature, text font, username, error, logo, words, letters, digits, autograph, trademark, name:1.2), (blur, blurry, grainy), morbid, ugly, asymmetrical, mutated malformed, mutilated, poorly lit, bad shadow, draft, cropped, out of frame, cut off, censored, jpeg artifacts, out of focus, glitch, duplicate, (airbrushed, cartoon, anime, semi-realistic, cgi, render, blender, digital art, manga, amateur:1.3), (3D ,3D Game, 3D Game Scene, 3D Character:1.1), (bad hands, bad anatomy, bad body, bad face, bad teeth, bad arms, bad legs, deformities:1.3)

This is much better, even though the AI is still unable to count to 10. The dolls still have detachable arms like mannequins, and the second image has a surplus arm (2nd doll from the left) and two detachable hands (both dolls on the right side). But we got a living room and the dolls look kind of OK, so these images could be a starting point for further work (assuming someone is interested in doll photography but does not have any dolls).
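As an aside, the parenthesized numbers in that negative prompt are attention weights in the A1111/Fooocus prompt syntax: "(text:1.4)" multiplies the emphasis on that text by 1.4. Here is a minimal sketch of how such annotations could be parsed; it is a simplification (real parsers also handle nested parentheses and bare "(text)", which implies a weight of 1.1):

```python
import re

# Minimal parser for the explicit "(text:weight)" emphasis syntax used
# by A1111-style prompts. This sketch ignores nesting and the implicit
# "(text)" form; it only covers "(text:1.4)"-style annotations.
WEIGHTED = re.compile(r"\(([^():]+):([0-9.]+)\)")

def parse_weights(prompt: str):
    """Return a list of (text, weight) pairs; unweighted text gets 1.0."""
    parts = []
    pos = 0
    for m in WEIGHTED.finditer(prompt):
        before = prompt[pos:m.start()].strip(" ,")
        if before:
            parts.append((before, 1.0))
        parts.append((m.group(1).strip(), float(m.group(2))))
        pos = m.end()
    tail = prompt[pos:].strip(" ,")
    if tail:
        parts.append((tail, 1.0))
    return parts

print(parse_weights("(blur, blurry, grainy:1.2), morbid, ugly"))
# [('blur, blurry, grainy', 1.2), ('morbid, ugly', 1.0)]
```

So a weight above 1.0 pushes the generator harder away from (or, in a positive prompt, towards) those terms, which is why the "bad hands, bad anatomy, …" group gets a hefty 1.3.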
So from these very basic examples we have already learnt some things to watch out for: image segments that do not look right, elements that appear distorted, and, last but not least, surplus limbs. If you watch carefully, you will see such manipulations everywhere in the (legacy) media. For example, the German mainstream media recently managed to publish a picture of a protesting crowd and placed part of the crowd within a river (in this case, the Alster in Hamburg, Germany), attempting to make the crowd appear larger than it actually was. Or watch out for hands with six fingers: this was an infamous bug of some AI image generators last summer, and it showed up in countless "evidence" pictures from the Israel/Gaza conflict.
Another thing I am personally struggling with is repeatability. The AI image generators have a lot of artistic freedom for their creations. In some AI generators the amount of "artistic license" is configurable, e.g. with the CFG setting in Stable Diffusion: The CFG scale (classifier-free guidance scale) or guidance scale is "a parameter that controls how much the image generation process follows the text prompt. The higher the value, the more the image sticks to a given text input". Or in other words: It seems to be very hard to exactly replicate one generated image. To give an example: For the batch with the following two images I used exactly the prompt from the initial showcase image:
a beautiful woman draped in silks and floating surface of water, art nouveau, in the style of Alphonse Mucha

I fed this to Fooocus with the negative prompt from above, and that's what I got:
Both are very nice, but also very different from the initial showcase picture. Because of this I am asking myself: does anyone really know what's going on in an AI brain if we can hardly predict the outcome of a task it is supposed to solve? If we do not know how the AI arrives at its results, how do we know that its internal logic isn't flawed?
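For what it's worth, the usual answer to the repeatability problem in Stable Diffusion-based tools is to fix the random seed: the diffusion process starts from random noise, and reusing the same seed (together with the identical prompt, model and settings) reproduces the same image, while a new seed gives a new variation. The principle can be illustrated with plain Python's seeded random generator; this is only an analogy, not the actual diffusion sampler:

```python
import random

def noisy_output(seed: int, n: int = 5):
    """Stand-in for image generation: seeded pseudo-random numbers.

    In a real diffusion pipeline the seed determines the initial noise
    the image is "denoised" from; here it just drives Python's PRNG.
    """
    rng = random.Random(seed)
    return [round(rng.random(), 3) for _ in range(n)]

# Same seed -> identical "image"; different seed -> a different one.
print(noisy_output(42) == noisy_output(42))   # True
print(noisy_output(42) == noisy_output(43))   # False
```

That explains my two Mucha-style results: without pinning the seed from the showcase image (which Midjourney does not publish in its gallery), every run starts from fresh noise, so "same prompt" never means "same picture".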
Sandro