File:X-Y plot of algorithmically-generated photorealistic portraits by nationality.png

From Wikimedia Commons, the free media repository

Original file (9,216 × 4,642 pixels, file size: 33.4 MB, MIME type: image/png)


Summary

Description

An X/Y plot of algorithmically generated photorealistic AI images featuring an office worker depicted as various nationalities, created using a custom merged Stable Diffusion model checkpoint: R34_e2 merged with gg1342 at 0.5 weighted sum, then merged with Anything V3.0 at 0.5 weighted sum, and finally merged with F222 at 0.5 weighted sum. This merged model was paired with the sd-vae-ft-mse-original VAE. This plot illustrates the most basic use case for the img2img feature within Stable Diffusion.
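As a rough illustration of what the chained 0.5 weighted-sum merges imply, the sketch below tracks how much each source checkpoint contributes to the final model. It assumes the usual weighted-sum formula, result = A × (1 − M) + B × M, applied uniformly to every tensor; the function name weighted_sum and the dictionary-of-contributions representation are hypothetical and are not part of any merging tool.

```python
from fractions import Fraction

def weighted_sum(a, b, m):
    """Blend two contribution maps as a * (1 - m) + b * m (assumed formula)."""
    names = set(a) | set(b)
    return {n: a.get(n, Fraction(0)) * (1 - m) + b.get(n, Fraction(0)) * m
            for n in names}

half = Fraction(1, 2)

# Start from R34_e2, then merge in the other checkpoints in the order described.
merged = {"R34_e2": Fraction(1)}
for name in ["gg1342", "Anything V3.0", "F222"]:
    merged = weighted_sum(merged, {name: Fraction(1)}, half)

for name, share in sorted(merged.items(), key=lambda item: item[1]):
    print(f"{name}: {float(share):.3f}")
```

Under these assumptions the last checkpoint merged (F222) accounts for half of the final weights, Anything V3.0 for a quarter, and R34_e2 and gg1342 for an eighth each, so the merge order matters even though every step uses 0.5.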

Procedure/Methodology

These images were generated using an NVIDIA RTX 4090. Since Ada Lovelace GPUs (compute capability 8.9, which requires CUDA 11.8) are not yet fully supported by the PyTorch dependency libraries currently used by Stable Diffusion, I used a custom build of xformers, along with PyTorch cu116 and cuDNN v8.6, as a temporary workaround. The front-end used for the entire generation process is the Stable Diffusion web UI created by AUTOMATIC1111.

An initial 768x1024 image was generated with txt2img using the following prompts:

Prompt: young professional Chinese woman wearing white office shirt and dark navy skirt, long black hair, Nikon D7500, 4K, sharp focus, photorealistic high-quality, volumetric lighting, close-up

Negative prompt: (((out of frame))), (((no face))), ((deformed hands)), extra limbs, ((ugly)), (((deformed))), ((bad anatomy)), ((mangled)), (((censored))), (blurry), (((distorted face))), mutation, amputee, hugging, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, broken body, mutated, extra hands, extra feet, extra arms, extra legs, multiple views, lowres

Settings: Steps: 100, Sampler: DPM2 a, CFG scale: 12, Size: 768x1024, Highres. fix, Denoising strength: 0.7

Then, a batch of 1536x2048 images was generated with img2img, using the image generated earlier along with the following prompts:

Prompt: young professional Japanese woman wearing white office shirt and dark navy skirt, long black hair, Nikon D7500, 4K, sharp focus, photorealistic high-quality, volumetric lighting, close-up

Negative prompt: (((out of frame))), (((no face))), ((deformed hands)), extra limbs, ((ugly)), (((deformed))), ((bad anatomy)), ((mangled)), (((censored))), (blurry), (((distorted face))), mutation, amputee, hugging, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, broken body, mutated, extra hands, extra feet, extra arms, extra legs, multiple views, lowres

Settings: Steps: 100, Sampler: DPM2 a, CFG scale: 7, Size: 1536x2048, Denoising strength: 0.5, Mask blur: 4

During the generation of this batch, an X/Y plot was generated using the "X/Y plot" script, along with the following settings:

  • X-axis: Prompt S/R: Japanese, Korean, Thai, Russian, Swedish, French, Italian, Arab, American, African American, Somali
This script searches the prompt for the first value (in this case "Japanese") and generates one image per listed value, replacing that string with each of the comma-separated values in turn. The original txt2img image (with the label "Chinese") was upscaled using SwinIR_4x at 0.1 denoising strength, and then added in post-processing using GIMP.
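The search-and-replace behaviour described above can be sketched in a few lines. This is only an illustration of the idea, not the web UI's actual script; the function name prompt_search_replace is hypothetical, and the prompt and value list are copied from the settings above.

```python
def prompt_search_replace(prompt, values):
    """Return one prompt per value, swapping the first value for each in turn."""
    search = values[0]
    if search not in prompt:
        raise ValueError(f"search term {search!r} not found in prompt")
    return [prompt.replace(search, value) for value in values]

prompt = (
    "young professional Japanese woman wearing white office shirt and dark "
    "navy skirt, long black hair, Nikon D7500, 4K, sharp focus, "
    "photorealistic high-quality, volumetric lighting, close-up"
)
values = [v.strip() for v in (
    "Japanese, Korean, Thai, Russian, Swedish, French, Italian, Arab, "
    "American, African American, Somali"
).split(",")]

variants = prompt_search_replace(prompt, values)
for variant in variants:
    print(variant[:45])
```

The first variant is the unchanged prompt, so the eleven values yield eleven images; the separately added "Chinese" image brings the plot to twelve.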
Date:
Source: Own work
Author: Benlisquare
Permission (reusing this file):
Output images

As the creator of the output images, I release this image under the licence displayed within the template below.

Stable Diffusion AI model

The Stable Diffusion AI model is released under the CreativeML OpenRAIL-M License, which "does not impose any restrictions on reuse, distribution, commercialization, adaptation" as long as the model is not intentionally used to cause harm to individuals, for instance to deliberately mislead or deceive. As stipulated by the license, the authors of the AI models claim no rights over any generated image outputs.

Merged models

R34_e2, gg1342 and F222 are custom-trained derivative models of Stable Diffusion 1.4. The CreativeML OpenRAIL-M License applies to all downstream derivative versions of the model, as stipulated under the preamble. Anything V3.0, created by Furqanil Taqwa, is released under the CreativeML OpenRAIL-M License.

Personality rights
All individuals depicted are 100% fictional entities generated by the AI diffusion model, and do not exist in real life.

Licensing

I, the copyright holder of this work, hereby publish it under the following licenses:
This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International license.
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled GNU Free Documentation License.
You may select the license of your choice.

File history


Date/Time: 22:12, 3 December 2022 (current)
Dimensions: 9,216 × 4,642 (33.4 MB)
User: Benlisquare
Comment: separate into two rows, for better ease of viewing

Date/Time: 22:08, 3 December 2022
Dimensions: 18,432 × 2,321 (35.44 MB)
User: Benlisquare
Comment: {{Information |Description=An X/Y plot of algorithmically-generated photorealistic AI images featuring an office worker depicted as various different nationalities, created using a custom merged Stable Diffusion AI diffusion model checkpoint featuring R34_e2 merged with gg1342 at 0.5 weighted sum, then merged with [https://huggingface.co/Linaqruf/anything-v3.0 Anything V3.0] at 0.5 weighted sum, and then finally merged with [https://ai.zeipher.com/ F222] at 0.5 weight...
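
The two sets of dimensions in the file history are consistent with twelve 1536x2048 images arranged first in one row and then in two. The short check below assumes that the extra height above 2048 pixels in each row is a label strip; that strip height is inferred from the listed dimensions and is not stated anywhere in the description, and the helper name grid_size is hypothetical.

```python
TILE_W, TILE_H = 1536, 2048   # size of each img2img output
N_IMAGES = 12                 # 11 Prompt S/R images plus the "Chinese" original

def grid_size(rows, strip_h):
    """Width and height of a grid with the given row count and label strip."""
    cols = N_IMAGES // rows
    return cols * TILE_W, rows * (TILE_H + strip_h)

# Assumption: whatever the first upload's height exceeds one tile by is labels.
strip_h = 2321 - TILE_H

single_row = grid_size(1, strip_h)
two_rows = grid_size(2, strip_h)
print(single_row, two_rows)
```

With the inferred strip height, the one-row layout matches the first upload and the two-row layout matches the current file.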
