Mastering Stable Diffusion: A Practical Guide with DreamStudio and Clipdrop

Fotographer AI, Inc.

Published: September 20, 2023

With the rise of generative AI, artificial intelligence is attracting more and more attention. Many people are gradually deepening their understanding of generative AI or have actually tried using it.

ChatGPT may be the most talked-about topic, but image generation AI is also causing a stir. Stable Diffusion, which we'll introduce here, is a prime example of image generation AI.

In this article, we'll discuss how to actually use Stable Diffusion.

What is Stable Diffusion?

Stable Diffusion is an image generation AI developed by Stability AI.

Although it arrived relatively late among image generation AIs, it has been called a rising star thanks to the capability of the model itself and the fact that, because it is open source, anyone can use it to generate images.

With its ability to generate high-resolution images and its wealth of expansion features, it's expected to play an even greater role in the future.

For more detailed information, please see this article.

How to Use Stable Diffusion

There are two main ways to use Stable Diffusion:

Using Web-Based Tools

The first way is to use a tool that's available on the web.

The advantage is that you can use it easily without downloading any special software. The disadvantage is that you may need to pay once you generate more than a certain number of images, or only a trial version may be available for free.

It may not be enough for professional designers who want to use Stable Diffusion seriously, but it's a good option for first-time users who want to try it out.

Downloading and Using a Local Version of the Tool on Your PC

The second way is to download and use a local version of the tool on your PC.

Compared to the web-based tool, the advantage is that you can customize it as you like and use it for free without limits.

On the other hand, you have to install it yourself, set up an environment from the publicly available source code, and meet certain PC spec requirements to generate images efficiently, so the barrier to entry is somewhat high.

It's recommended for those who want to use Stable Diffusion seriously.

How to Install Stable Diffusion

From here on, we'll briefly describe how to actually get Stable Diffusion up and running, step by step.

PC Specs Required to Use Stable Diffusion

First, regarding required specs: there are no particular requirements for the web version. You can generate images simply by entering text (or, depending on the tool, an image) in your browser.

You can use it without problems as long as your internet connection and your PC's performance are not extremely slow.

On the other hand, the following specs are recommended when using the local version (as of the article's publication date):

  • PC Type: Desktop

  • Memory: 16GB or more (32GB for training)

  • OS: Windows (64bit)

  • CPU: Not particularly important

  • GPU: VRAM 12GB or more

  • Storage: 512GB or more (1TB or more of free space if possible)

As you can see, certain specs are required. Unlike the web version, your PC itself runs the model and generates the images, so if you're considering the local version, we recommend preparing this environment before starting.

Tools That Allow You to Generate Images Using the Web Version of Stable Diffusion

There are several types of tools that can be used on the web version, but this time we'll try using DreamStudio and ClipDrop.

There are also various other tools, so if you're interested in others, please see this article.

3 Steps to Use DreamStudio and Clipdrop, and Simple Operating Instructions

Now, let's explain the steps to use DreamStudio and Clipdrop, and how to operate them.

3 Steps to Use DreamStudio

STEP 1. Click this link

STEP 2. Create an account or, if you have a Google account, click "Continue with Google" to log in

STEP 3. You're done when you reach a screen like the image below

Simple Operating Instructions for DreamStudio

Next, we'll describe the screen for simple operation and the basic method for generating images.

①: You can select either the mode to simply generate images (Generate) or the mode to generate and then edit (Edit).

②: You can select the style of the output image.

③: You can enter a prompt and negative prompt to generate an image, and upload an image.

④: You can adjust the aspect ratio of the image and the number of images to be generated.

Below is an image generated based on the prompt I entered.

What do you think? The prompt I specified was "cool man with glasses in front of building," but I was able to output an image with few inconsistencies.

3 Steps to Use Clipdrop

STEP 1. Click this link

STEP 2. Scroll down a bit and click the "STABLE DIFFUSION XL" button

STEP 3. You're done when you reach a screen like the image below

Simple Operating Instructions for Clipdrop

Simply enter a prompt in the green frame on the screen and press the Generate button to generate an image.

Also, pressing ① displays a menu for setting the style and aspect ratio. You can also set a negative prompt.

This time, in addition to the same prompt as before, I'll add "low quality" as a negative prompt and generate it.

This also generated an image that was close to what I intended. Why not try using negative prompts to generate images that are closer to what you want?

*Negative prompt: text that instructs the AI what you do not want to appear in the output image.

Frequently Heard Words When Generating Images Using Stable Diffusion

At this point, you should be ready to generate images using Stable Diffusion. Here, we will introduce some terms you will often see when using these tools.

Prompt (Spell)

Text to instruct what kind of image to generate.

It is arguably the most fundamental term in generative AI.

Negative Prompt (Opposite Spell)

Text for specifying elements that you do not want to be reflected in the generated image.

For example, if you enter "worst quality" in the negative prompt, you can prevent low-quality images from being generated.

CFG (Classifier-Free Guidance) Scale

When generating an image, this is the value that determines how much the prompt is reflected in the image.

The larger the value, the more closely the image follows the prompt, but if it is too large, the result may look unnatural.
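The math behind the CFG scale is simple enough to sketch in a few lines: at each denoising step, the model predicts noise twice, once conditioned on the prompt and once without it (when a negative prompt is given, it typically replaces the unconditional branch), and the final prediction extrapolates from the unconditional one toward the conditional one by the scale factor. The toy vectors below are stand-ins for the model's large noise-prediction tensors:

```python
# Classifier-free guidance: extrapolate from the unconditional
# prediction toward the conditional one by the CFG scale.
def cfg_combine(uncond, cond, scale):
    """guided = uncond + scale * (cond - uncond)"""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

# Toy noise predictions (real ones are large latent tensors).
uncond = [0.0, 1.0, 2.0]   # without the prompt (or with the negative prompt)
cond   = [1.0, 1.0, 0.0]   # with the prompt

print(cfg_combine(uncond, cond, 1.0))  # scale 1 -> just the conditional prediction: [1.0, 1.0, 0.0]
print(cfg_combine(uncond, cond, 7.5))  # a common default, pushes harder toward the prompt: [7.5, 1.0, -13.0]
```

A scale of 0 ignores the prompt entirely, which is why very large values can over-amplify prompt-related features and produce unnatural images.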

Number of Steps

Refers to the number of denoising iterations performed during generation.

The larger the value, the more detailed the image tends to be, but each step adds processing time, so generation takes longer.

At the same time, too many steps can cause the overall image to break down, so values that are too small or too large are both best avoided.
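The trade-off above can be illustrated with a deliberately simplified sketch: each step removes a fraction of the remaining noise, so extra steps give diminishing returns while each one costs a full model evaluation. This is purely a toy illustration, not the actual diffusion noise schedule:

```python
# Toy sketch of iterative denoising: each step removes a fixed
# fraction of the remaining "noise". More steps get closer to a
# clean image, but each step costs one model evaluation.
def denoise(noise_level, steps, removal=0.5):
    for _ in range(steps):
        noise_level *= (1 - removal)  # one denoising step
    return noise_level

print(denoise(1.0, 5))   # 0.03125 remaining after 5 steps
print(denoise(1.0, 20))  # ~9.5e-07 remaining after 20 steps
```

Going from 5 to 20 steps quadruples the work for a tiny improvement in this toy model, which mirrors why moderate step counts (often 20-50 in practice) are the usual choice.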

The following content is for intermediate and advanced users.

VAE

VAE is a file to improve the quality of generated images.

It is especially important when you want to generate images with detailed colors and designs, and if you do not have it, the image may be unnatural.

That said, some models come with a VAE built in, in which case there is no need to download one separately, so it is worth checking in advance.

Merge

Refers to combining models to create a new model, and the combined model is called a merge model.
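The simplest form of merging is a weighted average of two checkpoints' parameters, key by key. Here is a minimal sketch of that idea; real merge tools operate on full weight tensors, and plain floats stand in for them here:

```python
# Weighted-average merge of two model checkpoints, key by key.
# alpha = 0.3 would make the merged model 30% model A, 70% model B.
def merge(model_a, model_b, alpha):
    return {k: alpha * model_a[k] + (1 - alpha) * model_b[k] for k in model_a}

a = {"layer1.weight": 1.0, "layer2.weight": 4.0}
b = {"layer1.weight": 3.0, "layer2.weight": 0.0}
print(merge(a, b, 0.5))  # {'layer1.weight': 2.0, 'layer2.weight': 2.0}
```

Blending like this lets a merge model inherit, say, one model's style and another's subject knowledge, which is why merge models are so popular in the community.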

ControlNet

A technology that allows you to specify the pose of the subject in image generation.

It is a very convenient technology as it can be applied to various things other than poses.

LoRA (Low-Rank Adaptation)

One of the fine-tuning methods for teaching new subjects to an existing model (the term may also refer to a model created with this method).

With normal text-to-image generation, it is difficult to output specific characters, styles, and situations, but if you train a LoRA in advance, it becomes much easier to generate the image you want.

It is the mainstream fine-tuning method because training works even on relatively low-end graphics cards and the resulting model files are lightweight.
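The "low-rank" trick that makes LoRA lightweight can be sketched directly: instead of updating a full weight matrix W, you train two small matrices A (r x d) and B (d x r) and apply W' = W + B @ A. When the rank r is much smaller than d, the number of trained values is tiny compared to the full matrix. Plain-list matrices keep the sketch dependency-free:

```python
# LoRA sketch: W' = W + B @ A, where B (d x r) and A (r x d) are the
# only trained matrices. With r << d, the trained update stays small.
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def apply_lora(W, B, A):
    delta = matmul(B, A)  # low-rank update: only 2*r*d trained values
    return [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

W = [[1.0, 0.0],
     [0.0, 1.0]]          # frozen base weight (d x d, d = 2)
B = [[1.0], [2.0]]        # d x r, with rank r = 1
A = [[0.5, 0.5]]          # r x d

print(apply_lora(W, B, A))  # [[1.5, 0.5], [1.0, 2.0]]
```

For a real 4096 x 4096 layer at rank 8, the LoRA stores about 65k values instead of 16.7 million, which is why LoRA files are typically megabytes rather than gigabytes.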

Tips for Effectively Using Prompts and Negative Prompts with Stable Diffusion

As a little more practical knowledge, I will explain some tips on using prompts and negative prompts.

Know the Basic Specifications When AI Reads Prompts

Among the many rules, perhaps the most important is that words placed earlier in the prompt are given higher priority.

The entered text is not weighted uniformly; the prompt is interpreted in order, so earlier terms have more influence on the generated image.

Therefore, when you have a particular image in mind, it is important to place the elements you want to emphasize toward the beginning of the prompt.

Understand the Range of Instructions That Can Be Issued with Prompts

Understanding the range of instructions that can be issued with prompts will also allow you to generate images more efficiently and with higher quality.

With Stable Diffusion, you can specify the following items with prompts/negative prompts.

*This is just one example.

  • Quality

  • Background

  • Basic information such as race, age, facial expression, status, and situation

  • Additional information such as color, time, how light hits, various body parts, hairstyle, and clothing

  • Style

  • Composition
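One practical way to cover these categories is to assemble the prompt from labeled parts and join them, keeping the most important elements first (all the category labels and phrases below are illustrative examples, not required syntax):

```python
# Assemble a prompt from the categories above. Order matters: put
# the parts you most want reflected at the front of the prompt.
parts = [
    ("quality",     "masterpiece, best quality"),
    ("subject",     "cool man with glasses"),
    ("background",  "in front of a modern building"),
    ("lighting",    "soft evening light"),
    ("style",       "photorealistic"),
    ("composition", "upper body, centered"),
]
prompt = ", ".join(text for _, text in parts)
negative_prompt = "low quality, worst quality"

print(prompt)
```

Keeping the parts in a structured list like this also makes it easy to reorder them and compare results, which ties directly into the ordering rule described above.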

When You Want to Output More Accurate Images with Stable Diffusion

At this point, you should be able to generate images above a certain level, but I will introduce some tips for those who want to output even more accurate images.

Try Loading Various Prompts and Images Until Your Ideal Image Is Output

The first is trial and error: repeat the generation process and learn from each result.

Even with the same words, the generated image will differ depending on their order, as described above.

Because image generation is a creative process, personal taste matters a great deal, so I recommend experimenting until the image you envision is generated.

Utilize Technologies Such as ControlNet and LoRA

The second is to try utilizing technologies such as ControlNet and LoRA.

These extension features and the sheer range of what it can do are precisely why Stable Diffusion is so highly regarded.

With LoRA in particular, training works even on relatively low-end graphics cards and the resulting model is very lightweight, so I encourage you to try it.

Summary

In this article, I generated images using DreamStudio and Clipdrop, two tools built on Stable Diffusion. What did you think?

The content introduced this time is only a part of what Stable Diffusion can do, and I think that it will be able to do even more in the future. I hope this article will be helpful for those who are considering using Stable Diffusion.

Design your Dreams, Magically.

An AI image synthesis tool that anyone can intuitively use in the browser.