Hacker News

Yes, that's actually the biggest reason this is such a cool announcement! You just need to download the model checkpoints from HuggingFace[0], follow the instructions in their GitHub repo[1], and you should be good to go. You basically just need to clone the repo, set up a conda environment, and make the weights available to the scripts they provide.

[0] https://huggingface.co/CompVis/stable-diffusion [1] https://github.com/CompVis/stable-diffusion

Good luck!
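The steps above might look roughly like this (the environment name `ldm`, the checkpoint path, and the `txt2img.py` script come from the CompVis repo's README; the checkpoint path is just a placeholder for wherever you saved the weights):

```shell
# Clone the repo and create the conda environment it ships with.
git clone https://github.com/CompVis/stable-diffusion
cd stable-diffusion
conda env create -f environment.yaml
conda activate ldm

# Put the downloaded checkpoint where the sampling scripts expect it.
mkdir -p models/ldm/stable-diffusion-v1
ln -s /path/to/sd-v1-4.ckpt models/ldm/stable-diffusion-v1/model.ckpt

# Generate images from a text prompt.
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse"
```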



What's the difference between those 4 checkpoints?

From the GitHub README:

    sd-v1-1.ckpt: 237k steps at resolution 256x256 on laion2B-en. 194k steps at resolution 512x512 on laion-high-resolution (170M examples from LAION-5B with resolution >= 1024x1024).

    sd-v1-2.ckpt: Resumed from sd-v1-1.ckpt. 515k steps at resolution 512x512 on laion-aesthetics v2 5+ (a subset of laion2B-en with estimated aesthetics score > 5.0, and additionally filtered to images with an original size >= 512x512, and an estimated watermark probability < 0.5. The watermark estimate is from the LAION-5B metadata, the aesthetics score is estimated using the LAION-Aesthetics Predictor V2).

    sd-v1-3.ckpt: Resumed from sd-v1-2.ckpt. 195k steps at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.

    sd-v1-4.ckpt: Resumed from sd-v1-2.ckpt. 225k steps at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
Which one is the general-use checkpoint one should be using?


You need a decent GPU, though. I suspect my 6,080 MiB of VRAM won't cut it any longer :(


There's a version that's a bit slower but more memory-efficient, https://github.com/basujindal/stable-diffusion, which runs on 6GB too.


Is Apple M1 support expected soon? Even if Apple's chips are slower, they have plenty of RAM on laptops. I saw some weeks ago that it was coming, but I am not sure where to follow the progress.


looks like there is a nightly release for apple silicon, https://towardsdatascience.com/gpu-acceleration-comes-to-pyt...
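That article covers PyTorch's MPS (Metal) backend for Apple silicon. Assuming you have a recent enough PyTorch build, a quick way to check which device you can actually use is a sketch like this:

```python
import torch

# Prefer Apple's Metal backend (MPS) when present, then CUDA, then CPU.
# The getattr guard covers older PyTorch builds without an mps module.
if getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

print(device.type)  # "mps" on a supported Mac, otherwise "cuda" or "cpu"
```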


You're going to need at least 10GB VRAM. My SFF pc with 4GB VRAM can only run dalle mini / craiyon :(


Not if you change the precision to float16. Should work on a smaller card. Tried on a 1080 with 8GB and it works well.


How would one do that?

-----

Sorry, my bad, found the answer. One simply adds the following keyword arguments to the StableDiffusionPipeline.from_pretrained call in the example: revision="fp16", torch_dtype=torch.float16

Found it in this blogpost: https://huggingface.co/blog/stable_diffusion
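Assuming the diffusers setup from that blog post, the full call might look like this sketch (the model id and prompt are just examples; at the time you also had to accept the model license on Hugging Face and authenticate, and you need a CUDA GPU plus a one-time weights download for this to run):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the weights in half precision. "fp16" is the name of the
# half-precision branch in the model repository.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # fp16 inference wants a GPU

image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```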

mempko, thank you for your hint! I was about to drop a not insignificant amount of money on a new GPU.

What does one lose by using float16 representation? Does it make the images visually less detailed? Or how can one reason about this?


Zero loss. All upside. It only causes issues when training. 32-bit ships by default because it is compatible with CPUs and GPUs that might not have native fp16 support.

Edit: Just to be clear, your intuition that it could cause issues is certainly merited - not _all_ models can be trivially converted from fp32 to fp16 without some new error accumulating (during inference). Variational autoencoders like VQGAN, and GANs in general, are particularly prone to such issues.

But in this case, it's all upside.
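To make the trade-off concrete, here is a small NumPy illustration (NumPy stands in for the torch tensors the pipeline actually uses): float16 keeps only about 3 decimal digits of precision and its range tops out at 65504, which is fine for a forward pass but is exactly what bites during training, when tiny gradient values get rounded away or large activations overflow.

```python
import numpy as np

# float16 has a 10-bit mantissa: roughly 3 decimal digits of precision.
x32 = np.float32(1.0001)
x16 = np.float16(x32)  # rounds to the nearest representable fp16 value
print(float(x16))      # 1.0 -- the 1e-4 detail is rounded away

# The relative rounding error stays below fp16's machine epsilon (2**-10),
# which is visually negligible for generated images.
rel_err = abs(float(x16) - float(x32)) / float(x32)
print(rel_err < 2**-10)  # True

# float16 also has a much smaller range: values overflow past 65504.
print(np.finfo(np.float16).max)  # 65504.0
```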


Can you please tell me where the model.ckpt is? I am not able to find any weights in ".ckpt" format in either of the links you gave. There are only ".bin" files on the Hugging Face page.


On the Hugging Face site, click the checkbox to accept the license and then access the repository. The .ckpt file is under the "Files and versions" tab.


For anyone else reading: you need the -original versions. The other repositories are set up for the diffusers library, and I can't find a checkpoint file in those, just in the original one.




