Yes, that's actually the biggest reason this is such a cool announcement! You just need to download the model checkpoints from HuggingFace[0], follow the instructions in their GitHub repo[1], and you should be good to go. You basically just need to clone the repo, set up a conda environment, and make the weights available to the scripts they provide.
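In case it helps, the steps sketch out roughly like this. The environment name and the checkpoint path are from my recollection of the repo's README, so treat them as assumptions and double-check there:

```shell
# Rough setup sketch; environment name ("ldm") and the expected
# checkpoint path are assumptions taken from the repo's README.
git clone https://github.com/CompVis/stable-diffusion
cd stable-diffusion
conda env create -f environment.yaml
conda activate ldm

# Point the sampling scripts at the weights downloaded from Hugging Face.
mkdir -p models/ldm/stable-diffusion-v1
ln -s /path/to/sd-v1-4.ckpt models/ldm/stable-diffusion-v1/model.ckpt
```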
What's the difference between those 4 checkpoints?
From the GitHub README:

- sd-v1-1.ckpt: 237k steps at resolution 256x256 on laion2B-en, then 194k steps at resolution 512x512 on laion-high-resolution (170M examples from LAION-5B with resolution >= 1024x1024).
- sd-v1-2.ckpt: Resumed from sd-v1-1.ckpt. 515k steps at resolution 512x512 on laion-aesthetics v2 5+ (a subset of laion2B-en with estimated aesthetics score > 5.0, additionally filtered to images with an original size >= 512x512 and an estimated watermark probability < 0.5; the watermark estimate is from the LAION-5B metadata, and the aesthetics score is estimated using the LAION-Aesthetics Predictor V2).
- sd-v1-3.ckpt: Resumed from sd-v1-2.ckpt. 195k steps at resolution 512x512 on "laion-aesthetics v2 5+", with 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
- sd-v1-4.ckpt: Resumed from sd-v1-2.ckpt. 225k steps at resolution 512x512 on "laion-aesthetics v2 5+", with 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
Which one is the general-use checkpoint one should be using?
Is Apple M1 support expected soon? Even if Apple’s chips are slower, their laptops have plenty of RAM. I saw some weeks ago that it was coming, but I’m not sure where to follow the progress.
Sorry, my bad, found the answer. One simply adds the following flags to the StableDiffusionPipeline.from_pretrained call in the example: revision="fp16", torch_dtype=torch.float16
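For context, here is roughly how those flags slot into the diffusers example. This is a sketch, not something I can run here: the actual download needs diffusers and torch installed plus a Hugging Face access token (the weights are gated behind the license), so the heavy call is guarded behind a flag.

```python
# Half-precision loading sketch for the diffusers Stable Diffusion pipeline.
# Assumes the CompVis repo has an "fp16" weight branch, as described above;
# flip the flag once diffusers/torch are installed and you have a HF token.
HAVE_DEPS_AND_TOKEN = False

if HAVE_DEPS_AND_TOKEN:
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        revision="fp16",            # fetch the half-precision weight branch
        torch_dtype=torch.float16,  # keep the tensors in fp16 in memory
        use_auth_token=True,        # weights are gated behind the license
    )
    pipe = pipe.to("cuda")          # fp16 inference wants a GPU
    image = pipe("an astronaut riding a horse").images[0]
```

The fp16 branch roughly halves both the download size and the VRAM needed at inference time.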
Zero loss. All upside. It only causes issues when training. 32-bit ships by default because it is compatible with CPUs and with GPUs that might not have native fp16 support.
Edit: Just to be clear, your intuition that it could cause issues is certainly merited, and not _all_ models can be trivially converted from fp32 to fp16 without some new error accumulating during inference. Variational autoencoders like VQGAN and GANs are particularly prone to such issues.
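To make the precision loss concrete, you can round-trip values through IEEE 754 half precision with nothing but the standard library (the struct 'e' format packs a half float); fp16's 10-bit mantissa gives only about 3 decimal digits:

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a Python float (fp64) through IEEE 754 half precision."""
    return struct.unpack('e', struct.pack('e', x))[0]

print(to_fp16(1.0))     # powers of two survive exactly: 1.0
print(to_fp16(0.1))     # 0.1 is not representable: approx 0.0999755859375
print(to_fp16(2049.0))  # above 2048 the spacing is 2, so this rounds: 2048.0
```

Each weight in a converted model picks up this kind of rounding error; whether those per-weight errors visibly accumulate during a forward pass is what varies between architectures.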
Can you please tell me where the model.ckpt is? I can't find any weights in ".ckpt" format at either of the links you gave; Hugging Face only has ".bin" files.
For anyone else reading: you need the -original versions. The other repos are set up for the diffusers library, and I can't find a checkpoint file in them, only in the -original one.
[0] https://huggingface.co/CompVis/stable-diffusion [1] https://github.com/CompVis/stable-diffusion
Good luck!