Introduction#
Apple recently open-sourced the Core ML Stable Diffusion project, which lets users and developers run the state-of-the-art AI image-generation model Stable Diffusion natively on Apple Silicon. Based on my own experience, this post covers the technical background, the workflow, and some issues I ran into.
Background#
The biggest breakthrough in image generation in recent years has been the Diffusion family of models; see this article for the technical details.
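As a toy illustration (not the real model, and none of these names come from the project's code), the core idea behind diffusion models is a forward process that gradually mixes data with Gaussian noise, plus a learned reverse process that removes it. The closed form for q(x_t | x_0) lets you jump to any noise level directly, and if a network predicted the injected noise exactly, x_0 could be recovered in one step:

```python
import math
import random

# Toy 1-D sketch of the DDPM-style forward process; illustrative only.
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]  # linear schedule
alpha_bar = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b          # cumulative product of (1 - beta_t)
    alpha_bar.append(prod)

def noise_to_step(x0, t, eps):
    """Closed form q(x_t | x_0): x_t = sqrt(a_bar)*x0 + sqrt(1-a_bar)*eps."""
    ab = alpha_bar[t]
    return math.sqrt(ab) * x0 + math.sqrt(1.0 - ab) * eps

def recover_x0(xt, t, eps_pred):
    """If the noise prediction were exact, x0 comes back in one step."""
    ab = alpha_bar[t]
    return (xt - math.sqrt(1.0 - ab) * eps_pred) / math.sqrt(ab)

random.seed(0)
x0 = 0.7
eps = random.gauss(0.0, 1.0)
xt = noise_to_step(x0, 500, eps)
print(recover_x0(xt, 500, eps))  # ≈ 0.7 up to float error
```

In the real model the noise predictor is a large U-Net conditioned on the text prompt, which is exactly the component that dominates load and inference time in the log below.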
This field used to be dominated by GAN variants, but mode collapse and the high training cost brought on by exploding gradients have been persistent problems; even though techniques such as Lipschitz constraints were applied to mitigate them, GANs still could not compete with diffusion models.
Starting with Midjourney, diffusion models have seen explosive adoption in this space, and many companies have begun commercializing them. For small developers, though, shipping a generative model inside a client-side product remains quite difficult, certainly far harder than wrapping the ChatGPT API.
Workflow#
The project consists of two parts. The first converts the PyTorch-based generative models from Hugging Face into the Core ML format; Apple already provides three pre-converted versions of Stable Diffusion to try.
The second part generates images from the converted model and the given parameters. Unlike earlier efforts, Apple seems to intend this to help developers integrate image-generation models into app development: it also ships a Swift package, along with benchmarks on several consumer devices.
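As a sketch of the two stages, the commands can be assembled as argument lists; the flag names follow the project README at the time of writing and may change between versions, and actually running either command requires the repo installed and takes considerable time, so the subprocess calls are left commented out:

```python
# Sketch: the two stages of apple/ml-stable-diffusion as argument lists.
# Flags are taken from the project README and may differ across versions.
import subprocess  # used only if you uncomment the calls at the bottom

MODEL = "stabilityai/stable-diffusion-2-base"
OUT_DIR = "models/coreml-stable-diffusion-2-base_original_packages"

# Stage 1: convert the Hugging Face PyTorch weights to Core ML .mlpackage files.
convert_cmd = [
    "python", "-m", "python_coreml_stable_diffusion.torch2coreml",
    "--model-version", MODEL,
    "--convert-unet", "--convert-text-encoder", "--convert-vae-decoder",
    "-o", OUT_DIR,
]

# Stage 2: generate an image from the converted packages.
generate_cmd = [
    "python", "-m", "python_coreml_stable_diffusion.pipeline",
    "--prompt", "a photo of an astronaut riding a horse on mars",
    "--compute-unit", "ALL",
    "--seed", "93",
    "-i", OUT_DIR,
    "--model-version", MODEL,
    "-o", "output",
]

# subprocess.run(convert_cmd, check=True)   # slow: compiles the whole model
# subprocess.run(generate_cmd, check=True)
```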
Pitfalls#
Since this is a newly released project, the documentation does not yet cover every pitfall. Here are some issues you are likely to hit:
- Insufficient resources: on consumer hardware (for example, my 16 GB M1 MacBook Pro) you need to `pip install accelerate` to keep the run from being killed for lack of memory.
- Access restrictions: even when running against a local model, you must log in with your Hugging Face token via `huggingface-cli login` in the terminal, otherwise the server-side configuration files cannot be fetched (I have replied on the corresponding issue page about this).
- Environment setup: just create a fresh environment.
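The first two pitfalls can be checked up front with a small, hedged sanity script. Note the assumptions: the token-file location has moved between huggingface_hub versions, so both historical paths are probed; treat this as a heuristic, not an official API:

```python
import importlib.util
from pathlib import Path

def accelerate_installed() -> bool:
    """True if the `accelerate` package is importable in this environment."""
    return importlib.util.find_spec("accelerate") is not None

def hf_token_present() -> bool:
    """Heuristic: look for the token file that `huggingface-cli login` writes.

    The location has changed across huggingface_hub versions, so check both
    the newer and the older path.
    """
    candidates = [
        Path.home() / ".cache" / "huggingface" / "token",
        Path.home() / ".huggingface" / "token",
    ]
    return any(p.is_file() for p in candidates)

if __name__ == "__main__":
    print(f"accelerate installed: {accelerate_installed()}")
    print(f"HF token found:       {hf_token_present()}")
```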
Results#
The image below was generated from the following prompt:

```
python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" --compute-unit ALL -o output --seed 93 -i models/coreml-stable-diffusion-2-base_original_packages --model-version stabilityai/stable-diffusion-2-base
```
The result is decent, but the runtime is still a long way from the numbers in the official documentation; making AI accessible to everyone clearly remains a long road.
When will Siri be able to generate a doctor's note I can use to call in sick to my boss?
More Information#
https://github.com/apple/ml-stable-diffusion
https://huggingface.co/blog/diffusers-coreml
https://github.com/huggingface/diffusers
https://openaccess.thecvf.com/content/CVPR2022/html/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.html
Full Prompt and Output#
```
(base) henry@HenrydeMacBook-Pro ml-stable-diffusion % python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" --compute-unit ALL -o output.png --seed 93 -i models/coreml-stable-diffusion-2-base_original_packages --model-version stabilityai/stable-diffusion-2-base
INFO:__main__:Setting random seed to 93
INFO:__main__:Initializing PyTorch pipe for reference configuration
Fetching 13 files: 100%|██████████████████████████████████████████| 13/13 [00:00<00:00, 63402.27it/s]
/Users/henry/opt/anaconda3/lib/python3.9/site-packages/transformers/models/clip/feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
warnings.warn(
WARNING:__main__:Original diffusers pipeline for stabilityai/stable-diffusion-2-base does not have a safety_checker, Core ML pipeline will mirror this behavior.
INFO:__main__:Removed PyTorch pipe to reduce peak memory consumption
INFO:__main__:Loading Core ML models in memory from models/coreml-stable-diffusion-2-base_original_packages
INFO:python_coreml_stable_diffusion.coreml_model:Loading text_encoder mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading models/coreml-stable-diffusion-2-base_original_packages/Stable_Diffusion_version_stabilityai_stable-diffusion-2-base_text_encoder.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 19.5 seconds.
INFO:python_coreml_stable_diffusion.coreml_model:Loading a CoreML model through coremltools triggers compilation every time. The Swift package we provide uses precompiled Core ML models (.mlmodelc) to avoid compile-on-load.
INFO:python_coreml_stable_diffusion.coreml_model:Loading unet mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading models/coreml-stable-diffusion-2-base_original_packages/Stable_Diffusion_version_stabilityai_stable-diffusion-2-base_unet.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 139.2 seconds.
INFO:python_coreml_stable_diffusion.coreml_model:Loading a CoreML model through coremltools triggers compilation every time. The Swift package we provide uses precompiled Core ML models (.mlmodelc) to avoid compile-on-load.
INFO:python_coreml_stable_diffusion.coreml_model:Loading vae_decoder mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading models/coreml-stable-diffusion-2-base_original_packages/Stable_Diffusion_version_stabilityai_stable-diffusion-2-base_vae_decoder.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 6.2 seconds.
INFO:__main__:Done.
INFO:__main__:Initializing Core ML pipe for image generation
WARNING:__main__:You have disabled the safety checker for <class '__main__.CoreMLStableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
INFO:__main__:Stable Diffusion configured to generate 512x512 images
INFO:__main__:Done.
INFO:__main__:Beginning image generation.
100%|████████████████████████████████████████████████████████████████| 51/51 [00:22<00:00, 2.25it/s]
INFO:__main__:Saving generated image to output.png/a_photo_of_an_astronaut_riding_a_horse_on_mars/randomSeed_93_computeUnit_ALL_modelVersion_stabilityai_stable-diffusion-2-base.png
```