Introduction#
Recently, Apple released the Core ML Stable Diffusion project, which lets users and developers run the state-of-the-art AI image-generation model Stable Diffusion natively on Apple Silicon. Based on my own experience, this post covers the general technical background, the workflow, and some problems I ran into.
Background#
The biggest breakthrough in image generation in recent years is the family of Diffusion models; see this article for the technical details.
The field used to be dominated by GAN variants, but high training costs caused by mode collapse and exploding gradients were a persistent problem. Even with mitigation techniques such as Lipschitz constraints, GANs still could not match Diffusion models.
Starting with Midjourney, applications of diffusion models in this field have grown explosively, and many companies have begun commercializing them. For small developers, however, deploying a generative model to a product's client side remains quite difficult, certainly much harder than wrapping the ChatGPT API.
Workflow#
The project consists of two main parts. The first converts the original Torch-based generation models from Hugging Face to the Core ML format; Apple already provides three pre-converted versions of Stable Diffusion to try.
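The conversion step is driven by the repository's `torch2coreml` script. A minimal sketch, assuming the flags as documented in the project README at the time of writing (the output directory is a placeholder):

```shell
# Convert the Hugging Face Torch weights into Core ML .mlpackage files.
# Each --convert-* flag selects one sub-model of the pipeline to convert.
python -m python_coreml_stable_diffusion.torch2coreml \
    --convert-unet --convert-text-encoder --convert-vae-decoder \
    --model-version stabilityai/stable-diffusion-2-base \
    -o models/coreml-stable-diffusion-2-base_original_packages
```

The output directory here matches the `-i` path used by the generation command later in this post.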
The second part generates images from the converted model and the corresponding parameters. Unlike prior work, Apple seems to hope this will help developers integrate image-generation models into app development: it also ships a Swift App version along with benchmarks on several consumer devices.
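On the Swift side, the repository ships a sample CLI runnable via Swift Package Manager. A hedged sketch, assuming the command shape from the project README (the resource path is a placeholder and must contain compiled Core ML resources, which the conversion step can bundle for the Swift CLI):

```shell
# Generate an image with the repository's Swift sample CLI, pointing it
# at the compiled Core ML resources produced during conversion.
swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars" \
    --resource-path models/coreml-stable-diffusion-2-base_original_packages \
    --seed 93 --output-path output
```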
Pitfalls#
Since the project is newly released, the documentation does not yet cover every common pitfall. Here are some you are likely to hit:
- Insufficient resources: on consumer hardware (e.g. my 16 GB M1 MacBook Pro), run
pip install accelerate
first to avoid the run being killed for lack of resources.
- Access restrictions: even when using a local model, you still need a Hugging Face token and must run
huggingface-cli login
in the terminal, otherwise the server-side configuration files cannot be fetched (I have replied on the corresponding issue page).
- Environment setup: just create a fresh environment.
Results#
The image below was generated from the following prompt:
python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" --compute-unit ALL -o output --seed 93 -i models/coreml-stable-diffusion-2-base_original_packages --model-version stabilityai/stable-diffusion-2-base
The result is decent, but the runtime still falls well short of the numbers in the official documentation. It seems the road to making AI accessible to everyone remains a long one.
When will we be able to ask Siri to generate a doctor's note for calling in sick to the boss?
More Information#
https://github.com/apple/ml-stable-diffusion
https://huggingface.co/blog/diffusers-coreml
https://github.com/huggingface/diffusers
https://openaccess.thecvf.com/content/CVPR2022/html/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.html
Full Prompt and Output#
(base) henry@HenrydeMacBook-Pro ml-stable-diffusion % python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" --compute-unit ALL -o output.png --seed 93 -i models/coreml-stable-diffusion-2-base_original_packages --model-version stabilityai/stable-diffusion-2-base
INFO:__main__:Setting random seed to 93
INFO:__main__:Initializing PyTorch pipe for reference configuration
Fetching 13 files: 100%|██████████████████████████████████████████| 13/13 [00:00<00:00, 63402.27it/s]
/Users/henry/opt/anaconda3/lib/python3.9/site-packages/transformers/models/clip/feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
warnings.warn(
WARNING:__main__:Original diffusers pipeline for stabilityai/stable-diffusion-2-base does not have a safety_checker, Core ML pipeline will mirror this behavior.
INFO:__main__:Removed PyTorch pipe to reduce peak memory consumption
INFO:__main__:Loading Core ML models in memory from models/coreml-stable-diffusion-2-base_original_packages
INFO:python_coreml_stable_diffusion.coreml_model:Loading text_encoder mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading models/coreml-stable-diffusion-2-base_original_packages/Stable_Diffusion_version_stabilityai_stable-diffusion-2-base_text_encoder.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 19.5 seconds.
INFO:python_coreml_stable_diffusion.coreml_model:Loading a CoreML model through coremltools triggers compilation every time. The Swift package we provide uses precompiled Core ML models (.mlmodelc) to avoid compile-on-load.
INFO:python_coreml_stable_diffusion.coreml_model:Loading unet mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading models/coreml-stable-diffusion-2-base_original_packages/Stable_Diffusion_version_stabilityai_stable-diffusion-2-base_unet.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 139.2 seconds.
INFO:python_coreml_stable_diffusion.coreml_model:Loading a CoreML model through coremltools triggers compilation every time. The Swift package we provide uses precompiled Core ML models (.mlmodelc) to avoid compile-on-load.
INFO:python_coreml_stable_diffusion.coreml_model:Loading vae_decoder mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading models/coreml-stable-diffusion-2-base_original_packages/Stable_Diffusion_version_stabilityai_stable-diffusion-2-base_vae_decoder.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 6.2 seconds.
INFO:__main__:Done.
INFO:__main__:Initializing Core ML pipe for image generation
WARNING:__main__:You have disabled the safety checker for <class '__main__.CoreMLStableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
INFO:__main__:Stable Diffusion configured to generate 512x512 images
INFO:__main__:Done.
INFO:__main__:Beginning image generation.
100%|████████████████████████████████████████████████████████████████| 51/51 [00:22<00:00, 2.25it/s]
INFO:__main__:Saving generated image to output.png/a_photo_of_an_astronaut_riding_a_horse_on_mars/randomSeed_93_computeUnit_ALL_modelVersion_stabilityai_stable-diffusion-2-base.png