Henry QIU

Computer Scientist and Game Player

AI painting tool + Apple ecosystem = ?

Introduction#

Apple recently released the Core ML Stable Diffusion project, which lets users and developers run Stable Diffusion, a state-of-the-art AI painting model, natively on Apple Silicon chips. This article gives an overview of the technical background, the workflow, and some issues I encountered along the way.


Background#

The biggest breakthrough in image generation in recent years is the introduction of the Diffusion family of models. For the technical details, please refer to this article.

Previously, the field was dominated by variants of GANs, but their high training cost, driven by mode collapse and exploding gradients, has always been a problem. Even with mitigations such as Lipschitz constraints, they still cannot match Diffusion models.
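A rough illustration of why diffusion training is more stable: the forward process just adds Gaussian noise on a fixed schedule, so there is no adversarial game to collapse. Below is a minimal toy sketch of the closed-form forward step; the 1000-step linear schedule follows the DDPM paper, while the scalar setup is my own simplification:

```python
import math
import random

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) of the DDPM forward process (scalar toy version).

    Closed form: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    where alpha_bar_t is the cumulative product of (1 - beta_s) up to step t.
    """
    alpha_bar = 1.0
    for beta in betas[: t + 1]:
        alpha_bar *= 1.0 - beta
    eps = rng.gauss(0.0, 1.0)  # standard Gaussian noise
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps

# Linear schedule of 1000 steps from 1e-4 to 0.02, as in the DDPM paper.
betas = [1e-4 + (0.02 - 1e-4) * i / 999 for i in range(1000)]
# By the final step the signal is almost entirely replaced by noise.
x_noisy = forward_diffuse(1.0, 999, betas, random.Random(0))
```

Training then amounts to regressing the added noise from x_t, which is a plain supervised objective with no discriminator to balance.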

Since Midjourney, applications of diffusion-based generative models have exploded, and many companies have begun commercializing them. For small developers, however, deploying a generative model on the client side remains challenging, certainly much harder than calling the ChatGPT API.

Process#

This project consists of two parts. The first converts the original PyTorch-based generative models from Hugging Face into Core ML format; Apple has provided three pre-converted versions of Stable Diffusion to try.
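For reference, the conversion step is driven by a CLI in the repo. A typical invocation looks roughly like the following; the flags follow the project README at release, and the output path is a placeholder, so double-check against the current version:

```shell
# Convert the Hugging Face PyTorch weights into Core ML .mlpackage files.
# Assumes the repo's Python environment is already set up.
python -m python_coreml_stable_diffusion.torch2coreml \
    --convert-unet --convert-text-encoder --convert-vae-decoder \
    --model-version stabilityai/stable-diffusion-2-base \
    -o models/coreml-stable-diffusion-2-base_original_packages
```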

The second part is image generation using the converted model and the corresponding parameters. Unlike prior work, Apple seems to want this to help developers integrate image-generation models into app development: it ships a Swift package and reports test results on multiple consumer-grade devices.
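On the Swift side, the repo ships a sample CLI. Per the README, generation from a converted model bundle looks roughly like this; the resource path here is a placeholder mirroring the Python run later in this post, so check the README for the exact layout:

```shell
# Run the example Swift CLI against locally converted Core ML resources.
swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars" \
    --resource-path models/coreml-stable-diffusion-2-base_original_packages/Resources \
    --seed 93 --output-path output
```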

Pitfalls#

Since this is a newly released project, the documentation does not yet cover every pitfall. Here are some common issues:

  • Insufficient memory: consumer-grade devices (such as my 16 GB M1 MacBook Pro) need pip install accelerate, otherwise the process is killed for running out of memory.
  • Access restrictions: even when using local models, you must log in with a Hugging Face token via huggingface-cli login in the terminal, otherwise the pipeline cannot fetch configuration files from the server (I replied on the corresponding issue page).
  • Environment configuration: create a fresh Python environment to avoid dependency conflicts.
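For the last point, a setup along the following lines worked for me; conda and Python 3.8 match what the README suggested at the time, so adjust to your own tooling:

```shell
# Create an isolated environment and install the repo in editable mode.
conda create -n coreml_stable_diffusion python=3.8 -y
conda activate coreml_stable_diffusion
git clone https://github.com/apple/ml-stable-diffusion.git
cd ml-stable-diffusion
pip install -e .
pip install accelerate   # avoids out-of-memory kills on 16 GB machines (see above)
```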

Results#

The following image was generated with this command:

python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" --compute-unit ALL -o output --seed 93 -i models/coreml-stable-diffusion-2-base_original_packages --model-version stabilityai/stable-diffusion-2-base

(Generated image: randomSeed_93_computeUnit_ALL_modelVersion_stabilityai_stable-diffusion-2-base)

The result is pretty good, but generation speed still falls well short of the numbers in the official documentation. It seems the road to making AI accessible to everyone is still long and arduous.

When will we be able to get Siri to generate a doctor's note for asking the boss for sick leave?

More Information#

https://github.com/apple/ml-stable-diffusion
https://huggingface.co/blog/diffusers-coreml
https://github.com/huggingface/diffusers
https://openaccess.thecvf.com/content/CVPR2022/html/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.html

Complete Terminal Output#

(base) henry@HenrydeMacBook-Pro ml-stable-diffusion % python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" --compute-unit ALL -o output.png --seed 93 -i models/coreml-stable-diffusion-2-base_original_packages --model-version stabilityai/stable-diffusion-2-base
INFO:__main__:Setting random seed to 93
INFO:__main__:Initializing PyTorch pipe for reference configuration
Fetching 13 files: 100%|██████████████████████████████████████████| 13/13 [00:00<00:00, 63402.27it/s]
/Users/henry/opt/anaconda3/lib/python3.9/site-packages/transformers/models/clip/feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
  warnings.warn(
WARNING:__main__:Original diffusers pipeline for stabilityai/stable-diffusion-2-base does not have a safety_checker, Core ML pipeline will mirror this behavior.
INFO:__main__:Removed PyTorch pipe to reduce peak memory consumption
INFO:__main__:Loading Core ML models in memory from models/coreml-stable-diffusion-2-base_original_packages
INFO:python_coreml_stable_diffusion.coreml_model:Loading text_encoder mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading models/coreml-stable-diffusion-2-base_original_packages/Stable_Diffusion_version_stabilityai_stable-diffusion-2-base_text_encoder.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 19.5 seconds.
INFO:python_coreml_stable_diffusion.coreml_model:Loading a CoreML model through coremltools triggers compilation every time. The Swift package we provide uses precompiled Core ML models (.mlmodelc) to avoid compile-on-load.
INFO:python_coreml_stable_diffusion.coreml_model:Loading unet mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading models/coreml-stable-diffusion-2-base_original_packages/Stable_Diffusion_version_stabilityai_stable-diffusion-2-base_unet.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 139.2 seconds.
INFO:python_coreml_stable_diffusion.coreml_model:Loading a CoreML model through coremltools triggers compilation every time. The Swift package we provide uses precompiled Core ML models (.mlmodelc) to avoid compile-on-load.
INFO:python_coreml_stable_diffusion.coreml_model:Loading vae_decoder mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Loading models/coreml-stable-diffusion-2-base_original_packages/Stable_Diffusion_version_stabilityai_stable-diffusion-2-base_vae_decoder.mlpackage
INFO:python_coreml_stable_diffusion.coreml_model:Done. Took 6.2 seconds.
INFO:__main__:Done.
INFO:__main__:Initializing Core ML pipe for image generation
WARNING:__main__:You have disabled the safety checker for <class '__main__.CoreMLStableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
INFO:__main__:Stable Diffusion configured to generate 512x512 images
INFO:__main__:Done.
INFO:__main__:Beginning image generation.
100%|████████████████████████████████████████████████████████████████| 51/51 [00:22<00:00,  2.25it/s]
INFO:__main__:Saving generated image to output.png/a_photo_of_an_astronaut_riding_a_horse_on_mars/randomSeed_93_computeUnit_ALL_modelVersion_stabilityai_stable-diffusion-2-base.png