AI painting tool + Apple ecosystem = ?

Introduction#

Recently, Apple released the Core ML Stable Diffusion project, which lets users and developers run Stable Diffusion, the state-of-the-art AI painting model, natively on Apple Silicon chips. This article gives an overview of the technical background, the workflow, and some issues I ran into, based on personal experience.


Background#

The biggest breakthrough in image generation in recent years is the introduction of the Diffusion family of models. For the technical details, please refer to this article.

Before that, the dominant models in this field were GAN variants. However, the high training cost caused by mode collapse and exploding gradients has always been a problem, and even though various mitigations, such as Lipschitz constraints, have been applied, GANs still cannot match the Diffusion models.

Since Midjourney, diffusion-based generative models have exploded in this field, and many companies have begun to commercialize them. For small developers, however, deploying a generative model on the product's client side is still a challenging task, at least much harder than calling the ChatGPT interface.

Process#

This project consists of two parts. The first converts the original PyTorch-based generative models from Hugging Face into Core ML format; Apple has provided three pre-converted versions of Stable Diffusion to try.
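
As a sketch of this first step: going by the repository README (the flag names are taken from it and may change between releases), the conversion is driven by the torch2coreml module; the output directory below is simply the one reused later in this post:

    # convert the Stable Diffusion sub-models into Core ML packages
    python -m python_coreml_stable_diffusion.torch2coreml \
        --convert-unet --convert-text-encoder \
        --convert-vae-decoder --convert-safety-checker \
        --model-version stabilityai/stable-diffusion-2-base \
        -o models/coreml-stable-diffusion-2-base_original_packages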

The second part is image generation from the converted model and the corresponding parameters. Unlike previous work, Apple seems to hope this will help developers integrate image generation models into app development: it also provides a Swift package and reports tests on a range of consumer-grade devices.
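
For the Swift side, the repository ships a sample command-line tool; according to its README (the command and flag names come from there, so treat this as an illustration rather than gospel), generation from converted resources looks roughly like this:

    # the resources must have been converted with --bundle-resources-for-swift-cli
    swift run StableDiffusionSample "a photo of an astronaut riding a horse on mars" \
        --resource-path models/coreml-stable-diffusion-2-base_original_packages/Resources \
        --seed 93 --output-path output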

Pitfalls#

Since this is a newly released project, the documentation does not yet cover all the pitfalls you are likely to hit. Here are some common ones:

  • Insufficient resources: consumer-grade devices (such as my 16GB M1 MacBook Pro) need pip install accelerate, otherwise the process is killed for running out of memory.
  • Access restrictions: even when using local models, you must log in with your Hugging Face token via huggingface-cli login in the terminal; otherwise the pipeline cannot fetch the model's configuration files from the server (I have replied on the corresponding issue page here).
  • Environment configuration: create a fresh environment for the project; the setup commands are collected after this list.
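
For reference, the setup that worked for me, combining the README's recommended environment with the fixes above (conda is just what I used; venv should work equally well):

    # fresh environment, per the repository README
    conda create -n coreml_stable_diffusion python=3.8 -y
    conda activate coreml_stable_diffusion

    # install from a local checkout of apple/ml-stable-diffusion
    pip install -e .
    pip install accelerate    # avoids out-of-memory kills on 16GB machines

    # authenticate so the pipeline can fetch model configuration files
    huggingface-cli login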

Results#

The following image was generated with this command:

    python -m python_coreml_stable_diffusion.pipeline \
        --prompt "a photo of an astronaut riding a horse on mars" \
        --compute-unit ALL -o output --seed 93 \
        -i models/coreml-stable-diffusion-2-base_original_packages \
        --model-version stabilityai/stable-diffusion-2-base

(Generated image: seed 93, compute unit ALL, model stabilityai/stable-diffusion-2-base)

The result is pretty good, but the generation speed still falls well short of the figures given in the official documentation. It seems the road to making AI accessible to everyone is still long and arduous.

When will Siri be able to generate a doctor's note for asking the boss for sick leave?

More Information#

https://github.com/apple/ml-stable-diffusion
https://huggingface.co/blog/diffusers-coreml
https://github.com/huggingface/diffusers
https://openaccess.thecvf.com/content/CVPR2022/html/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.html
