The TL;DR
- Recent advances in Generative AI have led to the launch of a whole host of services, such as DALL·E 2, Midjourney and Stable Diffusion, that have the potential to drastically change the way we approach content creation.
- In this post I show you how to build and serve your very own high-performance text-to-image service over an API, based on Stable Diffusion via Hugging Face and deployed using Vertex AI Workbench and Endpoints.
How we got here
As George Lawton mentions in his article: “Generative AI is a type of artificial intelligence technology that can produce various types of content including text, imagery, audio and synthetic data. The recent buzz around generative AI has been driven by the simplicity of new user interfaces for creating high-quality text, graphics and videos in a matter of seconds.”[2]
Machine Learning is nothing new; in fact, it has been around in some shape or form since the 1960s[1]. “But it was not until 2014, with the introduction of generative adversarial networks (GANs), a type of machine learning algorithm, that generative AI could create convincingly authentic images, videos and audio of real people.”[2]
Combine that with the power of Large Language Models (LLMs), which can interpret a user prompt written in natural language, and you get systems that produce photorealistic images from a simple text description. We’ve come a very long way in a short period of time. OpenAI’s DALL·E led the way in early 2021, with DALL·E 2 following in April 2022, alongside community projects such as Disco Diffusion, and then Stable Diffusion, released in August 2022.