Chapter 15: Generating Image Descriptions Using BLIP-2 and LLaVA

Using Stable Diffusion with Python

By: Andrew Zhu (Shudong Zhu)

Overview of this book

Stable Diffusion is a game-changing AI tool that enables you to create stunning images with code. The author, a seasoned Microsoft applied data scientist and contributor to the Hugging Face Diffusers library, leverages his 15+ years of experience to help you master Stable Diffusion by understanding the underlying concepts and techniques. You’ll be introduced to Stable Diffusion, grasp the theory behind diffusion models, set up your environment, and generate your first image using diffusers. You’ll optimize performance, leverage custom models, and integrate community-shared resources like LoRAs, textual inversion, and ControlNet to enhance your creations. Covering techniques such as face restoration, image upscaling, and image restoration, you’ll focus on unlocking prompt limitations, scheduled prompt parsing, and weighted prompts to create a fully customized and industry-level Stable Diffusion app. This book also looks into real-world applications in medical imaging, remote sensing, and photo enhancement. Finally, you’ll gain insights into extracting generation data, ensuring data persistence, and leveraging AI models like BLIP for image description extraction. By the end of this book, you’ll be able to use Python to generate and edit images and leverage solutions to build Stable Diffusion apps for your business and users.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Conventions used

Get in touch

Share Your Thoughts

Download a free PDF copy of this book

Part 1 – A Whirlwind of Stable Diffusion

Chapter 1: Introducing Stable Diffusion

Evolution of the Diffusion model

Why Stable Diffusion

Which Stable Diffusion to use

Why this book

References

Chapter 2: Setting Up the Environment for Stable Diffusion

Hardware requirements to run Stable Diffusion

Software requirements

Running a Stable Diffusion pipeline

Using Google Colaboratory

Using Google Colab to run a Stable Diffusion pipeline

Summary

References

Chapter 3: Generating Images Using Stable Diffusion

Logging in to Hugging Face

Generating an image

Generation seed

Sampling scheduler

Changing a model

Guidance scale

Summary

References

Chapter 4: Understanding the Theory Behind Diffusion Models

Understanding the image-to-noise process

A more efficient forward diffusion process

The noise-to-image training process

The noise-to-image sampling process

Understanding Classifier Guidance denoising

Summary

References

Chapter 5: Understanding How Stable Diffusion Works

Stable Diffusion in latent space

Generating latent vectors using diffusers

Generating text embeddings using CLIP

Initializing time step embeddings

Initializing the Stable Diffusion UNet

Implementing a text-to-image Stable Diffusion inference pipeline

Implementing a text-guided image-to-image Stable Diffusion inference pipeline

Summary

References

Additional reading

Chapter 6: Using Stable Diffusion Models

Technical requirements

Loading the Diffusers model

Loading model checkpoints from safetensors and ckpt files

Using ckpt and safetensors files with Diffusers

Turning off the model safety checker

Converting the checkpoint model file to the Diffusers format

Using Stable Diffusion XL

Summary

References

Part 2 – Improving Diffusers with Custom Features

Chapter 7: Optimizing Performance and VRAM Usage

Setting the baseline

Optimization solution 1 – using the float16 or bfloat16 data type

Optimization solution 2 – enabling VAE tiling

Optimization solution 3 – enabling Xformers or using PyTorch 2.0

Optimization solution 4 – enabling sequential CPU offload

Optimization solution 5 – enabling model CPU offload

Optimization solution 6 – Token Merging (ToMe)

Summary

References

Chapter 8: Using Community-Shared LoRAs

Technical requirements

How does LoRA work?

Diving into the internal structure of LoRA

Making a function to load LoRA

Why LoRA works

Summary

References

Chapter 9: Using Textual Inversion

Diffusers inference using TI

How TI works

Building a custom TI loader

Summary

References

Chapter 10: Overcoming 77-Token Limitations and Enabling Prompt Weighting

Understanding the 77-token limitation

Overcoming the 77-token limitation

Enabling long prompts with weighting

Verifying the work

Overcoming the 77-token limitation using community pipelines

Summary

References

Chapter 11: Image Restore and Super-Resolution

Understanding the terminologies

Upscaling images using Img2img diffusion

ControlNet Tile image upscaling

Summary

References

Chapter 12: Scheduled Prompt Parsing

Technical requirements

Using the Compel package

Building a custom scheduled prompt pipeline

Summary

References

Part 3 – Advanced Topics

Chapter 13: Generating Images with ControlNet

What is ControlNet and how is it different?

Usage of ControlNet

Using multiple ControlNets in one pipeline

How ControlNet works

Further usage

Summary

References

Chapter 14: Generating Video Using Stable Diffusion

Technical requirements

The principles of text-to-video generation

Practical applications of AnimateDiff

Utilizing Motion LoRA to control animation motion

Summary

References

Chapter 15: Generating Image Descriptions Using BLIP-2 and LLaVA

Technical requirements

BLIP-2 – Bootstrapping Language-Image Pre-training

LLaVA – Large Language and Vision Assistant

Summary

References

Chapter 16: Exploring Stable Diffusion XL

What’s new in SDXL?

Using SDXL

Summary

References

Chapter 17: Building Optimized Prompts for Stable Diffusion

What makes a good prompt?

Using LLMs to generate better prompts

Summary

References

Part 4 – Building Stable Diffusion into an Application

Chapter 18: Applications – Object Editing and Style Transferring

Editing images using Stable Diffusion

Object and style transferring

Summary

References

Chapter 19: Generation Data Persistence

Exploring and understanding the PNG file structure

Saving extra text data in a PNG image file

PNG extra data storage limitation

Summary

References

Chapter 20: Creating Interactive User Interfaces

Introducing Gradio

Getting started with Gradio

Gradio fundamentals

Building a Stable Diffusion text-to-image pipeline with Gradio

Summary

References

Chapter 21: Diffusion Model Transfer Learning

Technical requirements

Training a neural network model with PyTorch

Training a model with Hugging Face’s Accelerate

Training a Stable Diffusion V1.5 LoRA

Summary

References

Chapter 22: Exploring Beyond Stable Diffusion

What sets this AI wave apart

The enduring value of mathematics and programming

Staying current with AI innovations

Cultivating responsible, ethical, private, and secure AI

Our evolving relationship with AI

Summary

References

Index

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Download a free PDF copy of this book
