Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning

CVPR 2024

Diffusion models have demonstrated unprecedented capabilities in image generation. Yet, they incorporate and amplify the data bias (e.g., gender, age) of the original training set, limiting the diversity of generated images. In this paper, we propose a diversity-oriented fine-tuning method for diffusion models using reinforcement learning (RL) under the guidance of an image-set-based reward function. Specifically, the proposed reward function, denoted as Diversity Reward, uses a set of generated images to evaluate the coverage of the current generative distribution w.r.t. the reference distribution, represented by a set of unbiased images. Built on a probabilistic method for estimating distribution discrepancy, Diversity Reward can efficiently measure the relative distribution gap from a small set of images. We further formulate the diffusion process as a multi-step Markov decision process (MDP) and apply policy gradient methods to fine-tune diffusion models by maximizing the Diversity Reward. The proposed reward is validated on a post-sampling selection task, in which a subset of the most diverse images is selected based on Diversity Reward values. We also show the effectiveness of our RL fine-tuning framework in enhancing the diversity of image generation with different types of diffusion models, including class-conditional models and text-conditional models, e.g., Stable Diffusion.
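As a rough illustration of the two components described above, the following Python sketch pairs a set-based diversity reward with a policy-gradient update over denoising steps. It assumes the reward is an MMD-style discrepancy computed on features from a frozen image encoder, and that the sampler exposes per-step log-probabilities of its denoising transitions; the function names (`diversity_reward`, `policy_gradient_step`), the RBF-kernel estimator, and the `sigma` bandwidth are illustrative assumptions, not the paper's exact formulation.

```python
import torch


def diversity_reward(gen_feats: torch.Tensor, ref_feats: torch.Tensor,
                     sigma: float = 10.0) -> torch.Tensor:
    """Set-level reward: negative squared MMD between features of a batch of
    generated images and features of an unbiased reference set.
    A higher (less negative) value means better coverage of the reference.
    (Illustrative stand-in for the paper's Diversity Reward.)"""
    def rbf(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # RBF kernel matrix between feature sets of shape (n, d) and (m, d).
        return torch.exp(-torch.cdist(a, b).pow(2) / (2.0 * sigma ** 2))

    mmd2 = (rbf(gen_feats, gen_feats).mean()
            + rbf(ref_feats, ref_feats).mean()
            - 2.0 * rbf(gen_feats, ref_feats).mean())
    return -mmd2  # one scalar reward shared by the whole generated set


def policy_gradient_step(optimizer: torch.optim.Optimizer,
                         log_probs: torch.Tensor,
                         reward: torch.Tensor) -> torch.Tensor:
    """One REINFORCE-style update for the denoising MDP.
    log_probs: (batch, T) log-probability of each sampled transition
               x_t -> x_{t-1}, computed with gradients w.r.t. the
               diffusion model's parameters.
    reward:    scalar set-level Diversity Reward, broadcast to all
               trajectories in the set."""
    loss = -(log_probs.sum(dim=1) * reward.detach()).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss
```

In an actual fine-tuning loop, one would sample a set of images per prompt or class, extract their features with a fixed encoder, compute the set-level reward once, and apply the update above; in practice, importance-sampling corrections or PPO-style clipping are typically added for stability when reusing sampled trajectories.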