Poster
PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor
Vidit Goel · Elia Peruzzo · Yifan Jiang · Dejia Xu · Xingqian Xu · Nicu Sebe · Trevor Darrell · Zhangyang Wang · Humphrey Shi
Arch 4A-E Poster #369
Generative image editing has recently witnessed extremely fast-paced growth.Some works use high-level conditioning such as text, while others use low-levelconditioning. Nevertheless, most of them lack fine-grained control over the properties of the different objects present in the image, i.e. object-level image editing. In this work, we tackle the task by perceiving the images as an amalgamation ofvarious objects and aim to control the properties of each object in a fine-grainedmanner. Out of these properties, we identify structure and appearance as the mostintuitive to understand and useful for editing purposes. We propose PAIR Diffusion, a generic framework that can enable a diffusion model to control the structure and appearance properties of each object in the image. We show that having controlover the properties of each object in an image leads to comprehensive editingcapabilities. Our framework allows for various object-level editing operations onreal images such as reference image-based appearance editing, free-form shapeediting, adding objects, and variations. Thanks to our design, we do not requireany inversion step. Additionally, we propose multimodal classifier-free guidancewhich enables editing images using both reference images and text when usingour approach with foundational diffusion models. We validate the above claimsby extensively evaluating our framework on both unconditional and foundationaldiffusion models.