Significant advancements have been made in image editing with the recent advance of the Diffusion model. However, most of the current methods primarily focus on global or subject-level modifications, and often face limitations when it comes to editing specific objects when there are other objects coexisting in the scene, given solely textual prompts. In response to this challenge, we introduce an object-level generative task called Referring Image Editing (RIE), which enables the identification and editing of specific source objects in an image using text prompts. To tackle this task effectively, we propose a tailored framework called ReferDiffusion. It aims to disentangle input prompts into multiple embeddings and employs a mixed-supervised multi-stage training strategy. To facilitate further research in this domain, we introduce the RefCOCO-Edit dataset, comprising images, editing prompts, source object segmentation masks, and reference edited images for training and evaluation. Our extensive experiments demonstrate the effectiveness of our approach in identifying and editing target objects, while conventional general image editing and region-based image editing methods have difficulties in this challenging task.