The difficulty of acquiring high-resolution (HR) and low-resolution (LR) image pairs in real scenarios limits the performance of existing learning-based image super-resolution (SR) methods in the real world. To train on real-world unpaired data, current methods focus on synthesizing pseudo LR images to associate unpaired images. However, the realism and diversity of pseudo LR images are difficult to guarantee because of the large image space. In this paper, we propose an alternative that builds the connection between unpaired images in a compact proxy space, without relying on synthesizing pseudo LR images. Specifically, we first construct coupled HR and LR dictionaries, and then encode HR and LR images into a common latent code space using these dictionaries. In addition, we develop an autoencoder-based framework that couples the two dictionaries during optimization by reconstructing the input HR and LR images. The coupled dictionaries enable our method to employ a shallow network architecture of only 18 layers for efficient image SR. Extensive experiments show that our method (DictSR) effectively models the LR-to-HR mapping through the coupled dictionaries and achieves state-of-the-art performance on benchmark datasets.
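To make the coupled-dictionary idea concrete, the following is a minimal sketch (not the authors' implementation) of how LR features could be expressed as latent codes over an LR dictionary and re-expressed with a coupled HR dictionary before decoding; the module names, dictionary sizes, layer counts, and upscaling factor are illustrative assumptions.

```python
# Illustrative sketch of coupled-dictionary SR (hypothetical sizes and names).
import torch
import torch.nn as nn

class CoupledDictSR(nn.Module):
    def __init__(self, num_atoms=256, atom_dim=64, scale=4):
        super().__init__()
        # Coupled dictionaries: one set of atoms for LR features, one for HR features.
        self.lr_dict = nn.Parameter(torch.randn(num_atoms, atom_dim))
        self.hr_dict = nn.Parameter(torch.randn(num_atoms, atom_dim))
        # Shallow encoder mapping an LR image to per-pixel feature vectors.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, atom_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(atom_dim, atom_dim, 3, padding=1),
        )
        # Shallow decoder turning HR features back into an image (pixel-shuffle upsampling).
        self.decoder = nn.Sequential(
            nn.Conv2d(atom_dim, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, lr_img):
        feat = self.encoder(lr_img)                       # B x C x H x W
        b, c, h, w = feat.shape
        feat = feat.permute(0, 2, 3, 1).reshape(-1, c)    # (B*H*W) x C
        # Latent codes: soft assignment of each feature to the LR dictionary atoms.
        codes = torch.softmax(feat @ self.lr_dict.t(), dim=-1)
        # Re-express the same codes with the coupled HR dictionary.
        hr_feat = codes @ self.hr_dict
        hr_feat = hr_feat.reshape(b, h, w, c).permute(0, 3, 1, 2)
        return self.decoder(hr_feat)

sr = CoupledDictSR()
out = sr(torch.randn(1, 3, 32, 32))   # -> 1 x 3 x 128 x 128
```

In this sketch, the shared codes are what tie the two dictionaries together; an autoencoder-style objective that reconstructs both HR and LR inputs from the same codes would then couple the dictionaries during training, in the spirit of the framework described above.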