Recent advances in generative diffusion models have made it possible to generate 3D assets from a single input image or a text prompt, a capability that was previously infeasible. In this work, we enhance the quality and functionality of these models for the task of creating controllable, photorealistic human avatars. We achieve this by integrating a 3D morphable model into a state-of-the-art multiview-consistent diffusion approach. First, we demonstrate that properly conditioning the generative pipeline on the articulated 3D model improves the baseline model's performance on novel view synthesis from a single image. Next, we introduce a training regime that enables the reconstructed avatar to be animated with new facial expressions and body poses. To the best of our knowledge, our proposed model is the first to enable the creation of a 3D-consistent, animatable, and photorealistic human avatar from a single image of an unseen subject. The code for our project will be made publicly available.