Deep neural networks suffer from catastrophic forgetting in class incremental learning, where the classification accuracy of old classes drastically deteriorates when the networks learn the knowledge of new classes. Many works have been proposed to solve the class incremental learning problem. However, most of them either suffer from serious catastrophic forgetting and stability-plasticity dilemma or need too many extra parameters and computations. To meet the challenge, we propose a novel framework, Diverse Knowledge Transfer Transformer~(DKT). which contains two novel knowledge transfers based on the attention mechanism to transfer the task-general knowledge and task-specific knowledge to the current task to alleviate catastrophic forgetting. Besides, we propose a duplex classifier to address the stability-plasticity dilemma, and a novel loss function to cluster the same categories in feature space and discriminate the features between old and new tasks to force the task specific knowledge to be more diverse. Our method needs only a few extra parameters, which are negligible, to tackle the increasing number of tasks. We conduct comprehensive experimental results on CIFAR100, ImageNet100/1000 datasets. The experiment results show that our method outperforms other competitive methods and achieves state-of-the-art performance.