Gradient sparsification and quantization offer a promising way to alleviate the communication overhead in distributed learning. However, directly combining the two yields suboptimal solutions, because sparsification and quantization are not learned jointly. In this paper, we propose Joint Sparsification-Quantization (JointSQ), inspired by the observation that sparsification can be treated as 0-bit quantization, regardless of architecture. Specifically, we formulate JointSQ as a mixed-precision quantization problem, which expands the solution space, and solve it with the proposed MCKP-Greedy algorithm. Theoretical analysis shows that JointSQ incurs minimal compression noise, and extensive experiments on various network architectures, including CNN, RNN, and Transformer, corroborate this result. With computation overhead comparable to or even lower than previous methods, JointSQ achieves a compression ratio of 1000× on different models while maintaining near-lossless accuracy, and delivers a 1.4× to 2.9× speedup over existing methods.
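
To make the central idea concrete, the following is a minimal, illustrative sketch, not the authors' implementation: sparsification is modeled as 0-bit quantization, and a per-chunk bit-width is chosen from a candidate set under a bit budget by a greedy heuristic for the multiple-choice knapsack problem (MCKP). The names `chunk_size`, `bit_candidates`, and `bit_budget_per_elem` are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization; 0 bits drops the chunk entirely (sparsification)."""
    if bits == 0:
        return np.zeros_like(x)
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1) + 1e-12
    return np.round(x / scale) * scale

def mckp_greedy_bit_assignment(grad, chunk_size=64, bit_candidates=(0, 2, 4, 8),
                               bit_budget_per_elem=0.5):
    """Assign one bit-width per chunk so total bits stay within the budget,
    greedily minimizing quantization error (a sketch of an MCKP-style greedy)."""
    chunks = np.array_split(grad, max(1, len(grad) // chunk_size))
    # Quantization error and bit cost of every candidate bit-width, per chunk.
    errors = [[np.sum((c - quantize(c, b)) ** 2) for b in bit_candidates] for c in chunks]
    costs = [len(c) * np.array(bit_candidates) for c in chunks]
    choice = [0] * len(chunks)          # start fully sparsified (0 bits everywhere)
    budget = bit_budget_per_elem * len(grad)
    used = 0.0
    while True:
        best_gain, best = 0.0, None
        for i in range(len(chunks)):
            j = choice[i]
            if j + 1 >= len(bit_candidates):
                continue                 # already at the highest candidate bit-width
            extra = costs[i][j + 1] - costs[i][j]
            if used + extra > budget:
                continue                 # upgrade would exceed the bit budget
            gain = (errors[i][j] - errors[i][j + 1]) / extra  # error drop per extra bit
            if gain > best_gain:
                best_gain, best = gain, (i, extra)
        if best is None:
            break
        i, extra = best
        choice[i] += 1
        used += extra
    compressed = np.concatenate(
        [quantize(c, bit_candidates[choice[i]]) for i, c in enumerate(chunks)])
    return compressed, [bit_candidates[j] for j in choice]
```

The sketch only illustrates how placing 0 bits in the same candidate set as 2-, 4-, and 8-bit quantization lets a single bit-allocation procedure decide which gradient chunks are dropped and which are quantized, rather than applying sparsification and quantization as two separate stages.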