Human pose estimation has become an active topic in the biomedical field, and researchers continue to explore new methods that improve both estimation accuracy and speed. Existing pose estimation models have a large number of parameters, which makes it difficult for them to meet real-time requirements in real-world settings, while simply compressing the network parameters tends to sacrifice accuracy. This paper proposes a network that combines human parsing and pose estimation. First, building on a human pose estimation framework, we use extreme value distribution theory to construct a residual connection gate in the backbone network, which reduces the amount of computation in forward propagation. Then, we discuss the similarity between feature extraction for human parsing and for pose estimation, and introduce a human parsing branch. Benefiting from multi-task learning, the two tasks promote each other by enriching the diversity of the extracted features. Finally, experiments demonstrate that the proposed model predicts poses with high quality.
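To make the idea of a residual connection gate concrete, the following is a minimal sketch, not the implementation used in this paper. It assumes the gate is a two-way decision (skip or execute) perturbed with Gumbel noise, the Gumbel distribution being a type-I extreme value distribution; the function names, the pooled linear gate, and the 1x1-convolution residual branch are all hypothetical choices made for illustration.

```python
import numpy as np


def sample_gumbel(shape, rng, eps=1e-20):
    """Draw Gumbel(0, 1) noise, a type-I extreme value distribution."""
    u = rng.uniform(0.0, 1.0, size=shape)
    return -np.log(-np.log(u + eps) + eps)


def gate_decision(logits, rng=None):
    """Return 0 (skip the block) or 1 (execute the block).

    logits is ordered [skip, execute]. With rng=None the choice is a plain
    argmax; with an rng, Gumbel noise is added first (Gumbel-max sampling).
    """
    logits = np.asarray(logits, dtype=float)
    if rng is not None:
        logits = logits + sample_gumbel(logits.shape, rng)
    return int(np.argmax(logits))


def gated_residual_block(x, weight, gate_weight, rng=None):
    """Hypothetical gated residual block on a (C, H, W) feature map.

    weight:      (C, C) matrix acting as a 1x1 convolution (assumed branch).
    gate_weight: (2, C) matrix mapping pooled features to gate logits.
    Returns the output and whether the residual branch was computed.
    """
    pooled = x.mean(axis=(1, 2))       # global average pooling -> (C,)
    logits = gate_weight @ pooled      # -> (2,)
    if gate_decision(logits, rng) == 0:
        return x, False                # identity path only, branch skipped
    residual = np.maximum(np.einsum("oc,chw->ohw", weight, x), 0.0)
    return x + residual, True
```

When the gate selects the skip path, the residual branch is never evaluated, which is where the saving in forward-propagation computation would come from under these assumptions.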