With the rapid development of cloud computing and intelligent manufacturing technology, cloud manufacturing is emerging as a new manufacturing mode that is gradually changing how traditional manufacturing operates. Cloud manufacturing integrates service resources into a unified platform, where they can be configured optimally and utilized efficiently. Service composition is a key step in optimizing this resource allocation. Although service composition in cloud manufacturing has been studied extensively, little attention has been paid to the dynamic nature of the environment or to unknown task information. Moreover, service resources in cloud manufacturing are reusable: once a task is completed, the resource it occupied is returned to the platform for reuse. Based on this observation, this article reformulates the service composition problem as a reusable resource allocation problem and proposes a new Markov decision process model for the case in which the task arrival distribution is unknown. A model-free proximal policy optimization (PPO) algorithm is then designed for training and testing, and the hindsight-optimal performance is used as a benchmark for verification. The proposed model and algorithm achieve excellent performance in most cases.
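To make the notion of a reusable resource and of a hindsight-optimal benchmark concrete, the following is a minimal sketch in Python. It is not the model proposed in this article: it assumes a single resource type with identical units, one task arrival per time step, and a binary accept/reject decision. All names (`Task`, `ReusableResourceEnv`, `run_greedy`, `hindsight_optimum`) are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Task:
    reward: float   # hypothetical payoff for serving the task
    duration: int   # number of time steps the resource stays occupied


@dataclass
class ReusableResourceEnv:
    capacity: int                              # total number of identical units (assumption)
    busy: list = field(default_factory=list)   # remaining durations of occupied units

    def free_units(self) -> int:
        return self.capacity - len(self.busy)

    def step(self, task: Task, accept: bool) -> float:
        """Apply one accept/reject decision, then advance time by one step."""
        reward = 0.0
        if accept and self.free_units() > 0:
            self.busy.append(task.duration)
            reward = task.reward
        # Units whose remaining duration reaches zero return to the platform.
        self.busy = [d - 1 for d in self.busy if d - 1 > 0]
        return reward


def run_greedy(env: ReusableResourceEnv, tasks: list) -> float:
    """Accept every task whenever a unit is free (a simple online baseline)."""
    return sum(env.step(task, accept=True) for task in tasks)


def hindsight_optimum(capacity: int, tasks: list) -> float:
    """Brute-force the best accept/reject sequence with the full task list known.

    Exponential in len(tasks); only suitable for tiny illustrative instances.
    """
    best = 0.0
    for choices in product([False, True], repeat=len(tasks)):
        env = ReusableResourceEnv(capacity)
        total = sum(env.step(t, a) for t, a in zip(tasks, choices))
        best = max(best, total)
    return best
```

On a toy instance with one unit and tasks `[Task(1.0, 2), Task(5.0, 1), Task(2.0, 1)]`, the greedy baseline accepts the first task and therefore has no free unit when the more valuable second task arrives, whereas the hindsight benchmark, which knows all arrivals in advance, rejects the first task. A learned policy, such as one trained with PPO, would be compared against that benchmark.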