Robotic mobile fulfillment systems (RMFSs) are leading the transformation of the e-commerce sector, offering an innovative approach to fulfilling customer orders. These systems employ mobile robots to transport pods of products directly to workstations, where human pickers can swiftly and accurately select the items customers have ordered. The automation inherent in these parts-to-picker systems not only significantly reduces labor costs but also introduces new operational challenges that demand innovative solutions. In this paper, we tackle the primary scheduling issues that arise during the order processing phase, focusing on three key areas: pod retrieval, pod repositioning, and robot assignment. Our goal is to develop a strategy that minimizes the system's makespan. Treating each robot as an agent, we use the Markov decision process (MDP) framework as the foundation of our decision-making model. To solve the problem, we adopt a deep reinforcement learning algorithm, proximal policy optimization (PPO), in which the agent's actions are guided by a policy network and a state-value function. Effective empirical rules and an invalid action masking (IAM) technique narrow the search space, which greatly speeds up training. We assess the performance of our method across a series of scenarios, ranging from small-scale synthetic instances to large-scale cases based on real-world datasets. The computational results demonstrate the effectiveness of our model and algorithm. Additionally, our analysis uncovers a critical operational insight: as the workload intensity at workstations increases, robot scheduling becomes more dependent on coordination to enhance overall system performance.
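To make the IAM idea concrete, the sketch below shows the standard form of the technique: invalid actions have their logits set to negative infinity before the softmax, so the policy assigns them exactly zero probability and the search space shrinks to feasible actions only. This is a generic illustration in NumPy, not the paper's implementation; the logits, mask, and function name are hypothetical.

```python
import numpy as np

def masked_action_probs(logits, valid_mask):
    """Invalid action masking: set logits of invalid actions to -inf so
    their softmax probability is exactly zero."""
    masked = np.where(valid_mask, logits, -np.inf)
    shifted = masked - masked.max()  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Example: 4 candidate actions; actions 1 and 3 are infeasible
logits = np.array([1.0, 2.0, 0.5, 3.0])
mask = np.array([True, False, True, False])
probs = masked_action_probs(logits, mask)
```

In a PPO setting, the same masking is applied both when sampling actions during rollouts and when computing the policy's log-probabilities for the surrogate loss, so gradients never flow through infeasible actions.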