Ding Linshan / Huazhong University of Science and Technology
This paper presents a deep reinforcement learning-based decision support methodology for human-robot collaborative order picking systems (HRCPSs), aiming to minimize total throughput time by efficiently determining the optimal numbers of pickers and robots. In an HRCPS, pickers are dedicated to zones, and the robots assigned to an order must collaborate with the pickers in each zone to retrieve all of the order's items. Given demand fluctuations in both quantity and composition, it is crucial for the system to dynamically adjust the numbers of pickers and robots across zones to optimize throughput time. To tackle this challenge, we develop a deep reinforcement learning-based decision-making methodology that determines the optimal design parameters of the HRCPS. The methodology uses a long short-term memory Q-network (LSTM-Q) to extract the system state and recognize decision points. An objective-based reward function and a proximal policy optimization-based training method are proposed to optimize the network parameters of the LSTM-Q. The resulting methodology efficiently recognizes when the system requires reconfiguration (the decision points) and selects the optimal resource plan for the order picking system, enabling orders to be fulfilled at a lower configuration cost. The proposed methodology provides warehouse managers with a basis for pre-configuring resources in response to demand fluctuations. Moreover, it can be applied to various warehouse systems, offering a scalable solution to evolving operational challenges.
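As a concrete illustration of the components named above, the following minimal sketch shows how an LSTM-based network with policy and value heads could be paired with a proximal policy optimization (PPO) clipped-surrogate update in PyTorch. All names (LSTMQNetwork, ppo_update), dimensions, and hyperparameters (clip_eps, value_coef) are illustrative assumptions, not the paper's implementation; in this setting the reward would reflect the paper's objective, e.g., the negative of total throughput time.

    # Illustrative sketch only: an LSTM-based network trained with a
    # PPO-style clipped objective. Names and hyperparameters are assumed.
    import torch
    import torch.nn as nn

    class LSTMQNetwork(nn.Module):
        """Encodes a sequence of system-state observations with an LSTM,
        then outputs action logits (candidate picker/robot plans) and a
        state-value estimate."""
        def __init__(self, state_dim: int, hidden_dim: int, n_actions: int):
            super().__init__()
            self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)
            self.policy_head = nn.Linear(hidden_dim, n_actions)
            self.value_head = nn.Linear(hidden_dim, 1)

        def forward(self, states):  # states: (batch, seq_len, state_dim)
            out, _ = self.lstm(states)
            h = out[:, -1]  # last hidden state summarizes the observed sequence
            return self.policy_head(h), self.value_head(h).squeeze(-1)

    def ppo_update(net, optimizer, states, actions, old_log_probs,
                   advantages, returns, clip_eps=0.2, value_coef=0.5):
        """One PPO step: clipped surrogate policy loss plus value loss."""
        logits, values = net(states)
        dist = torch.distributions.Categorical(logits=logits)
        log_probs = dist.log_prob(actions)
        ratio = torch.exp(log_probs - old_log_probs)
        clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
        policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
        value_loss = (returns - values).pow(2).mean()
        loss = policy_loss + value_coef * value_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

In this sketch, each discrete action would correspond to one resource plan (a pickers-and-robots configuration), and the clipped ratio keeps each policy update close to the behavior policy, which is the standard PPO mechanism the abstract refers to.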