678 / 2024-05-08 17:57:28
An Efficient Node Selection Policy for Monte Carlo Tree Search with Neural Networks
Monte Carlo Tree Search, Node Selection Policy, Neural Network, Ranking and Selection
Abstract under review
Liu Xiaotian / Peking University
Peng Yijie / Peking University
Zhang Gongbo / Peking University
Zhou Ruihan / Peking University
Monte Carlo Tree Search (MCTS) has grown steadily in popularity, and the success of AlphaGo has prompted a trend of incorporating a value network and a policy network, both built with neural networks, into MCTS, an approach referred to as NN-MCTS. In this work, motivated by the shortcomings of the widely used Upper Confidence Bound for Trees (UCT) policy, we formulate the node selection problem in NN-MCTS as a Ranking and Selection (R&S) problem and propose a new node selection policy that efficiently allocates a limited search budget to maximize the probability of correctly selecting the best action at each node. The value network and the policy network further improve the performance of the proposed policy by providing prior knowledge and guiding the selection of the final action, respectively. Numerical experiments on two board games and an OpenAI task show that the proposed method outperforms the UCT policy used in AlphaGo Zero and MuZero, suggesting the potential of constructing node selection policies for NN-MCTS with R&S methods.
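For readers unfamiliar with the baseline mentioned in the abstract, the following is a minimal sketch of UCT-style node selection, assuming the textbook UCB1 score and the prior-weighted variant commonly associated with AlphaGo Zero-style search. It is not the R&S-based policy proposed by the authors, whose details are not given here. The function names, the dictionary keys `q`, `n`, `p`, and the constants `c` and `c_puct` are hypothetical and chosen only for illustration.

```python
import math


def uct_score(q, n, parent_n, c=math.sqrt(2)):
    """Textbook UCB1 score: mean value plus an exploration bonus."""
    if n == 0:
        return math.inf  # unvisited children are tried first
    return q + c * math.sqrt(math.log(parent_n) / n)


def puct_score(q, n, parent_n, prior, c_puct=1.0):
    """Prior-weighted score: the bonus is scaled by the policy network's prior."""
    return q + c_puct * prior * math.sqrt(parent_n) / (1 + n)


def select_uct(children):
    """Return the index of the child with the highest UCB1 score."""
    parent_n = sum(ch["n"] for ch in children)
    scores = [uct_score(ch["q"], ch["n"], parent_n) for ch in children]
    return scores.index(max(scores))


def select_puct(children):
    """Return the index of the child with the highest prior-weighted score."""
    # Illustrative choice: use at least 1 so priors break ties at an unvisited node.
    parent_n = max(1, sum(ch["n"] for ch in children))
    scores = [puct_score(ch["q"], ch["n"], parent_n, ch["p"]) for ch in children]
    return scores.index(max(scores))
```

Both rules trade off a child's estimated value against how rarely it has been visited. The abstract's proposal instead frames the allocation of the search budget as an R&S problem aimed at maximizing the probability of correctly selecting the best action.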
Important Dates
  • Conference dates: June 28, 2024 to July 1, 2024
  • May 5, 2024: Abstract acceptance notification
  • May 12, 2024: Abstract submission deadline
  • July 1, 2024: Registration deadline

Organizer
University of Science and Technology of China
Co-organizer
Society of Management Science and Engineering