We study a periodic-review distribution inventory system in which multiple regional distribution centers (RDCs) order from a single central distribution center (CDC). Each order incurs a fixed cost. Managing this system is challenging because the state and action spaces are high-dimensional and the cost structure is discontinuous. In this work, we focus on the class of (s,S) policies and derive gradient estimators for the long-run average cost with respect to the policy parameters using a conditional Monte Carlo approach. The response surface of the cost with respect to the policy parameters is discontinuous and bumpy. Based on the gradient estimators, we apply an adaptive learning rate optimization algorithm, which has been shown to perform well on complicated high-dimensional problems, to optimize the policy. Numerical experiments indicate that the algorithm achieves near-optimal costs.
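To make the (s,S) rule and the long-run average cost concrete, the following is a minimal single-location sketch, not the multi-echelon RDC/CDC model studied here. Everything in it is an illustrative assumption: the function name `average_cost_sS`, zero lead time, full backordering, demand uniform on {0, ..., 10}, and the cost parameters `K` (fixed order cost), `h` (holding cost), and `p` (backorder penalty). It uses the convention "order up to S when inventory is at or below s"; some texts use "strictly below".

```python
import random


def average_cost_sS(s, S, periods=10_000, K=50.0, h=1.0, p=9.0, seed=0):
    """Estimate the average cost per period of an (s, S) policy by simulation.

    Illustrative assumptions (not taken from the paper): one location,
    zero lead time, full backordering, demand uniform on {0, ..., 10}.
    """
    if s > S:
        raise ValueError("need s <= S")
    rng = random.Random(seed)   # private generator so runs are reproducible
    inventory = S               # net inventory; negative values are backorders
    total_cost = 0.0
    for _ in range(periods):
        # Review step: if inventory is at or below s, order up to S
        # and pay the fixed order cost K.
        if inventory <= s:
            total_cost += K
            inventory = S
        # Demand is realized after the (instant) replenishment.
        inventory -= rng.randint(0, 10)
        # End-of-period holding cost on stock, penalty cost on backorders.
        total_cost += h * max(inventory, 0) + p * max(-inventory, 0)
    return total_cost / periods


if __name__ == "__main__":
    # Compare a few reorder points for a fixed order-up-to level.
    for s in (0, 5, 10):
        print(s, average_cost_sS(s, S=30))
```

For a fixed seed, a small change in s can alter which periods trigger an order, so the simulated cost moves in jumps rather than smoothly. This is one way to see why plain finite differences on the sample path are unreliable in this setting and why a dedicated gradient estimator is of interest.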