Bansal, Saurabh / Smeal College of Business, The Pennsylvania State University
This paper is based on a research collaboration with a large agribusiness firm in North America. The firm invests a considerable sum annually in developing new varieties of seeds for food crops. In many geographical regions of the US, changing climatic conditions interact with the potential of seeds in complex ways; as a result, it is no longer accurate to assume specific parametric distributions for yield uncertainty, i.e., the uncertainty in output per acre of land. Instead, the firm now collects yield data during field trials and optimizes its large-scale production decision directly on these observations, treating them as scenarios in sample average approximations formulated as stochastic linear programs. Recent literature provides performance guarantees for these programs as a function of the number of observations. These guarantees tighten as the data size increases; however, collecting observations (e.g., yield data from test fields) also incurs significant cost. In this paper, we develop a framework to optimize this tradeoff between the benefit and the cost of collecting observations. Furthermore, the firm had a limited budget for collecting samples across a portfolio of seeds, and it needed a transparent way to determine the optimal number of observations of yield uncertainty. We extend our framework to determine how a limited number of test fields should be allocated across seeds when the firm performs data-driven optimization of production under yield uncertainty. We also provide an analytical expression for the ambiguity premium of a canonical data-driven optimization problem in the presence of performance bounds. Finally, we discuss the implementation of this protocol at the firm and document the benefits realized.
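To illustrate what optimizing "directly on these observations as scenarios" means, here is a minimal sketch of a sample average approximation (SAA) for a single-seed production decision. It assumes a hypothetical newsvendor-style model that is not taken from the paper: choose `acres` to plant, pay `cost_per_acre`, and incur `shortage_penalty` per unit of output below `demand` and `overage_cost` per unit above it, with each observed per-acre yield treated as an equally likely scenario. All names and parameter values are illustrative. Because this one-dimensional SAA objective is convex and piecewise linear, its minimum can be found by checking its kinks rather than by calling an LP solver.

```python
from typing import Sequence


def saa_cost(acres: float, yields: Sequence[float], demand: float,
             cost_per_acre: float, shortage_penalty: float,
             overage_cost: float) -> float:
    """Sample-average cost of planting `acres` (hypothetical model)."""
    total = 0.0
    for y in yields:
        output = acres * y  # output realized if this observed yield occurs
        total += (shortage_penalty * max(demand - output, 0.0)
                  + overage_cost * max(output - demand, 0.0))
    # Planting cost plus the average scenario penalty (equal weights 1/N).
    return cost_per_acre * acres + total / len(yields)


def saa_acres(yields: Sequence[float], demand: float, cost_per_acre: float,
              shortage_penalty: float, overage_cost: float) -> float:
    """Acres minimizing the SAA objective over acres >= 0.

    The objective is convex and piecewise linear in `acres`, with kinks at
    demand / y for each positive observed yield y. Assuming cost_per_acre > 0,
    its minimum over acres >= 0 is attained at 0 or at one of these kinks.
    """
    candidates = [0.0] + [demand / y for y in yields if y > 0]
    return min(candidates,
               key=lambda a: saa_cost(a, yields, demand, cost_per_acre,
                                      shortage_penalty, overage_cost))


if __name__ == "__main__":
    observed = [40.0, 50.0, 60.0]  # illustrative per-acre yields from trials
    best = saa_acres(observed, demand=1000.0, cost_per_acre=2.0,
                     shortage_penalty=1.0, overage_cost=0.2)
    print(best)  # 25.0
```

With these illustrative numbers, the shortage penalty dominates, so the SAA plans for the lowest observed yield (1000 / 40 = 25 acres). Adding or removing observations changes the candidate kinks and therefore the decision. This sensitivity to sample size is what the performance guarantees discussed above quantify.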