In this paper we consider the contextual multi-armed bandit problem with linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward. In particular, we consider mean-variance as the risk criterion, and the best arm is the one with the largest mean-variance reward. We apply the Thompson sampling algorithm to the disjoint model and provide a comprehensive regret analysis for a variant of the proposed algorithm. For T rounds, K actions, and d-dimensional feature vectors, we prove a regret bound of $O\left(\left(1 + \rho + \frac{1}{\rho}\right) d \ln T \ln\frac{K}{\delta} \sqrt{dKT^{1+2\epsilon}\ln\frac{K}{\delta}\frac{1}{\epsilon}}\right)$ that holds with probability $1-\delta$ under the mean-variance criterion with risk tolerance $\rho$, for any $0 < \epsilon < \frac{1}{2}$ and $0 < \delta < 1$. The empirical performance of our proposed algorithms is demonstrated via a portfolio selection problem.
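The arm-selection step for the disjoint model can be sketched as follows. This is a minimal illustration, not the paper's algorithm: each arm keeps its own ridge-regression statistics, a parameter vector is sampled from the per-arm posterior, and arms are scored by a sampled mean penalized by an empirical variance estimate. The exact mean-variance form (here `mean - variance / rho`) and the scaling `v` are illustrative assumptions.

```python
import numpy as np

def ts_mean_variance_select(contexts, B, f, var_est, rho, v=1.0, rng=None):
    """One round of a Thompson-sampling sketch for the disjoint linear model.

    contexts : list of K context vectors (one per arm, each of dimension d)
    B, f     : per-arm ridge statistics, B[k] = I + sum x x^T, f[k] = sum r x
    var_est  : per-arm empirical reward-variance estimates
    rho      : risk tolerance (larger rho = weaker variance penalty here)
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = []
    for k, x in enumerate(contexts):
        mu_hat = np.linalg.solve(B[k], f[k])           # ridge point estimate
        cov = v ** 2 * np.linalg.inv(B[k])             # posterior covariance
        theta = rng.multivariate_normal(mu_hat, cov)   # posterior sample
        scores.append(x @ theta - var_est[k] / rho)    # penalized sampled mean
    return int(np.argmax(scores))
```

After pulling the chosen arm k with context x and observing reward r, one would update `B[k] += np.outer(x, x)`, `f[k] += r * x`, and the running variance estimate for that arm.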
Deep learning has recently been studied to generate high-quality prediction intervals (PIs) for uncertainty quantification in regression tasks, including recent applications in simulation metamodeling. The high-quality criterion requires PIs to be as narrow as possible while maintaining a pre-specified level of data (marginal) coverage. However, most existing works on high-quality PIs lack accurate information on conditional coverage, which may lead to unreliable predictions if it is significantly smaller than the marginal coverage. To address this problem, we propose an end-to-end framework that outputs high-quality PIs and simultaneously provides an estimate of their conditional coverage. In doing so, we design a new loss function that is both easy to implement and theoretically justified via an exponential concentration bound. Our evaluation on real-world benchmark datasets and synthetic examples shows that our approach not only achieves competitive results on high-quality PIs in terms of average PI width, but also accurately estimates conditional coverage information that is useful in assessing model uncertainty.
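The high-quality criterion described above (narrow intervals subject to a coverage constraint) is often operationalized as a width term plus a penalty when coverage falls below the nominal level. The following is a generic sketch of such an objective, not the paper's loss function; the penalty weight `lam` and the sigmoid sharpness `soften` are illustrative hyperparameters.

```python
import numpy as np

def pi_quality_loss(lower, upper, y, alpha=0.1, lam=10.0, soften=50.0):
    """Generic high-quality-PI objective: mean interval width plus a squared
    penalty when softened empirical coverage drops below 1 - alpha.

    lower, upper : arrays of interval bounds, one interval per observation
    y            : array of observed targets
    """
    width = np.mean(upper - lower)
    # Soft (differentiable) indicator that y lies inside [lower, upper]:
    inside = (1.0 / (1.0 + np.exp(-soften * (y - lower)))) \
           * (1.0 / (1.0 + np.exp(-soften * (upper - y))))
    coverage = np.mean(inside)
    penalty = lam * max(0.0, (1.0 - alpha) - coverage) ** 2
    return width + penalty
```

The soft indicator keeps the coverage term differentiable so the objective can be minimized by gradient descent when the bounds are produced by a neural network.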
Photolithography is among the key phases in chip manufacturing. It is also among the most expensive, with manufacturing equipment valued in the hundreds of millions of dollars. It is therefore paramount that the process is run efficiently, guaranteeing high resource utilization and low product cycle times. A key element in the operation of a photolithography system is the effective management of the reticles responsible for imprinting the circuit pattern on the wafers. Managing reticles means determining which ones to mount on the very expensive scanners as a function of the product types being released to the system. Given the importance of the problem, several heuristic policies have been developed in industry practice in an attempt to guarantee that the expensive tools are never idle. However, such policies have difficulty reacting to unforeseen events (e.g., unplanned failures, unavailability of reticles). On the other hand, the semiconductor industry's technological advances in sensing at the system and process level should be harnessed to improve on these "expert policies". In this manuscript, we develop a system for real-time reticle management that not only retrieves information from the real system, but also embeds commonly used policies in order to improve upon them. We develop a new digital twin for the photolithography process that efficiently and accurately predicts system performance, thus allowing our system to predict future behaviors as a function of possible decisions. Our results demonstrate the validity of the developed model and the feasibility of the overall approach, showing a statistically significant performance improvement over the current policy.
The problem of online pricing with offline data, like other similar online decision-making problems with offline data, aims at designing and evaluating online pricing policies in the presence of a certain amount of existing offline data. To evaluate pricing policies when offline data are available, the decision maker can either position herself at the time point when the offline data have already been observed and are viewed as deterministic, or at the time point when the offline data have not yet been generated and are viewed as stochastic. We develop a framework to discuss how and why these two different positions are relevant to online policy evaluation, from both a worst-case perspective and a Bayesian perspective. We then use a simple online pricing setting with offline data to illustrate the construction of optimal policies under these two approaches and to discuss their differences, in particular whether the search for the optimal policy can be decomposed into independent subproblems optimized separately, and whether there exists a deterministic optimal policy.
The problem of maximizing the throughput of semiconductor wafer fabrication systems is addressed. We model the fabrication system as a stochastic timed automaton and design a discrete-event simulation scheme. The simulation scheme is explicit, fast, and high-fidelity: it captures the reentrant process flow and is flexible enough to accommodate diverse wafer-lot scheduling policies. A series of Marginal Machine Allocation Algorithms is proposed to sequentially allocate machines. Numerical experiments suggest that the designed methods efficiently find good allocation solutions.
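The sequential allocation idea can be sketched as a greedy marginal-allocation loop: starting from a minimal allocation, repeatedly add the machine that yields the largest estimated throughput gain. This is a generic sketch, not the paper's algorithm; in the paper the throughput evaluator would be the discrete-event simulation of the fab, while here `throughput` is any user-supplied callable.

```python
def marginal_allocation(n_stations, budget, throughput):
    """Greedy marginal allocation of machines to stations.

    n_stations : number of work stations
    budget     : total number of machines to allocate (>= n_stations)
    throughput : callable mapping an allocation list to estimated throughput
                 (in the paper, a discrete-event simulation run)
    """
    alloc = [1] * n_stations          # at least one machine per station
    while sum(alloc) < budget:
        base = throughput(alloc)
        gains = []
        for s in range(n_stations):
            trial = alloc.copy()
            trial[s] += 1             # tentatively add one machine here
            gains.append(throughput(trial) - base)
        best = max(range(n_stations), key=lambda s: gains[s])
        alloc[best] += 1              # commit the best marginal addition
    return alloc
```

With a noisy simulation-based evaluator, each marginal gain would be estimated from replications rather than a single call, which is where simulation efficiency becomes critical.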