A Dynamic Programming Approach to Optimizing Signaling Strategies in Multi-phase Trials

Authors:
(1) Shih-Tang Su, University of Michigan, Ann Arbor (shihtang@umich.edu);
(2) Vijay G. Subramanian, University of Michigan, Ann Arbor (vgsubram@umich.edu);
(3) Grant Schoenebeck, University of Michigan, Ann Arbor (schoeneb@umich.edu).

Table of Links

Abstract and 1. Introduction
2. Problem Formulation
2.1 Model of Binary-Outcome Experiments in Two-Phase Trials
3 Binary-outcome Experiments in Two-phase Trials and 3.1 Experiments with screenings
3.2 Assumptions and induced strategies
3.3 Constraints given by phase-II experiments
3.4 Persuasion ratio and the optimal signaling structure
3.5 Comparison with classical Bayesian persuasion strategies
4 Binary-outcome Experiments in Multi-phase trials and 4.1 Model of binary-outcome experiments in multi-phase trials
4.2 Determined versus sender-designed experiments
4.3 Multi-phase model and classical Bayesian persuasion and References

4 Binary-outcome Experiments in Multi-phase trials

This section generalizes the structural results in Section 3 to multi-phase trials. First, we generalize the model in Section 2.1 to multi-phase trials and then propose a dynamic programming algorithm to solve for the optimal signaling strategy. The state of the dynamic program is the interim belief about the state of the world at each node of the extensive-form representation of the problem. Because the belief at each level is determined by the actions taken in earlier stages (if any), the backward iteration determines the sender's optimal choice of experiment (where the sender has a choice) for every possible interim belief. The dynamic program has only a terminal reward, which arises from the action the receiver takes given the outcome of the final trial and the resulting posterior belief.

4.1 Model of binary-outcome experiments in multi-phase trials

This paper is available on arXiv under CC 4.0 license.
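As an illustration of the backward iteration described in the introduction to Section 4, the following is a minimal Python sketch, not the algorithm from the paper. It rests on several simplifying assumptions that do not come from the text: the state of the world is binary, the interim belief is the probability of state 1, each experiment is a pair (P(pass | state 1), P(pass | state 0)), the sender picks from a finite menu at each node (a one-item menu stands for a determined experiment), and the sender earns a terminal reward of 1 exactly when the receiver's posterior reaches a threshold. The names Node, bayes, and value are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One trial in the tree (hypothetical structure, assumed for illustration)."""
    # Each experiment is (P(pass | state=1), P(pass | state=0)).
    # A one-item menu models a determined experiment; more items model sender choice.
    menu: List[Tuple[float, float]]
    on_pass: Optional["Node"] = None  # next trial after a pass; None means the receiver acts
    on_fail: Optional["Node"] = None  # next trial after a fail; None means the receiver acts


def bayes(belief, like1, like0):
    """Return (probability of the outcome, posterior P(state=1 | outcome))."""
    prob = belief * like1 + (1 - belief) * like0
    if prob == 0:
        return prob, belief  # outcome never occurs; the posterior is irrelevant
    return prob, belief * like1 / prob


def value(node, belief, threshold=0.5):
    """Sender's optimal expected terminal reward from this node at this interim belief."""
    if node is None:
        # Terminal reward only: assumed to be 1 if the receiver's posterior
        # reaches the threshold, and 0 otherwise.
        return 1 if belief >= threshold else 0

    def expected(experiment):
        q1, q0 = experiment
        p_pass, b_pass = bayes(belief, q1, q0)
        p_fail, b_fail = bayes(belief, 1 - q1, 1 - q0)
        return (p_pass * value(node.on_pass, b_pass, threshold)
                + p_fail * value(node.on_fail, b_fail, threshold))

    # Backward step: the sender picks the best experiment available at this node.
    return max(expected(e) for e in node.menu)


# Usage: a single trial in which the sender chooses between a fully revealing
# experiment and a partially revealing one; exact fractions avoid rounding issues.
single = Node(menu=[(1, 0), (1, Fraction(3, 7))])
print(value(single, Fraction(3, 10)))  # prints 3/5
```

The recursion evaluates the value at whatever interim belief each outcome induces, which mirrors the idea of using the interim belief as the dynamic-programming state. A finite menu is used only to keep the sketch short; a continuous design space would require optimizing over experiments at each node instead of enumerating them.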