Authors: (1) Avrim Blum, Toyota Technological Institute at Chicago, IL, USA; (2) Melissa Dutz, Toyota Technological Institute at Chicago, IL, USA. Table of Links Abstract and 1 Introduction 2 Setting and 2.1 Models of behaviorally-biased opponents 3 Preliminaries and Intuition 4.1 Myopic Best Responder and 4.2 Gambler’s Fallacy Opponent 4.3 Win-Stay, Lose-Shift Opponent 4.4 Follow-the-Leader Opponent and 4.5 Highest Average Payoff Opponent 5 Generalizing 5.1 Other Behaviorally-Biased Strategies 5.2 Exploiting an Unknown Strategy from a Known Set of Strategies 6 Future Work and References A Appendix A.1 Win-Stay Lose-Shift Variant: Tie-Stay A.2 Follow-the-Leader Variant: Limited History A.3 Ellipsoid Mistake Bounds A.4 Highest Average Payoff Opponent 4. Strategies for Beating Behaviorally Biased Opponents 4.1 Myopic Best Responder ▶ Theorem 3. Playing Algorithm 2 against the Myopic Best Responder in a permissible game (Definition 1) results in winning every round after the first n + 1 rounds. Proof. The Myopic Best Responder plays a best response to our previous action, so we record a correct best response to each action during the first n + 1 rounds. The Myopic Best Responder always plays the same best response (the first one in its action ordering) following any given action, so we correctly predict the action it will play from round n + 2 onward. Therefore we win every round from round n + 2 onward, since we correctly predict the opponent’s action and play a valid best response to it. 4.2 Gambler’s Fallacy Opponent ▶ Theorem 4. Playing Algorithm 3 against the Gambler’s Fallacy opponent in a permissible game (Definition 1) results in winning every round from round 3n onward. This paper is available on arxiv under CC BY 4.0 DEED license. Authors: (1) Avrim Blum, Toyota Technological Institute at Chicago, IL, USA; (2) Melissa Dutz, Toyota Technological Institute at Chicago, IL, USA. Authors: Authors: (1) Avrim Blum, Toyota Technological Institute at Chicago, IL, USA; (2) Melissa Dutz, Toyota Technological Institute at Chicago, IL, USA. Table of Links Abstract and 1 Introduction Abstract and 1 Introduction 2 Setting and 2.1 Models of behaviorally-biased opponents 2 Setting and 2.1 Models of behaviorally-biased opponents 3 Preliminaries and Intuition 3 Preliminaries and Intuition 4.1 Myopic Best Responder and 4.2 Gambler’s Fallacy Opponent 4.1 Myopic Best Responder and 4.2 Gambler’s Fallacy Opponent 4.3 Win-Stay, Lose-Shift Opponent 4.3 Win-Stay, Lose-Shift Opponent 4.4 Follow-the-Leader Opponent and 4.5 Highest Average Payoff Opponent 4.4 Follow-the-Leader Opponent and 4.5 Highest Average Payoff Opponent 5 Generalizing 5.1 Other Behaviorally-Biased Strategies 5.1 Other Behaviorally-Biased Strategies 5.2 Exploiting an Unknown Strategy from a Known Set of Strategies 5.2 Exploiting an Unknown Strategy from a Known Set of Strategies 6 Future Work and References 6 Future Work and References A Appendix A.1 Win-Stay Lose-Shift Variant: Tie-Stay A.1 Win-Stay Lose-Shift Variant: Tie-Stay A.2 Follow-the-Leader Variant: Limited History A.2 Follow-the-Leader Variant: Limited History A.3 Ellipsoid Mistake Bounds A.3 Ellipsoid Mistake Bounds A.4 Highest Average Payoff Opponent A.4 Highest Average Payoff Opponent 4. Strategies for Beating Behaviorally Biased Opponents 4.1 Myopic Best Responder ▶ Theorem 3. Playing Algorithm 2 against the Myopic Best Responder in a permissible game (Definition 1) results in winning every round after the first n + 1 rounds. ▶ Theorem 3. Playing Algorithm 2 against the Myopic Best Responder in a permissible game (Definition 1) results in winning every round after the first n + 1 rounds. Proof . The Myopic Best Responder plays a best response to our previous action, so we record a correct best response to each action during the first n + 1 rounds. The Myopic Best Responder always plays the same best response (the first one in its action ordering) following any given action, so we correctly predict the action it will play from round n + 2 onward. Therefore we win every round from round n + 2 onward, since we correctly predict the opponent’s action and play a valid best response to it. Proof 4.2 Gambler’s Fallacy Opponent ▶ Theorem 4. Playing Algorithm 3 against the Gambler’s Fallacy opponent in a permissible game (Definition 1) results in winning every round from round 3n onward. ▶ Theorem 4. Playing Algorithm 3 against the Gambler’s Fallacy opponent in a permissible game (Definition 1) results in winning every round from round 3n onward. This paper is available on arxiv under CC BY 4.0 DEED license. This paper is available on arxiv under CC BY 4.0 DEED license. available on arxiv