Use Beta Distribution and Thompson Sampling to Beat the Multi-Armed Bandit at the Casino

Written by ryan-yu | Published 2020/02/29
Tech Story Tags: reinforcement-learning | ab-testing | machinelearning | multi-armed-bandit | what-is-beta-distribution | what-is-thompson-sampling | what-is-multi-armed-bandit | beta-bernoulli-bandit-betting

TL;DR: The Beta distribution is a family of continuous probability distributions defined on the interval [0, 1] and parametrized by two positive shape parameters, α and β, which appear as exponents of the random variable and control the shape of the distribution. We use the Beta distribution to model the simplest form of the multi-armed bandit problem, in which each outcome (and therefore each reward) is binary. Because the Beta distribution is the conjugate prior for a Bernoulli success probability, every observed win or loss updates a machine's distribution simply by incrementing α or β. In the casino example, each machine pays a reward of $1 when the outcome is a success and $0 when it is a failure. Our goal is to identify the machine with the highest probability of success.
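As a minimal sketch of how these pieces fit together, the Python below runs Beta-Bernoulli Thompson sampling using only the standard library. Each machine starts with a Beta(1, 1) (uniform) prior; every round we draw one sample from each machine's Beta posterior, play the machine with the largest draw, and then add 1 to its α on a $1 payout or to its β on a $0 payout. The function name `thompson_sampling`, the uniform prior, and the success probabilities in `true_probs` are illustrative assumptions used only to simulate the casino; they are not taken from the article.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling over len(true_probs) machines.

    true_probs (hypothetical) are used only to simulate payouts;
    the agent never looks at them directly.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    # Assumed prior: Beta(1, 1), i.e. uniform on each machine's success rate
    alpha = [1] * k  # 1 + number of successes ($1 payouts)
    beta = [1] * k   # 1 + number of failures ($0 payouts)
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one plausible success rate per machine from its posterior
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate pulling the chosen machine: $1 on success, $0 on failure
        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward
        # Conjugate update: a success bumps alpha, a failure bumps beta
        alpha[arm] += reward
        beta[arm] += 1 - reward
    return alpha, beta, total_reward


# Usage with three hypothetical machines
alpha, beta, reward = thompson_sampling([0.2, 0.5, 0.75], n_rounds=2000)
pulls = [a + b - 2 for a, b in zip(alpha, beta)]  # times each machine was played
best = max(range(len(pulls)), key=lambda i: pulls[i])
print("pulls per machine:", pulls, "most-played:", best, "total reward:", reward)
```

Because each machine is chosen with probability equal to the posterior probability that it is the best one, play shifts toward the highest-probability machine as evidence accumulates, while weaker machines are still tried occasionally.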


Published by HackerNoon on 2020/02/29