Reinforcement Learning - The Value Function

TLDR

The value function is an efficient way to determine the value of being in a state. In a game of tic-tac-toe, getting 2 Xs in a row does not win the game, hence there is no reward for it. The value of a state A is the sum, over all possible next states, of the probability of reaching that next state multiplied by the reward for reaching it. In this case, state A has a chance of winning the game by placing the next X at the top of the row. A state D has only 1 possible route, to state E, so the only outcome is to receive the reward.
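
The sum described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's own code: the function name `state_value`, the next-state labels, and every probability and reward below are hypothetical values chosen only to show the arithmetic.

```python
def state_value(transitions):
    """Return the sum of probability * reward over all next states.

    `transitions` maps a next-state label to a (probability, reward) pair.
    """
    return sum(prob * reward for prob, reward in transitions.values())


# Hypothetical numbers: from state A, assume a 50% chance of moving to a
# winning state (reward 1.0) and a 50% chance of a non-winning state (reward 0.0).
transitions_from_a = {"win": (0.5, 1.0), "no_win": (0.5, 0.0)}

# Hypothetical numbers: state D has a single route, to state E, with reward 1.0.
transitions_from_d = {"E": (1.0, 1.0)}

value_a = state_value(transitions_from_a)
value_d = state_value(transitions_from_d)
print(value_a, value_d)
```

Under these assumed numbers, state D is worth exactly the reward for reaching E because its only transition has probability 1, while state A is worth less because the reward is not guaranteed.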

Written by jingles | A data scientist who also enjoys developing products on the Web.

Published by HackerNoon on 2019/08/16