Let’s see how we can train a bot to evacuate a building in minimum time.

I found Painless Q-Learning to be one of the best sources on the internet for getting started with Q-Learning. Apart from this, Basic Reinforcement Learning also helped me take an informed step in this field. This blog is highly inspired by Painless Q-Learning, and I will be coding out the same example that is mentioned there.

This blog assumes the reader has an understanding of Q-Learning. If not, do read Painless Q-Learning first and then come back to see the implementation of the same example. I will be making a dedicated blog explaining Q-Learning in the near future.

Let’s code…

    class QLearning:
        def __init__(self):
            # initializing the environment here
            # ideally should have made a different class
            self.state_action_mat = {
                0: [4],
                1: [3, 5],
                2: [3],
                3: [1, 2, 4],
                4: [0, 3, 5],
                5: [1, 4, 5],
            }
            self.state_action_reward_mat = {
                (0, 4): 0,
                (1, 3): 0, (1, 5): 100,
                (2, 3): 0,
                (3, 1): 0, (3, 2): 0, (3, 4): 0,
                (4, 0): 0, (4, 3): 0, (4, 5): 100,
                (5, 1): 0, (5, 4): 0, (5, 5): 100,
            }
            self.q_matrix = {}
            self.goal_state = 5
            self.gamma = 0.5
            self.episodes = 50
            self.states = 6

I made a class to handle all the functions required in the learning process. I have also initialized the environment here (it should have been done under a separate Environment class), but it’s ok for now. Here are some specifics about the variables from the initialization function:

- `state_action_mat`: A dictionary holding the possible direct transitions from one state to another. For example, our bot can go from state 4 to states 0, 3 and 5, and similarly it can go from state 2 to state 3.
- `state_action_reward_mat`: A dictionary holding the start and end states as the key and the reward for that transition as the value.
- `q_matrix`: This is the variable that we will learn over the learning process. You can think of this as the brain/memory of our bot.
- `goal_state`: Final exit state or termination state.
- `gamma`: This parameter varies from 0 to 1. It can be seen as the weight our bot should give to future rewards.
  A value near 0 means the bot gives high weight to immediate rewards and does not think about the long term, while a value near 1 gives more weight to future rewards. It is a hyper-parameter that can be tuned for better performance of our bot.
- `episodes`: You can think of this as the number of lives in a game. A life ends once an end state is reached. It is again a hyper-parameter that can be tuned for better performance of our bot.
- `states`: Total count of the possible states.

You can see the pictorial view of the environment below:

[Image: graph of the environment (Source)]

Here each node in the graph is a room in the building. The arrows show the connectivity from one room to another. We define each room as a state and each arrow as an action.

    from termcolor import colored

    # Instance of QLearning
    qlearn = QLearning()

    # initialize Q-Matrix to 0 (zero) score values
    qlearn.populate_qmat()

    # loop through multiple episodes
    for e in range(qlearn.episodes):

        # start with initial random state
        initial_state = qlearn.random_state()
        start = initial_state

        # steps taken to reach the goal
        steps = 0

        # path
        path = []

        # till goal is reached in each episode
        while True:

            steps += 1

            # find action from a particular state with max Q-value
            action = qlearn.find_action(initial_state)

            # set this to next state
            nextstate = action

            # find qmax for the next state
            qmax_next_state = qlearn.max_reward_next_state(nextstate)

            # look up the reward for this transition
            reward = qlearn.state_action_reward_mat.get((initial_state, action))

            # transition not possible from state to state
            # (compare with None so that a valid reward of 0 is kept)
            if reward is None:
                reward = -1

            # update the Q-matrix
            qlearn.q_matrix[(initial_state, action)] = reward + qlearn.gamma * qmax_next_state

            # path update
            path.append(initial_state)

            # traverse to the next state
            initial_state = nextstate

            if initial_state == qlearn.goal_state:
                print('Reached the goal in the episode number {} in {} steps'
                      .format(colored(e, 'red'), colored(steps, 'green')))
                path = []
                break

The above snippet is an iterative process over episodes that we run to make our bot learn the optimal path over time. We start by initializing `q_matrix` to zeros, considering that the bot has no memory when it starts learning. We start from a random state and take an informed decision to move to another state by choosing the one with the maximum q-value. Now, considering this as the new state, we look for the maximum q-value that we can get from transitioning to any other possible state. Next, we calculate the reward of going from the current state to the next and put it all into the Q-Learning formula (Source):

    Q(state, action) = R(state, action) + gamma * max[Q(next state, all actions)]

This updates the brain of our bot with some knowledge of the environment. At last, we set the next state as the current state, repeating the transition until the goal state is reached for this episode. So, iterating over multiple episodes, our bot figures out the optimal path from any room to the exit.

I have not described certain methods here. So, you can find the full code here.

You can grab my learnings on Reinforcement Learning in parts 1 / 2 / 3 / 4 / 5.

Feel free to comment and share your thoughts. Do share and clap if you ❤ it.
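Since the helper methods are not shown in this post, here is a minimal, self-contained sketch of how the whole loop could fit together. The names `populate_qmat` and `find_action` come from the snippet above, but their bodies here are my own guesses and not necessarily what the full code does. `max_q`, `update` and `train` are hypothetical names, the environment lives in module-level constants instead of a class, and breaking ties at random in `find_action` is an assumption.

```python
import random

# Environment from the post: each room maps to the rooms reachable from it.
STATE_ACTIONS = {0: [4], 1: [3, 5], 2: [3], 3: [1, 2, 4], 4: [0, 3, 5], 5: [1, 4, 5]}
# Only moves into the exit earn a reward; every other valid move earns 0.
REWARDS = {(1, 5): 100, (4, 5): 100, (5, 5): 100}
GOAL_STATE = 5
GAMMA = 0.5


def populate_qmat():
    """Assumed behaviour: a Q-value of 0 for every valid (state, action) pair."""
    return {(s, a): 0.0 for s, actions in STATE_ACTIONS.items() for a in actions}


def max_q(q, state):
    """Hypothetical helper: the best Q-value available from `state`."""
    return max(q[(state, a)] for a in STATE_ACTIONS[state])


def find_action(q, state, rng):
    """Assumed behaviour: pick the action with max Q-value, ties broken at random."""
    best = max_q(q, state)
    return rng.choice([a for a in STATE_ACTIONS[state] if q[(state, a)] == best])


def update(q, state, action):
    """Apply Q(s, a) = R(s, a) + gamma * max Q(next state, all actions)."""
    reward = REWARDS.get((state, action), 0)
    q[(state, action)] = reward + GAMMA * max_q(q, action)  # the action is the next room
    return q[(state, action)]


def train(episodes=50, seed=0):
    """Run the episode loop from the post and return the learned Q-matrix."""
    rng = random.Random(seed)
    q = populate_qmat()
    for _ in range(episodes):
        state = rng.randrange(len(STATE_ACTIONS))
        while True:
            action = find_action(q, state, rng)
            update(q, state, action)
            state = action
            if state == GOAL_STATE:
                break
    return q


# Worked example of two updates, starting from an all-zero Q-matrix.
q = populate_qmat()
print(update(q, 1, 5))  # 100 + 0.5 * 0 = 100.0
print(update(q, 3, 1))  # 0 + 0.5 * 100 = 50.0
```

The two printed updates show how knowledge flows backwards from the exit: once the bot has learned that moving from room 1 to room 5 is worth 100, the move from room 3 to room 1 picks up half of that value through `GAMMA`. Because this sketch only ever chooses among valid actions, it never needs the -1 fallback for impossible transitions.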

The Graph

Reinforcement Learning — Part 6
