Come with me now on a journey through code and data...

Reinforcement Learning In Plain English

Before starting on this topic, I had experience working on both supervised and unsupervised learning problems using machine learning models (working with the sklearn library), and some deep learning experience with supervised learning problems in both NLP and image labelling (using keras). I’d also messed around with deep learning for image manipulation (my favourite being turning personal photos into images in the style of The Simpsons). And I had completed the first part of the fast.ai course and some of the second.

My introduction to this topic came from a Stanford University lecture posted on YouTube. I’ve been working on a write-up of the content covered (slide for slide) to give myself a reference I can read through at a slower pace, with more time to grasp some of the functions and ideas, or scan through in future when my memory of this topic gets hazy. You can find it in its partially finished state here.

What Is Reinforcement Learning?

First, let’s get basic definitions of some of the key terms used when talking about Reinforcement Learning:

The Reinforcement Learning Loop

A reinforcement learning setup consists of an agent and an environment. The environment gives the agent a state. In turn, the agent takes an action, and the environment gives back a reward, along with the next state. This loop continues until the environment returns a terminal state, which ends the episode.

The agent’s goal is to learn which series of actions will maximize its overall reward.
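The loop above can be sketched in a few lines of Python. This is just a toy illustration: the environment (reach position 3 on a number line) and the fixed “always move right” policy are invented for this example, standing in for a real game and a learned agent.

```python
# A toy stand-in for the agent-environment loop described above.
class ToyEnvironment:
    """The agent starts at position 0 and must reach position 3."""

    def reset(self):
        self.position = 0
        return self.position            # the initial state

    def step(self, action):
        self.position += action          # action is -1 (left) or +1 (right)
        done = self.position == 3        # terminal state: goal reached
        reward = 1.0 if done else -0.1   # small penalty for every extra step
        return self.position, reward, done  # next state, reward, terminal?

env = ToyEnvironment()
state = env.reset()
total_reward, done = 0.0, False

while not done:                          # the episode runs until a terminal state
    action = 1                           # placeholder policy: always move right
    state, reward, done = env.step(action)
    total_reward += reward

print(total_reward)                      # two -0.1 steps, then +1.0 at the goal
```

A real agent wouldn’t have its actions hard-coded, of course; it would gradually learn which actions lead to higher total reward, which is exactly what the rest of this series builds towards.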

Examples of Reinforcement Learning Problems

There are plenty of problems to which Reinforcement Learning is applied, especially in robotics, as machines learn to move, stay upright, grasp things, etc. But my favourite is using it to learn to play Atari games.

Atari Breakout

In this instance the definitions of our keywords are as follows:

  • Environment: The game itself, i.e. the states and actions that can be performed within it
  • Agent: The player of the game who inputs the actions
  • State: The raw pixel inputs of the game state at a given point in time
  • Action: Input controls - movement left or right at a given point in time
  • Reward: The score increase at a given point in time
  • Terminal state: When the player either wins or dies
  • Episode: A full game, from start to completion

Here the goal is obviously to get the maximum score possible. Google’s DeepMind have used reinforcement learning to solve this exact problem; you can view their agent playing with different levels of training in this video.

It’s pretty cool!

My next post on Reinforcement Learning is about how to write the problem mathematically.