Reinforcement learning is a computational approach used to understand and automate goal-directed learning and decision-making. This article explains the fundamentals of reinforcement learning and describes how to use TensorFlow's libraries and extensions to create reinforcement learning models and methods.
What is Reinforcement Learning?
Reinforcement learning is a high-level framework for solving sequential decision-making problems. An agent learns from direct interaction with its environment, without relying on a predefined labeled dataset. It is goal-oriented and learns sequences of actions that maximize its cumulative reward.
A few fundamental concepts form the basis of reinforcement learning:
Agent: An agent performs actions in a given environment. An algorithm is an example of an agent.
Environment: The environment uses information about the agent’s current state and action as input, and returns the agent’s reward and its next state as output. An example of an environment is the laws of physics.
Action: An action is a single move the agent can make; the set of all possible moves is the action space. In video games, for example, the action space might include: running right or left, jumping high or low, crouching, or standing still.
State: A state is the situation in which the agent currently finds itself, such as a specific place and moment returned by the environment.
Reward: A reward is the feedback signal the environment returns, used to evaluate the agent's action.
This interaction can be seen in the diagram below:
The agent learns through repeated interaction with the environment. To be successful, the agent needs to:
– Learn the interaction between states, actions, and subsequent rewards.
– Determine which action will provide the optimal outcome.
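The interaction loop described above can be sketched in a few lines of Python. The environment, states, and reward below are invented for illustration; they stand in for whatever real environment the agent faces.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# A toy environment: a 1-D corridor of states 0..4. Reaching state 4
# ends the episode with a reward of 1.0. (All details here are invented.)
class LineWorld:
    def __init__(self):
        self.state = 0

    def step(self, action):
        # The environment takes the agent's action and returns the next
        # state, a reward, and whether the episode is over.
        self.state = max(0, min(4, self.state + action))
        done = self.state == 4
        reward = 1.0 if done else 0.0
        return self.state, reward, done

# The agent-environment loop: the agent (here, just a random policy)
# picks an action, and the environment responds with state and reward.
env = LineWorld()
state, total_reward, done = 0, 0.0, False
while not done:
    action = random.choice([-1, 1])          # agent chooses an action
    state, reward, done = env.step(action)   # environment responds
    total_reward += reward
```

A learning agent would use the resulting stream of (state, action, reward) tuples to improve its policy, rather than continuing to act randomly.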
Reinforcement Learning Use Cases
Reinforcement learning algorithms can be used to solve problems that arise in business settings where task automation is required:
Video games: In video games, the agent’s goal is to maximize the score. Each action throughout the game will affect how the agent behaves in relation to this goal.
Delivery drones: Reinforcement learning can be used in drone autopilots, as it provides path tracking and navigation capabilities.
Manufacturing robots: Reinforcement learning lets a robot autonomously discover optimal behavior through trial-and-error interactions with its environment. For example, a robot can learn to pick specific objects out of a box, aided by image annotations and sensor data.
Computational resource optimization: Finding solutions for resource management tasks, such as allocating computers to pending jobs, can be challenging. Reinforcement learning algorithms can learn to track resource availability and allocate resources to waiting jobs optimally.
Personalized recommendations: It can be challenging to create personalized news or advertisement recommendations, because of unpredictable user preferences. The reinforcement learning approach uses feedback from the user to model a recommendation framework with accurate predictions of future rewards.
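The recommendation use case lends itself to a bandit-style sketch: the user's click is the reward, and the agent balances exploring categories against exploiting the best-known one. The categories and click-through rates below are made up for illustration.

```python
import random

random.seed(0)

# Hypothetical click-through rates per content category. These are the
# environment's hidden dynamics; the agent never sees them directly.
true_ctr = {"sports": 0.10, "tech": 0.30, "politics": 0.20}

counts = {k: 0 for k in true_ctr}    # how often each category was shown
values = {k: 0.0 for k in true_ctr}  # estimated click rate per category

def recommend(epsilon=0.1):
    # Epsilon-greedy: usually exploit the best estimate, sometimes explore.
    if random.random() < epsilon:
        return random.choice(list(true_ctr))
    return max(values, key=values.get)

for _ in range(5000):
    item = recommend()
    # The user's click (or lack of one) is the reward signal.
    clicked = 1.0 if random.random() < true_ctr[item] else 0.0
    counts[item] += 1
    # Incremental average: nudge the estimate toward the observed reward.
    values[item] += (clicked - values[item]) / counts[item]
```

Full reinforcement learning generalizes this bandit setting by also modeling how each recommendation changes the user's future state.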
Reinforcement Learning in TensorFlow: Libraries and Extensions
TensorFlow provides official libraries and extensions for building advanced reinforcement learning models and methods.
TF-Agents: A Flexible Reinforcement Learning Library for TensorFlow
TF-Agents is a modular, well-tested open-source library for deep reinforcement learning with TensorFlow. In TF-Agents, the core elements of reinforcement learning algorithms are implemented as Agents.
Currently, the following algorithms are available under TF-Agents:
DQN: "Human-level Control Through Deep Reinforcement Learning" (Mnih et al.)
DDQN: "Deep Reinforcement Learning with Double Q-learning" (van Hasselt et al.)
DDPG: "Continuous Control with Deep Reinforcement Learning" (Lillicrap et al.)
TD3: "Addressing Function Approximation Error in Actor-Critic Methods" (Fujimoto et al.)
REINFORCE: "Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning" (Williams)
PPO: "Proximal Policy Optimization Algorithms" (Schulman et al.)
SAC: "Soft Actor-Critic" (Haarnoja et al.)
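To make one distinction in the list above concrete: DQN and DDQN differ only in how the bootstrap target is built. DQN both selects and evaluates the next action with the target network, while Double Q-learning selects with the online network and evaluates with the target network, reducing overestimation. The Q-values below are invented toy numbers, not output from TF-Agents.

```python
# Made-up Q-values for the next state, from two networks:
online = [0.9, 0.2, 0.4]   # network currently being trained
target = [0.5, 0.8, 0.3]   # slowly-updated target network
r, gamma = 1.0, 0.99       # reward and discount for this transition

# DQN target: select AND evaluate the next action with the target network.
dqn_target = r + gamma * max(target)

# Double DQN target: select the action with the online network, but
# evaluate it with the target network.
best_action = max(range(len(online)), key=lambda a: online[a])
ddqn_target = r + gamma * target[best_action]
```

Because the two networks rarely agree on the best action, the Double DQN target tends to be lower, countering the upward bias introduced by the max operator.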
Dopamine: TensorFlow-Based Research Framework
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. Dopamine provides the following features for reinforcement learning researchers:
– Flexibility—new researchers can easily try out new ideas and run benchmark experiments.
– Stability—provides a small set of implemented and well-tested algorithms.
– Reproducibility—Dopamine code has full test coverage. These tests also serve as an additional form of documentation.
TRFL: A Library of Reinforcement Learning Building Blocks
TRFL (pronounced “truffle”) is a collection of key algorithmic components for DeepMind agents such as DQN, DDPG, and IMPALA. The TRFL library includes functions to implement both classical reinforcement learning algorithms as well as more cutting-edge techniques.
TRFL can be installed from pip with the following command: pip install trfl
Install TensorFlow and TensorFlow Probability separately, so that TRFL can work with both the GPU and CPU versions of TensorFlow.
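TRFL's building blocks are loss ops over batched tensors; for instance, its qlearning op takes the previous state's Q-values q_tm1, the action a_tm1, the reward r_t, a continuation (discount) probability pcont_t, and the next state's Q-values q_t. A plain-Python rendering of the TD error such an op computes (a sketch of the math, not TRFL's actual implementation) looks like:

```python
# Plain-Python sketch of the one-step Q-learning TD error that a
# building-block library like TRFL packages as a TensorFlow op.
# The numbers below are made up; TRFL itself operates on batched tensors.

def qlearning_td_error(q_tm1, a_tm1, r_t, pcont_t, q_t):
    """TD error for one transition.

    q_tm1:   Q-values for the previous state
    a_tm1:   action taken in the previous state
    r_t:     reward received
    pcont_t: continuation probability (the discount; 0.0 at episode end)
    q_t:     Q-values for the current state
    """
    target = r_t + pcont_t * max(q_t)
    return target - q_tm1[a_tm1]

# target = 1.0 + 0.9 * 0.4 = 1.36, so the TD error is 1.36 - 0.5 = 0.86
td = qlearning_td_error(q_tm1=[0.2, 0.5], a_tm1=1, r_t=1.0,
                        pcont_t=0.9, q_t=[0.1, 0.4])
```

An agent would square (or Huber-transform) this error to obtain a loss and minimize it with a standard TensorFlow optimizer.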
TensorFlow Reinforcement Learning Example using TF-Agents
In this reinforcement learning tutorial, we will train an agent on the CartPole environment. This environment is provided through OpenAI Gym, an open-source toolkit for developing and comparing reinforcement learning algorithms. The following is a screen capture from the game:
Source: OpenAI
Process:
1. Set up reinforcement learning environments: Define suites for loading environments from sources such as OpenAI Gym, Atari, and DM Control, given a string environment name.
2. Set up the reinforcement learning agent: Create standard TF-Agents such as DQN, DDPG, TD3, PPO, and SAC.
3. Define standard reinforcement learning policies
4. Define metrics for evaluation of policies.
5. Collect data: define a function to collect an episode using the given data collection policy and save the data.
6. Train the agent
7. Visualize the performance of the agent
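The seven steps above can be sketched end to end without any dependencies. The toy two-state environment below stands in for CartPole, and a tabular Q-learning agent stands in for a TF-Agents DqnAgent; every name and number is illustrative only.

```python
import random

random.seed(2)

# 1. Environment setup: a toy chain of states 0 -> 1 -> 2, where reaching
#    state 2 ends the episode with reward 1. (Stands in for CartPole.)
def make_env():
    return {"state": 0}

def env_step(env, action):
    env["state"] = min(env["state"] + action, 2)
    done = env["state"] == 2
    return env["state"], (1.0 if done else 0.0), done

# 2. Agent: a table of Q-values plus a Q-learning update rule.
Q = [[0.0, 0.0] for _ in range(3)]
alpha, gamma = 0.5, 0.95

# 3. Policies: a random collection policy and a greedy evaluation policy.
def collect_policy(state):
    return random.randrange(2)

def greedy_policy(state):
    return 0 if Q[state][0] >= Q[state][1] else 1

# 4. Metric: average return of the greedy policy (with a step cap so an
#    untrained policy cannot loop forever).
def average_return(episodes=20):
    total = 0.0
    for _ in range(episodes):
        env, done, steps = make_env(), False, 0
        while not done and steps < 50:
            _, r, done = env_step(env, greedy_policy(env["state"]))
            total += r
            steps += 1
    return total / episodes

# 5. Collect data: run one episode with the collection policy, saving
#    (state, action, reward, next_state, done) transitions.
def collect_episode():
    env, done, transitions = make_env(), False, []
    while not done:
        s = env["state"]
        a = collect_policy(s)
        s2, r, done = env_step(env, a)
        transitions.append((s, a, r, s2, done))
    return transitions

# 6. Train the agent: replay collected transitions through the update.
for _ in range(200):
    for s, a, r, s2, done in collect_episode():
        target = r + (0.0 if done else gamma * max(Q[s2]))
        Q[s][a] += alpha * (target - Q[s][a])

# 7. Evaluate: report the trained greedy policy's average return.
print(average_return())
```

In the real TF-Agents version of this pipeline, suite_gym.load('CartPole-v0') plays the role of the environment, a DqnAgent replaces the Q-table, and the same collect-train-evaluate loop runs over neural-network Q-values.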