Reinforcement Learning

1. Abstract
2. Interesting Caveats
- 2.1. Temporal credit assignment problem
- 2.2. Exploration vs Exploitation

1. Abstract

The machine (usually called the agent in Machine Learning) lives in an environment and perceives the environment's state in the form of a feature vector.

In each state, the machine can execute actions. Each action yields a reward and can also change the state of the environment.
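The interaction described above (observe a feature vector, act, receive a reward, land in a new state) can be sketched as a loop. This is a minimal sketch under stated assumptions: `LineWorld` is a hypothetical toy environment (positions 0..4 on a line, one-hot state features, reward 1.0 for reaching the right end), and `run_episode` and `always_right` are hypothetical helper names.

```python
class LineWorld:
    """Hypothetical toy environment: positions 0..n_states-1, goal at the right end."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.position = 0

    def features(self):
        # One-hot feature vector describing the current state.
        return [1.0 if i == self.position else 0.0 for i in range(self.n_states)]

    def reset(self):
        self.position = 0
        return self.features()

    def step(self, action):
        # Action 0 moves left, action 1 moves right; the walls clamp the position.
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.n_states - 1)
        done = self.position == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.features(), reward, done


def run_episode(env, policy, max_steps=100):
    """Let a policy (state features -> action) act until the episode ends; return total reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(state):
    # A fixed, hand-written policy that ignores the state.
    return 1


print(run_episode(LineWorld(), always_right))  # 1.0
```

Here the policy is hand-written; a reinforcement learning algorithm would instead learn it from the rewards.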

The objective of a reinforcement learning algorithm is to learn a policy function that, given the environment state, outputs the optimal action to execute in that state, i.e. the action that maximizes reward.
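As one concrete way to learn such a policy, the sketch below uses tabular Q-learning, a standard algorithm swapped in here for illustration (the notes do not name a specific algorithm). It assumes a hypothetical 5-position line world with a goal at position 4; `step`, `greedy`, `q_learning`, `learned_policy` and the hyperparameter values are illustrative choices, not prescribed ones. The agent estimates the value of each (state, action) pair, and the learned policy picks the highest-valued action in each state.

```python
import random

N_STATES = 5        # hypothetical line world: positions 0..4, goal at 4
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy dynamics: reward 1.0 only when the goal is reached."""
    move = 1 if action == 1 else -1
    next_state = min(max(state + move, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy(q, state, rng):
    """Pick the action with the highest estimated value, breaking ties randomly."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)     # occasionally try a random action
            else:
                action = greedy(q, state, rng)   # otherwise act on current estimates
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def learned_policy(q):
    """The learned policy: the highest-valued action in every non-goal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


print(learned_policy(q_learning()))  # expected: [1, 1, 1, 1] (always move right)
```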

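An action is optimal if it maximizes the expected long-term reward, for example the expected sum of discounted future rewards or the expected average reward per step.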
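The optimality criterion can be written down in two standard forms. In this notation (assumed here, not taken from the notes), $\pi$ is a policy, $r_t$ is the reward received at time step $t$, and $\gamma$ is a discount factor that weights near-term rewards more than distant ones:

```latex
% Discounted criterion: maximize the expected discounted return
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t+1}\right], \qquad 0 \le \gamma < 1

% Average-reward criterion: maximize the long-run reward per step
\pi^{*} = \arg\max_{\pi} \; \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_{\pi}\!\left[\sum_{t=1}^{T} r_{t}\right]
```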

Deep learning fits in as a way to represent the policy: a neural network can map an environment state to the action to take (see Policy Networks).
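A minimal sketch of what a policy network computes, assuming a single linear layer followed by a softmax (a deep policy network would stack several layers). `LinearPolicy` and its random initial weights are hypothetical, and training (for example with a policy-gradient method) is deliberately left out: the sketch only shows the mapping from a state feature vector to action probabilities and a sampled action.

```python
import math
import random


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearPolicy:
    """Hypothetical one-layer policy network: logits = W @ features + b."""

    def __init__(self, n_features, n_actions, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(n_features)]
                        for _ in range(n_actions)]
        self.bias = [0.0] * n_actions

    def action_probabilities(self, features):
        # One logit per action, turned into a probability distribution.
        logits = [sum(w * x for w, x in zip(row, features)) + b
                  for row, b in zip(self.weights, self.bias)]
        return softmax(logits)

    def act(self, features, rng=random):
        # Sample an action in proportion to its probability.
        probs = self.action_probabilities(features)
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]


policy = LinearPolicy(n_features=5, n_actions=2)
state = [1.0, 0.0, 0.0, 0.0, 0.0]
print(policy.action_probabilities(state), policy.act(state))
```

Outputting probabilities rather than a single action keeps the policy differentiable, which is what lets gradient-based training adjust the weights.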

2. Interesting Caveats

2.1. Temporal credit assignment problem

If rewards arrive only after an uncertain delay, formulating a policy requires tracing each reward back to the specific earlier actions that caused it.
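One common (approximate) way to spread a delayed reward over the actions that preceded it is to credit each step with its discounted return $G_t = r_{t+1} + \gamma G_{t+1}$, so earlier actions receive exponentially less credit. This does not establish true causality; it is a heuristic sketch, and `discounted_returns` and the value of `gamma` are illustrative assumptions.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_{t+1} + gamma * G_{t+1} for every step t of an episode.

    rewards[t] is the reward received right after the action taken at step t.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step accumulates all rewards that came after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A delayed reward: nothing for three steps, then +1 after the final action.
print(discounted_returns([0.0, 0.0, 0.0, 1.0], gamma=0.5))
# [0.125, 0.25, 0.5, 1.0]
```

Every action in the episode now gets a nonzero learning signal, even though only the last one was directly followed by a reward.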
