
# Course References, Links and Random Notes

Title: Probabilistic Reasoning and Reinforcement Learning
Info: ECE 493 Topic 42 - Technical Electives
Instructor: Prof. Mark Crowley, ECE Department, UWaterloo

NOTE: Not updated for Spring 2022 Yet!

Website: markcrowley.ca/rlcourse

## Course Description

Introduction to Reinforcement Learning (RL) theory and algorithms for learning decision-making policies in situations with uncertainty and limited information. Topics include Markov decision processes, classic exact/approximate RL algorithms such as value/policy iteration, Q-learning, State-action-reward-state-action (SARSA), Temporal Difference (TD) methods, policy gradients, actor-critic, and Deep RL such as Deep Q-Learning (DQN), Asynchronous Advantage Actor Critic (A3C), and Deep Deterministic Policy Gradient (DDPG). [Offered: S, first offered Spring 2019]

# Topics

Primary Textbook: Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018 [SB]

Some topics are not covered in the SB textbook or they are covered in much more detail than the lectures. We will continue to update this list with references as the term progresses.

1. Motivation & Context [SB 1.1, 1.2, 17.6]
2. Decision Making Under Uncertainty [SB 2.1-2.3, 2.7, 3.1-3.3]
3. Solving MDPs [SB 3.5, 3.6, 4.1-4.4]
4. The RL Problem [SB 3.7, 6.4, 6.5]
5. TD Learning [SB 12.1, 12.2]
6. Policy Search [SB 13.1, 13.2, 13.5]
7. State Representation & Value Function Approximation
8. Basics of Neural Networks
9. Deep RL
10. POMDPs, MARL (skipped in 2020)
11. MCTS, AlphaGo (mentioned briefly in 2020)

## Topic 1 - Basics of Probability

Introductory topics on this from my graduate course ECE 657A - Data and Knowledge Modeling and Analysis are available on YouTube and are mostly applicable to this course as well.

Probability and Statistics Review (youtube playlist)

Containing Videos on:

• Conditional Probability and Bayes' Theorem
• Comparing Distributions and Random Variables
• Hypothesis Testing

For a very fundamental view of probability from another course of Prof. Crowley you can view the lectures and tutorials for ECE 108

ECE 108 Youtube (look at “future lectures” and “future tutorials” for S20): https://www.youtube.com/channel/UCHqrRl12d0WtIyS-sECwkRQ/playlists

The last few lectures and tutorials are on probability definitions as seen from the perspective of discrete math and set theory.

### Likelihood, Loss and Risk

A good article summarizing how likelihood, loss functions, risk, KL divergence, MLE, and MAP are all connected.
https://quantivity.wordpress.com/2011/05/23/why-minimize-negative-log-likelihood/

### Probability Intro Markdown Notes

From the course website for a previous year. Some of these topics we won't need much, but they are all useful to know for Machine Learning methods in general.

https://compthinking.github.io/RLCourseNotes/

• Basic probability definitions
• conditional probability
• Expectation
• Inference in Graphical Models
• Variational Inference

## Topic 2.1 - Basic Decision Making Models - Multiarmed Bandits

Week 2
Textbook Sections: [SB 1.1, 1.2, 17.6]

### Videos

- Part 1 - Live Lecture May 17, 2021 on Virtual Classroom - View Live Here
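The bandit setting from [SB 2.1-2.3] fits in a few lines of code. Below is a minimal sketch of an epsilon-greedy agent on Gaussian-reward arms, using the incremental sample-average update; the function name and arm means are illustrative, not from the textbook.

```python
import random

def run_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Epsilon-greedy bandit with incremental sample-average estimates."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k   # estimated value of each arm
    n = [0] * k     # number of pulls of each arm
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore: random arm
        else:
            a = max(range(k), key=lambda i: q[i])  # exploit: greedy arm
        reward = rng.gauss(true_means[a], 1.0)     # noisy reward
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]             # incremental mean update
    return q, n

q, n = run_bandit([0.1, 0.5, 0.9])  # the best arm should attract most pulls
```

After enough steps the estimates in `q` approach the true arm means, and the greedy choice concentrates pulls on the best arm.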

## Topic 3 - Markov Decision Processes

Week 3

### Textbook Sections

• Markov Decision Processes
[SB 3.0-3.4]
• Solving MDPs Exactly
[SB 3.5, 3.6, 3.7]

## Topic 4 - Dynamic Programming

Week 4
Former title: The Reinforcement Learning Problem
Textbook Sections: [SB 4.1-4.4]
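As a concrete companion to the dynamic programming material in [SB 4.1-4.4], here is a minimal value iteration sketch for a tabular MDP. The data layout (nested lists of `(probability, next_state)` pairs) is one illustrative choice, not the textbook's notation.

```python
def value_iteration(P, R, gamma=0.9, theta=1e-8):
    """P[s][a] = list of (prob, next_state); R[s][a] = expected reward.
    Returns optimal state values and a greedy policy."""
    n_states = len(P)
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality backup: best one-step lookahead value
            q = [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                 for a in range(len(P[s]))]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Extract the greedy policy from the converged values
    policy = []
    for s in range(n_states):
        q = [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
             for a in range(len(P[s]))]
        policy.append(q.index(max(q)))
    return V, policy

# Tiny 2-state example: from state 0, action 1 earns reward 1 and
# moves to absorbing state 1; action 0 stays put for reward 0.
P = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)]]]
R = [[0.0, 1.0], [0.0]]
V, pi = value_iteration(P, R)
```

In the toy example the optimal policy takes action 1 from state 0, giving V[0] = 1.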

## Topic 5 - Temporal Difference Learning - Part 1

Week 5 (June 7-11)

Textbook Sections: Selections from [SB chap 5], [SB 6.0 - 6.5]

• Quick intro to Monte-Carlo methods
• Temporal Difference Updating
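The core temporal-difference idea from [SB 6.0-6.5] fits in one function: nudge the current estimate toward a bootstrapped target. A minimal TD(0) sketch (names are illustrative):

```python
def td0_update(V, s, reward, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) update: move V[s] toward the bootstrapped target."""
    td_target = reward + gamma * V[s_next]  # bootstrap from the next state
    td_error = td_target - V[s]
    V[s] += alpha * td_error
    return td_error

V = [0.0, 0.5]
td0_update(V, 0, 1.0, 1)  # V[0] moves a step toward 1.0 + 0.9 * 0.5
```

Unlike Monte-Carlo methods, which wait for the end of an episode to form a return, this update can be applied after every single transition.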


## Topic 5.1 - TD Learning - Part 2

Week 6 (June 14-17)

• SARSA
• Q-Learning
• Expected SARSA
• Double Q-Learning
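The difference between the on-policy and off-policy updates in the list above is easiest to see side by side. A minimal sketch using a dict-of-dicts Q-table (names and layout are illustrative):

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the greedy next action."""
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action actually taken next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])

Q = {0: {0: 0.0, 1: 0.0}, 1: {0: 1.0, 1: 2.0}}
q_learning_update(Q, 0, 0, 0.0, 1)    # uses max over next actions (2.0)
sarsa_update(Q, 0, 1, 0.0, 1, 0)      # uses the sampled next action (1.0)
```

The single line that differs, `max(Q[s_next].values())` versus `Q[s_next][a_next]`, is exactly what makes Q-Learning off-policy and SARSA on-policy.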

## Part 1 Review

Week 7 (June 21 - 25)
Go over any questions or open topics from first 6 weeks.

## MIDTERM Exam

Week 7
Questions on Midterm (June 23-25) can be on any topics up to this point, Weeks 1-6 inclusive.

## Topic 5.2 - N-Step TD (self-study) and Eligibility Traces (skip)

optional topic
Textbook Sections: [SB 12.1, 12.2]

Note (1): Given the pace that people are watching videos, we will drop this topic (eligibility traces). It is less essential in the Deep RL era although very interesting theoretically. Calendar will be updated accordingly.

Eligibility traces, in a tabular setting, provide a significant benefit in training time on top of the basic Temporal Difference method.

In Deep RL it is very common to use experience replay to reduce overfitting and bias to recent experiences. However, experience replay makes it very hard to leverage eligibility traces which require a sequence of actions to distribute reward backwards.
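To make the "distribute reward backwards" point concrete, here is a tabular TD(lambda) sketch with accumulating traces. This is one illustrative formulation; [SB 12.1, 12.2] develop the full theory and its variants.

```python
def td_lambda_episode(episode, V, alpha=0.1, gamma=0.9, lam=0.8):
    """Tabular TD(lambda) with accumulating traces over one episode.
    episode: list of (state, reward, next_state) transitions, in order."""
    e = {s: 0.0 for s in V}  # eligibility trace per state
    for s, r, s_next in episode:
        delta = r + gamma * V[s_next] - V[s]  # one-step TD error
        e[s] += 1.0                            # mark the visited state
        for st in V:
            V[st] += alpha * delta * e[st]     # credit all eligible states
            e[st] *= gamma * lam               # decay every trace
    return V

V = {0: 0.0, 1: 0.0, 2: 0.0}
# Reward of 1 arrives only on the second transition, yet state 0
# also receives credit through its decayed trace.
td_lambda_episode([(0, 0.0, 1), (1, 1.0, 2)], V)
```

Note how a single rewarding transition updates every state still carrying a nonzero trace, which is precisely the sequencing that uniform experience-replay sampling breaks.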

### Videos:

• ET1 - One Step vs Direct Value Updates
• ET2 - N Step TD Forward View
• ET3 - N Step TD Backward View
• ET4 - Eligibility Traces On Policy
• ET5 - Eligibility Traces Off Policy

## Topic 6 - State Representation & Value Function Approximation

Week 8 (June 28- July 2)

### VFA Concept

A Value Function Approximation (VFA) is a necessary technique whenever the state or action spaces become too large to represent the value function explicitly as a table. In practice, nearly any realistic problem requires a VFA.

#### Benefits of VFA

• Reduce memory need to store the functions (transition, reward, value etc)
• Reduce computation to look up values
• Reduce experience needed to find the optimal value or policy (sample efficiency)
• For continuous state spaces, a coarse coding or tile coding can be effective

#### Types of Function Approximators

• Linear function approximations (linear combination of features)
• Neural Networks
• Decision Trees
• Nearest Neighbors
• Fourier/ wavelet bases

#### Finding an Optimal Value Function

When using a VFA, you can use a Stochastic Gradient Descent (SGD) method to search for the best weights for your value function according to experience.
This parametric form of the value function is then used to obtain a greedy or epsilon-greedy policy at run-time.

Using a VFA with SGD is therefore still different from a Direct Policy Search approach, where you optimize the parameters of the policy directly.
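A one-step sketch of the idea: with a linear VFA v(s) = w·x(s), the SGD update moves the weights along the feature vector, scaled by the prediction error. Plain Python with illustrative names, for clarity rather than speed:

```python
def sgd_vfa_update(w, features, target, alpha=0.01):
    """One SGD step on a linear value function v(s) = w . x(s)."""
    v = sum(wi * xi for wi, xi in zip(w, features))  # current prediction
    error = target - v                               # prediction error
    # Gradient of the squared error w.r.t. w is -error * x(s)
    return [wi + alpha * error * xi for wi, xi in zip(w, features)]

w = sgd_vfa_update([0.0, 0.0], [1.0, 2.0], target=1.0)
```

The `target` here could come from a Monte-Carlo return or a bootstrapped TD target; the weight-update mechanics are the same either way.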

## Topic 7 - Policy Gradient Methods

Week 9 (July 5 - 9)
[SB 13.1, 13.2, 13.5]

• Actor-Critic

### Video:

Some of the posts used for lecture on July 26.

### Actor-Critic Algorithm

Very clear blog post on describing Actor-Critic Algorithms to improve Policy Gradients

### Cutting Edge Algorithms

Going beyond what we covered in class, here are some exciting trends and recent advances in RL research from the past few years to explore further.
PG methods are a fast-changing area of RL research; this post covers a number of the successful algorithms in this area from a few years ago:

## Topic 8 - Deep Learning Fundamentals

Week 10 (July 12 - July 16)

### Assignments

• Assignment 2 had an extended deadline and was due this Monday July 12. Good work everyone!
• Assignment 3 is now out and can be found on Gitlab: ece493finalassignment

### Lectures

This week is a good time to:

• Catch up on topics from Weeks 8 and 9
• Review, or learn, a bit about Deep Learning
• See videos and content from DKMA Course (ECE 657A)
• This youtube playlist is a targeted “Deep Learning Crash Course” ( #dnn-crashcourse-for-rl ) with just the essentials you’ll need for Deep RL.
• That course also has more detailed videos on Deep Learning which won’t be specifically useful for ECE 493, but which you can refer to if interested.

### Lect 9B - Deep Learning Introduction

In this video we go over some of the fundamental concepts that led to neural networks (such as linear regression and logistic regression models), the basic structure and formulation of classic neural networks, and the history of their development.

### Lect 11A - 1 - Deep Learning Fundamentals

This video goes through a ground level description of logistic neural units, classic neural networks, modern activation functions and the idea of a Neural Network as a Universal Approximator.

### Lect 11A - 1.2 - Deep Learning - Gradient Descent

In this video we discuss the nuts and bolts of how training in Neural Networks (Deep or Shallow) works as a process of incremental optimization of weights via gradient descent. Topics discussed: Backpropagation algorithm, gradient descent, modern optimizer methods.
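The incremental-optimization picture from this lecture can be boiled down to a toy example: gradient descent on a one-dimensional quadratic loss. Everything below is illustrative, not the lecture's code; backpropagation is simply an efficient way to compute the gradient that this loop consumes.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient of the loss."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # move downhill, scaled by the learning rate
    return x

# Minimize (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), 0.0)
```

With a suitable learning rate the iterate converges geometrically to the minimizer at x = 3; modern optimizers (momentum, Adam) refine this same loop.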

### Lect11B - 1 - DeepLearning - Fundamentals II

In this video we go over the fundamentals of Deep Learning from a different angle, using the approach from Goodfellow et al.'s Deep Learning Textbook and their network graph notation for neural networks.

We describe the network diagram notation and how to view neural networks in this way, focusing on the relationship between sets of weights and layers.

Other topics include: gradient descent, loss functions, cross-entropy, network output distribution types, softmax output for classification.
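Two of the pieces listed above, softmax outputs and cross-entropy loss, are small enough to show directly. A minimal sketch in plain Python (real networks use vectorized library implementations):

```python
import math

def softmax(logits):
    """Numerically stable softmax: shift by the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])

p = softmax([2.0, 1.0, 0.0])  # a probability distribution over 3 classes
loss = cross_entropy(p, 0)    # small when the true class gets high probability
```

Softmax turns raw scores into a distribution, and cross-entropy penalizes the network in proportion to how little probability it puts on the correct class.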

### Lect 11B - 2 - Deep Learning - Fundamentals III

This video continues with the approach from Goodfellow et al.'s Deep Learning Textbook and goes into detail about computational methods, efficiency, and defining the measure being used for optimization.

Topics covered include: relationship of network depth to generalization power, computation benefits of convolutional network structures, revisiting the meaning of backpropagation, methods for defining loss functions

### Lect 11B - 3 - Deep Learning - Regularization

In this lecture I talk about some of the problems that can arise when training neural networks and how they can be mitigated. Topics include : overfitting, model complexity, vanishing gradients, catastrophic forgetting and interpretability.

### Lect 11B - 4 - Deep Learning - Data Augmentation and Vanishing Gradients

In this video we give an overview of several approaches for making DNNs more usable when data is limited with respect to the size of the network. Topics include data augmentation, residual network links, vanishing gradients.

## Topic 9 - Deep Reinforcement Learning

Week 12 (July 26-July 30)

• PPO - not covered in detail
• DDPG - not covered in detail

### A3C/A2C Resources

• Blog post from OpenAI introducing their implementation of A3C and analysis of how a simpler, non-parallelized version they call A2C is just as good:
• The original A3C paper from DeepMind:
• Good summary of these algorithms with cleaned up pseudocode and links:

## Topic 11 - Looking Ahead with Tree Search - MCTS and AlphaGo

Week 12 (July 26 - July 30)

• Monte-Carlo Tree Search (MCTS) - not covered in detail
• How AlphaGo works (combining DQN and MCTS)
• The Future of Deep RL

## Review and End of Classes

Week 13 (Aug 1 - Aug 6)

## Final Exam

Week 14 (Aug 12-14)

# Primary References for Course

[SuttonBarto2018] - Reinforcement Learning: An Introduction. Book, free pdf of draft available.
http://incompleteideas.net/book/the-book-2nd.html

## Other Useful Textbooks

[Dimitrakakis2019] - Decision Making Under Uncertainty and Reinforcement Learning

https://arxiv.org/abs/1609.04436

## Deep Q Network vs Policy Gradients - An Experiment on VizDoom with Keras

A nice blog post on comparing DQN and Policy Gradient algorithms such A2C.
https://flyyufelix.github.io/2017/10/12/dqn-vs-pg.html

## Open AI Reference Website

This website is a great resource. It lays out concepts from start to finish. Once you get through the first half of our course, many of the concepts on this site will be familiar to you.

### Key Papers in Deep RL List

https://spinningup.openai.com/en/latest/spinningup/keypapers.html

### Fundamental RL Concepts Overview

The fundamentals of RL are briefly covered here. We will go into all this and more in detail in our course.
https://spinningup.openai.com/en/latest/spinningup/rl_intro.html

### Family Tree of Algorithms

Here is a list of algorithms at the cutting edge of RL as of a year or so ago, so it's a good place to find out more. But in such a fast-growing field, it may be a bit out of date about the latest work.
https://spinningup.openai.com/en/latest/spinningup/rl_intro2.html

## Reinforcement Learning Tutorial with Demo on GitHub

This is a thorough collection of slides from a few different texts and courses, laid out with the essentials from basic decision making to Deep RL. There are also code examples for some of their own simple domains.
https://github.com/omerbsezer/Reinforcement_learning_tutorial_with_demo#ExperienceReplay

# Old Topics Archive

Other resources connected with previous versions of the course. I'm happy to talk about any of these if people are interested.

## Bayes Nets (dropped)

### Tools

SamIam Bayesian Network GUI Tool

Other Tools

### References

Some videos and resources on Bayes Nets, d-separation, the Bayes Ball Algorithm, and more: