Title: Probabilistic Reasoning and Reinforcement Learning
Info: ECE 493 Topic 42 - Technical Electives
Instructor: Prof. Mark Crowley, ECE Department, UWaterloo
Website: markcrowley.ca/rlcourse
Primary Textbook: Reinforcement Learning: An Introduction
Richard S. Sutton and Andrew G. Barto, 2018 [SB]
Some topics are not covered in the SB textbook, or the textbook covers them in much more detail than the lectures do. We will continue to update this list with references as the term progresses.
Other resources are connected with previous versions of the course; I'm happy to talk about any of these if people are interested.
Introduction to Reinforcement Learning (RL) theory and algorithms for learning decision-making policies in situations with uncertainty and limited information. Topics include Markov decision processes, classic exact/approximate RL algorithms such as value/policy iteration, Q-learning, State-action-reward-state-action (SARSA), Temporal Difference (TD) methods, policy gradients, actor-critic, and Deep RL such as Deep Q-Learning (DQN), Asynchronous Advantage Actor Critic (A3C), and Deep Deterministic Policy Gradient (DDPG). [Offered: S, first offered Spring 2019]
Week 2
Textbook Sections: [SB 1.1, 1.2, 17.6]
Week 3
Week 4
Former title: The Reinforcement Learning Problem
Textbook Sections: [SB 4.1-4.4]
Week 5 (June 7-11)
Textbook Sections: Selections from [SB chap 5], [SB 6.0 - 6.5]
Week 6 (June 14-17)
Week 7 (June 21 - 25)
Go over any questions or open topics from first 6 weeks.
Week 7
Questions on the Midterm (June 23-25) can cover any topics up to this point, Weeks 1-6 inclusive.
optional topic
Textbook Sections: [SB 12.1, 12.2]
Note: Given the pace at which people are watching the videos, we will drop this topic. It is less essential in the Deep RL era, although very interesting theoretically. The calendar will be updated accordingly.
Week 8 (June 28 - July 2)
Week 9 (July 5 - 9)
[SB 13.1, 13.2, 13.5]
Week 10
Note: If Topic 5.2 is dropped this will be a week earlier.
Week 11
Note: If Topic 5.2 is dropped this will be a week earlier.
Week 12
Week 13
[SuttonBarto2018] - Reinforcement Learning: An Introduction. Book; free PDF of the draft available.
http://incompleteideas.net/book/the-book-2nd.html
[Dimitrakakis2019] - Decision Making Under Uncertainty and Reinforcement Learning
[Ghavamzadeh2016] - Bayesian Reinforcement Learning: A Survey. Ghavamzadeh et al. 2016.
https://arxiv.org/abs/1609.04436
This website is a great resource. It lays out concepts from start to finish. Once you get through the first half of our course, many of the concepts on this site will be familiar to you.
https://spinningup.openai.com/en/latest/spinningup/keypapers.html
The fundamentals of RL are briefly covered here. We will go into all this and more in detail in our course.
https://spinningup.openai.com/en/latest/spinningup/rl_intro.html
Here is a list of algorithms at the cutting edge of RL as of a year or so ago, so it's a good place to find out more. But in a fast-growing field, it may be a bit out of date by now.
https://spinningup.openai.com/en/latest/spinningup/rl_intro2.html
This is a thorough collection of slides from a few different texts and courses, laid out with the essentials from basic decision making to Deep RL. There are also code examples for some of their own simple domains.
https://github.com/omerbsezer/Reinforcement_learning_tutorial_with_demo#ExperienceReplay
A nice blog post comparing DQN and Policy Gradient algorithms such as A2C.
https://flyyufelix.github.io/2017/10/12/dqn-vs-pg.html
The AAMAS 2021 conference finished recently; it focuses on decision making and planning, with lots of RL papers.
ICLR 2020 conference (https://iclr.cc/virtual_2020/index.html)
Introductory topics on this from my graduate course ECE 657A are available on YouTube and are mostly applicable to this course as well.
For a very fundamental view of probability from another of Prof. Crowley's courses, you can view the lectures and tutorials for ECE 108.
ECE 108 Youtube (look at “future lectures” and “future tutorials” for S20): https://www.youtube.com/channel/UCHqrRl12d0WtIyS-sECwkRQ/playlists
The last few lectures and tutorials are on probability definitions as seen from the perspective of discrete math and set theory.
A good article summarizing how likelihood, loss functions, risk, KL divergence, MLE, and MAP are all connected.
https://quantivity.wordpress.com/2011/05/23/why-minimize-negative-log-likelihood/
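As a minimal illustration of one of those connections (a sketch, assuming Gaussian data with known variance; the data values here are made up): minimizing the negative log-likelihood over the mean recovers the sample mean, i.e. the MLE.

```python
import numpy as np

# Sketch: for Gaussian data with known sigma, the negative log-likelihood
# (up to an additive constant) is a sum-of-squares loss in the mean mu.
# Minimizing it over mu recovers the sample mean, the MLE.
data = np.array([1.0, 2.0, 3.0, 4.0])

def nll(mu, x=data, sigma=1.0):
    # NLL of N(mu, sigma^2), dropping the mu-independent constant term
    return 0.5 * np.sum((x - mu) ** 2) / sigma**2 + len(x) * np.log(sigma)

# Crude grid search for the minimizer (illustration only)
grid = np.linspace(-5, 5, 10001)
mu_hat = grid[np.argmin([nll(m) for m in grid])]
# mu_hat coincides with data.mean()
```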
From the course website for a previous year. Some of this we won't need as much, but it is all useful to know for Machine Learning methods in general.
https://compthinking.github.io/RLCourseNotes/
- Part 1 - Live Lecture May 17, 2021 on Virtual Classroom - View Live Here
Parts:
This will be given as a Live Lecture on June 14, 2021 during the 4pm-5:30pm ET Live Session.
According to my YouTube analytics, very few people have watched the first two lectures on Temporal Difference Learning or Monte Carlo. But there were a fair number looking at SARSA and Q-Learning (probably because they are the most famous, fair enough).
This plot shows views per video. I removed the even more highly viewed video from the first three weeks.
The first bar is “Dynamic Programming 1” with 76 views. (as of June 11, 2021 5:27pm ET)
I was planning to record a new video (that isn’t on youtube yet) on the following during the live session on Monday:
But… if lots of people show up, we could also:
So let me know here what topic you would want to go over, or redo live:
I’ll check this post on Sunday/Monday and see which option it will be.
Eligibility traces, in a tabular setting, lead to a significant benefit in training time when added on top of the Temporal Difference method.
In Deep RL it is very common to use experience replay to reduce overfitting and bias to recent experiences. However, experience replay makes it very hard to leverage eligibility traces which require a sequence of actions to distribute reward backwards.
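To make the tabular case concrete, here is a minimal sketch of TD(λ) with accumulating eligibility traces. The `env` interface (`reset()` / `step(a)` returning next state, reward, done) and the policy `pi` are hypothetical stand-ins, not part of the course materials.

```python
import numpy as np

def td_lambda(env, pi, n_states, episodes=200, alpha=0.1, gamma=0.99, lam=0.9):
    """Sketch: tabular TD(lambda) policy evaluation with accumulating traces."""
    V = np.zeros(n_states)
    for _ in range(episodes):
        e = np.zeros(n_states)            # eligibility trace per state
        s = env.reset()
        done = False
        while not done:
            s2, r, done = env.step(pi(s))
            # one-step TD error; bootstrap only if s2 is non-terminal
            delta = r + gamma * V[s2] * (not done) - V[s]
            e[s] += 1.0                   # accumulate trace for visited state
            V += alpha * delta * e        # credit ALL recently visited states
            e *= gamma * lam              # decay all traces
            s = s2
    return V
```

The key line is `V += alpha * delta * e`: the TD error is distributed backwards to every recently visited state in proportion to its decayed trace, which is exactly what experience replay's shuffled minibatches make hard to do.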
A Value Function Approximation (VFA) is a necessary technique whenever the state or action spaces become too large to represent the value function explicitly as a table. In practice, any realistic problem needs to use a VFA.
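A minimal sketch of the idea, assuming a toy one-hot feature map (illustrative only, not a real featurization): the value table is replaced by a fixed-size weight vector, and values are computed rather than looked up.

```python
import numpy as np

# Sketch: linear VFA, v_hat(s) = w . x(s), instead of a table entry V[s].
N_FEATURES = 8                      # fixed size, independent of |S|

def features(s):
    # Toy hashed one-hot featurization; real VFAs use tile coding,
    # polynomial features, neural networks, etc.
    x = np.zeros(N_FEATURES)
    x[hash(s) % N_FEATURES] = 1.0
    return x

w = np.zeros(N_FEATURES)            # the weights are what gets learned

def v_hat(s):
    return float(w @ features(s))   # value is computed, not looked up
```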
Some of the posts used for lecture on July 26.
Very clear blog post on describing Actor-Critic Algorithms to improve Policy Gradients
https://www.freecodecamp.org/news/an-intro-to-advantage-actor-critic-methods-lets-play-sonic-the-hedgehog-86d6240171d/
Going beyond what we covered in class, here are some exciting trends and new advances in RL research in the past few years to find out more about.
PG methods are a fast changing area of RL research. This post has a number of the successful algorithms in this area from a few years ago:
https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html#actor-critic
Get Timepoint: Jump straight to the part of the AlphaGo Documentary where they explain the learning process AlphaGo uses. It is also the start of the first moment where the program makes a creative move that humans did not expect.
[Ermon2019] - First half of notes are based on Stanford CS 228 (https://ermongroup.github.io/cs228-notes/), which goes into even more detail on PGMs than we will.
[Cam Davidson 2018] - Bayesian Methods for Hackers - Probabilistic Programming textbook as a set of Python notebooks.
https://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/#contents
[Koller, Friedman, 2009] Probabilistic Graphical Models : Principles and Techniques
The extensive theoretical book on PGMs.
https://mitpress.mit.edu/books/probabilistic-graphical-models
When using a VFA, you can use a Stochastic Gradient Descent (SGD) method to search for the best weights for your value function according to experience.
This parametric form of the value function will then be used to obtain a greedy or epsilon-greedy policy at run-time.
This is why using a VFA + SGD is still different from a Direct Policy Search approach where you optimize the parameters of the policy directly.
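A minimal sketch of that combination, under assumed linear features (every name here is illustrative): a semi-gradient SARSA-style SGD step on the weights, and the epsilon-greedy policy read off the learned q-hat at run time. Note the policy is only implicit; only the value-function weights are optimized.

```python
import numpy as np

def x(s, a, n=16):
    # Toy hashed one-hot state-action features (illustration only)
    phi = np.zeros(n)
    phi[(5 * s + a) % n] = 1.0
    return phi

def q_hat(w, s, a):
    return float(w @ x(s, a))

def sgd_step(w, s, a, r, s2, a2, alpha=0.1, gamma=0.99):
    # Semi-gradient: the bootstrapped target is treated as a constant,
    # so the gradient of the squared error is just the feature vector.
    target = r + gamma * q_hat(w, s2, a2)
    return w + alpha * (target - q_hat(w, s, a)) * x(s, a)

def epsilon_greedy(w, s, actions, eps=0.1, rng=np.random.default_rng()):
    # Run-time policy derived from q_hat; not directly parameterized
    if rng.random() < eps:
        return rng.choice(actions)                      # explore
    return max(actions, key=lambda a: q_hat(w, s, a))   # exploit
```

Contrast this with Direct Policy Search, where `epsilon_greedy` would be replaced by a parameterized policy whose own parameters are the thing being optimized.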
SamIam Bayesian Network GUI Tool
Other Tools
Some videos and resources on Bayes Nets, d-separation, the Bayes Ball Algorithm, and more:
https://metacademy.org/graphs/concepts/bayes_ball