This is a thorough collection of slides from several different texts and courses, laying out the essentials from basic decision making to Deep RL. There are also code examples for some of their own simple domains.
https://github.com/omerbsezer/Reinforcement_learning_tutorial_with_demo#ExperienceReplay
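To make the experience-replay idea concrete, here is a minimal generic sketch of a uniform replay buffer (my own illustration with assumed names, not code from the linked repository):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity buffer of (s, a, r, s', done) transitions."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # old transitions fall off the front

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform sampling decorrelates consecutive transitions
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=100)
for t in range(50):
    buf.push(t, 0, 0.0, t + 1, False)
batch = buf.sample(8)
print(len(batch))  # → 8
```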
A nice blog post comparing DQN with Policy Gradient algorithms such as A2C.
https://flyyufelix.github.io/2017/10/12/dqn-vs-pg.html
[Ermon2019] - The first half of the notes is based on Stanford CS 228 (https://ermongroup.github.io/cs228-notes/), which goes into even more detail on PGMs than we will.
[Cam Davidson 2018] - Bayesian Methods for Hackers - a Probabilistic Programming textbook written as a set of Python notebooks.
https://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/#contents
[Koller, Friedman, 2009] Probabilistic Graphical Models : Principles and Techniques
An extensive theoretical treatment of PGMs.
https://mitpress.mit.edu/books/probabilistic-graphical-models
[Dimitrakakis2019] - Decision Making Under Uncertainty and Reinforcement Learning
[Ghavamzadeh2016] - Bayesian Reinforcement Learning: A Survey. Ghavamzadeh et al. 2016.
https://arxiv.org/abs/1609.04436
[SuttonBarto2018] - Reinforcement Learning: An Introduction. Book; a free PDF of the draft is available.
http://incompleteideas.net/book/the-book-2nd.html
Timepoint: Jump straight to the part of the AlphaGo documentary where they explain the learning process AlphaGo uses. It is also the first moment where the program makes a creative move that humans did not expect.
https://youtu.be/jGyCsVhtW0M?t=2834
Analysis of what AlphaGo was “thinking” when it played Lee Sedol
https://www.wired.com/2016/03/googles-ai-viewed-move-no-human-understand/
A good article summarizing how likelihood, loss functions, risk, KL divergence, MLE, and MAP are all connected.
https://quantivity.wordpress.com/2011/05/23/why-minimize-negative-log-likelihood/
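As a concrete instance of the post's main point, minimizing the negative log-likelihood recovers the MLE. Here is a small self-contained sketch using made-up Bernoulli coin-flip data (all names and numbers are my own illustration, not from the post):

```python
import math

# Made-up coin flips: 1 = heads, 0 = tails (five heads out of eight).
data = [1, 1, 0, 1, 0, 1, 1, 0]

def neg_log_likelihood(p, xs):
    # NLL of a Bernoulli(p) model over the observed flips
    return -sum(math.log(p) if x == 1 else math.log(1 - p) for x in xs)

# Grid search: the minimizer of the NLL is the MLE, which for a
# Bernoulli model is exactly the sample mean.
candidates = [i / 1000 for i in range(1, 1000)]
mle = min(candidates, key=lambda p: neg_log_likelihood(p, data))
print(mle)  # → 0.625, the sample mean 5/8
```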
Eligibility traces in the tabular setting lead to a significant benefit in training time when added to the Temporal Difference method.
In Deep RL it is very common to use experience replay to reduce overfitting and bias toward recent experiences. However, experience replay makes it very hard to leverage eligibility traces, which require a sequence of actions in order to distribute reward backwards.
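In the tabular case the eligibility-trace mechanism is only a few lines. Here is a minimal TD(λ) value-estimation sketch on a made-up deterministic 5-state chain (the constants and setup are my own illustrative choices, not from any linked text):

```python
# Toy 5-state chain: start in state 0, move right with reward 0,
# reaching state 4 ends the episode with reward 1.
N_STATES, GAMMA, ALPHA, LAM = 5, 0.9, 0.1, 0.8
V = [0.0] * N_STATES

for episode in range(500):
    e = [0.0] * N_STATES              # eligibility traces, reset per episode
    s = 0
    while s != N_STATES - 1:
        s_next = s + 1                # deterministic walk for simplicity
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        delta = r + GAMMA * V[s_next] - V[s]   # one-step TD error
        e[s] += 1.0                   # accumulating trace for the visited state
        for i in range(N_STATES):     # credit flows back to all traced states
            V[i] += ALPHA * delta * e[i]
            e[i] *= GAMMA * LAM       # traces decay geometrically
        s = s_next

print([round(v, 2) for v in V])  # approaches the true values [0.73, 0.81, 0.9, 1.0, 0.0]
```

The trace vector `e` lets a single TD error update every recently visited state at once, which is what speeds up credit assignment relative to plain one-step TD.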
https://www.amii.ca/the-success-of-dqn-explained-by-shallow-reinforcement-learning/
This post explains a paper showing how to achieve the same performance as the Deep RL DQN method for Atari using carefully constructed linear value function approximation.
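The update rule for linear value-function approximation is itself very simple. Here is a toy Q-learning sketch using one-hot features on a made-up 3-state chain (my own illustration; the paper's hand-crafted Atari features are far richer):

```python
import random

random.seed(0)
N_S, N_A, GAMMA, ALPHA, EPS = 3, 2, 0.95, 0.2, 0.2

def phi(s, a):
    # one-hot feature vector over (state, action) pairs
    f = [0.0] * (N_S * N_A)
    f[s * N_A + a] = 1.0
    return f

w = [0.0] * (N_S * N_A)  # linear weights: Q(s, a) = w . phi(s, a)

def q(s, a):
    return sum(wi * fi for wi, fi in zip(w, phi(s, a)))

for episode in range(300):
    s = 0
    while s != N_S - 1:
        # epsilon-greedy action selection
        if random.random() < EPS:
            a = random.randrange(N_A)
        else:
            a = max(range(N_A), key=lambda a_: q(s, a_))
        # toy dynamics: action 1 moves right, action 0 stays; state 2 is
        # terminal and pays reward 1 on entry
        s_next = min(s + 1, N_S - 1) if a == 1 else s
        r = 1.0 if s_next == N_S - 1 else 0.0
        target = r if s_next == N_S - 1 else r + GAMMA * max(q(s_next, a_) for a_ in range(N_A))
        delta = target - q(s, a)
        for i, fi in enumerate(phi(s, a)):
            w[i] += ALPHA * delta * fi   # gradient step on the linear weights
        s = s_next

print(round(q(0, 1), 2), round(q(1, 1), 2))  # → approximately 0.95 1.0
```

With one-hot features this reduces to tabular Q-learning; the linked post's claim is that carefully constructed richer features make this simple linear learner competitive with DQN on Atari.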
Some of the posts used for lecture on July 26.
Very clear blog post on describing Actor-Critic Algorithms to improve Policy Gradients
https://www.freecodecamp.org/news/an-intro-to-advantage-actor-critic-methods-lets-play-sonic-the-hedgehog-86d6240171d/
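The core quantity in actor-critic methods is the advantage, A(s_t, a_t) = r_t + γ·V(s_{t+1}) − V(s_t), which uses the critic's value estimate to center the policy-gradient signal. A minimal sketch with made-up critic outputs (my own illustration, not code from the post):

```python
GAMMA = 0.99

def one_step_advantages(rewards, values, last_value=0.0):
    """A(s_t, a_t) = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    vals_next = values[1:] + [last_value]
    return [r + GAMMA * vn - v for r, v, vn in zip(rewards, values, vals_next)]

# Hypothetical critic outputs along a 4-step trajectory ending in reward 1:
rewards = [0.0, 0.0, 0.0, 1.0]
values  = [0.5, 0.6, 0.7, 0.9]
adv = one_step_advantages(rewards, values)
print([round(a, 3) for a in adv])  # → [0.094, 0.093, 0.191, 0.1]
```

Positive advantages push the actor to make those actions more likely, negative ones less likely, with much lower variance than raw Monte Carlo returns.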
Going beyond what we covered in class, here are some exciting trends and recent advances in RL research from the past few years for you to explore.
SamIam Bayesian Network GUI Tool
Other Tools
Some videos and resources on Bayes Nets, d-separation, the Bayes Ball algorithm, and more:
https://metacademy.org/graphs/concepts/bayes_ball
As I said in class, PG methods are a fast-changing area of RL research. This post surveys a number of the successful algorithms in this area as of a year ago:
https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html#actor-critic