Course References, Links and Random Notes

Probabilistic Reasoning and Reinforcement Learning

ECE 493 Technical Electives - Topic 25

Course Resources

Primary References for Course

Primary References for Probabilistic Reasoning

[Ermon2019] - The first half of the notes is based on Stanford CS 228 (https://ermongroup.github.io/cs228-notes/), which goes into even more detail on PGMs than we will.

[Cam Davidson 2018] - Bayesian Methods for Hackers - A probabilistic programming textbook written as a set of Python notebooks.
https://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/#contents

[Koller, Friedman, 2009] Probabilistic Graphical Models : Principles and Techniques
The extensive theoretical book on PGMs.
https://mitpress.mit.edu/books/probabilistic-graphical-models

Primary References for Decision Making Under Uncertainty

[Dimitrakakis2019] - Decision Making Under Uncertainty and Reinforcement Learning

http://www.cse.chalmers.se/~chrdimi/downloads/book.pdf

[Ghavamzadeh2016] - Bayesian Reinforcement Learning: A Survey. Ghavamzadeh et al. 2016.
https://arxiv.org/abs/1609.04436

[SuttonBarto2018] - Reinforcement Learning: An Introduction. Book; a free PDF of the draft is available.
http://incompleteideas.net/book/the-book-2nd.html

Other Useful Resources

Reinforcement Learning Tutorial with Demo on GitHub

This is a thorough collection of slides drawn from a few different texts and courses, laying out the essentials from basic decision making to Deep RL. There are also code examples for some of the authors' own simple domains.
https://github.com/omerbsezer/Reinforcement_learning_tutorial_with_demo#ExperienceReplay

Deep Q Network vs Policy Gradients - An Experiment on VizDoom with Keras

A nice blog post comparing DQN and Policy Gradient algorithms such as A2C.
https://flyyufelix.github.io/2017/10/12/dqn-vs-pg.html

Topics

Alpha Go

Alpha Go Documentary

https://youtu.be/jGyCsVhtW0M

Timepoint: Jump straight to the part of the Alpha Go Documentary that explains the learning process Alpha Go uses. It is also the first moment where the program makes a creative move that humans did not expect.
https://youtu.be/jGyCsVhtW0M?t=2834

Analysis of what Alpha Go was “thinking” when it played Lee Sedol
https://www.wired.com/2016/03/googles-ai-viewed-move-no-human-understand/

Bayes Nets

Tools

SamIam Bayesian Network GUI Tool

Other Tools

References

Some videos and resources on Bayes Nets, d-separation, the Bayes Ball Algorithm and more:
https://metacademy.org/graphs/concepts/bayes_ball

Likelihood, Loss and Risk

A good article summarizing how likelihood, loss functions, risk, KL divergence, MLE and MAP are all connected.
https://quantivity.wordpress.com/2011/05/23/why-minimize-negative-log-likelihood/
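
As a rough sketch of the connection the article describes, the standard relationships can be written out as follows. The notation is an assumption introduced here, not taken from the article: p^* is the unknown data distribution, p_\theta the model, x_1, ..., x_N i.i.d. samples, and p(\theta) a prior. The per-sample loss is the negative log-likelihood, the risk is its expectation under p^*, and the average over the samples is the empirical risk.

```latex
\begin{align}
\hat{\theta}_{\mathrm{MLE}}
  &= \arg\max_{\theta} \prod_{i=1}^{N} p_{\theta}(x_i)
   = \arg\min_{\theta} \; -\frac{1}{N} \sum_{i=1}^{N} \log p_{\theta}(x_i) \\
-\frac{1}{N} \sum_{i=1}^{N} \log p_{\theta}(x_i)
  &\xrightarrow{\;N \to \infty\;}
   \mathbb{E}_{x \sim p^{*}}\!\left[ -\log p_{\theta}(x) \right]
   = H(p^{*}) + D_{\mathrm{KL}}\!\left( p^{*} \,\|\, p_{\theta} \right) \\
\hat{\theta}_{\mathrm{MAP}}
  &= \arg\min_{\theta} \left[ -\sum_{i=1}^{N} \log p_{\theta}(x_i) - \log p(\theta) \right]
\end{align}
```

Since the entropy H(p^*) does not depend on \theta, minimizing the expected negative log-likelihood is the same as minimizing the KL divergence from the data distribution to the model. MAP adds the negative log-prior as a regularizing term.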

Conjugate Priors

Multiarmed Bandits

Multiarmed Bandit : Solving it via Reinforcement Learning in Python

Thompson Sampling

Markov Decision Processes

Domains

Eligibility Traces

In the tabular setting, adding eligibility traces to the Temporal Difference method leads to a significant reduction in training time.
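
A minimal sketch of tabular TD(lambda) with accumulating eligibility traces may help make this concrete. The random-walk chain, the function name and all parameter values below are illustrative assumptions, not something taken from the course material.

```python
import random


def td_lambda_random_walk(n_states=5, episodes=200, alpha=0.1,
                          gamma=1.0, lam=0.8, seed=0):
    """Tabular TD(lambda) with accumulating traces on a random-walk chain.

    Hypothetical toy domain: states 0..n_states-1 are non-terminal. Stepping
    left of state 0 ends the episode with reward 0; stepping right of the
    last state ends it with reward 1. The policy moves left or right at random.
    """
    rng = random.Random(seed)
    values = [0.0] * n_states
    for _ in range(episodes):
        traces = [0.0] * n_states          # traces reset at episode start
        state = n_states // 2
        while True:
            next_state = state + rng.choice((-1, 1))
            done = next_state < 0 or next_state >= n_states
            reward = 1.0 if next_state >= n_states else 0.0
            next_value = 0.0 if done else values[next_state]
            td_error = reward + gamma * next_value - values[state]
            traces[state] += 1.0           # mark the state just visited
            for s in range(n_states):
                # every recently visited state shares in the TD error
                values[s] += alpha * td_error * traces[s]
                traces[s] *= gamma * lam   # decay all traces
            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    print(td_lambda_random_walk())
```

Setting lam=0.0 reduces the update to one-step TD, since only the current state then has a non-zero trace at update time. Comparing the two settings on the same domain is one way to see the effect on learning speed.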

In Deep RL it is very common to use experience replay to reduce overfitting and bias toward recent experiences. However, experience replay makes it very hard to leverage eligibility traces, which require an ordered sequence of actions to distribute reward backwards.
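
To illustrate why the two ideas conflict, here is a minimal sketch of a uniform replay buffer. The class and method names are hypothetical and not tied to any particular library. Because sample() draws stored transitions uniformly at random, the batch comes back out of temporal order, so the consecutive steps a trace needs are not directly available.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size buffer of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # deque with maxlen drops the oldest transition once full
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform sampling without replacement; temporal order is lost
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


if __name__ == "__main__":
    buf = ReplayBuffer(capacity=100, seed=0)
    for t in range(10):
        buf.add(t, 0, 0.0, t + 1, t == 9)
    print(buf.sample(4))
```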

Value Function Approximation

How to use a shallow, linear approximation for Atari

https://www.amii.ca/the-success-of-dqn-explained-by-shallow-reinforcement-learning/
This post explains a paper showing how to achieve the same performance as the Deep RL DQN method for Atari using a carefully constructed linear value function approximation.
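
For orientation, a linear action-value approximation can be sketched as below. This is only a generic semi-gradient Q-learning update under assumed names and parameters. It does not reproduce the paper's feature construction, which is where the real work lies; the feature vectors here are placeholders.

```python
def linear_q(weights, features, action):
    """Q(s, a) as a dot product of per-action weights and state features."""
    return sum(w * x for w, x in zip(weights[action], features))


def q_learning_update(weights, features, action, reward, next_features,
                      done, alpha=0.1, gamma=0.99):
    """One semi-gradient Q-learning step; mutates weights, returns TD error."""
    n_actions = len(weights)
    best_next = 0.0 if done else max(
        linear_q(weights, next_features, a) for a in range(n_actions))
    td_error = reward + gamma * best_next - linear_q(weights, features, action)
    # for a linear model the gradient of Q w.r.t. the weights is the features
    weights[action] = [w + alpha * td_error * x
                       for w, x in zip(weights[action], features)]
    return td_error


if __name__ == "__main__":
    # two actions, two placeholder features per state
    weights = [[0.0, 0.0], [0.0, 0.0]]
    q_learning_update(weights, [1.0, 0.0], 0, 1.0, [0.0, 1.0], done=True)
    print(weights)
```

With a linear model, all of the representational power comes from the features, which is why the choice of features matters so much in the shallow setting.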

  • Policy Gradients
  • Actor-Critic

Policy Gradient Algorithms

Some of the posts used for lecture on July 26.

Actor-Critic Algorithm

A very clear blog post describing how Actor-Critic algorithms improve on Policy Gradients.
https://www.freecodecamp.org/news/an-intro-to-advantage-actor-critic-methods-lets-play-sonic-the-hedgehog-86d6240171d/

Cutting Edge Algorithms

Going beyond what we covered in class, here are some exciting trends and advances in RL research from the past few years that are worth finding out more about.

Policy Gradient Methods

As I said in class, PG methods are a fast-changing area of RL research. This post covers a number of the successful algorithms in this area as of a year ago:
https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html#actor-critic