• Primary References for Probabilistic Reasoning

  • Primary References for Decision Making Under Uncertainty

  • Alpha Go

  • Bayes Nets

  • Likelihood, Loss and Risk

  • Conjugate Priors

  • Multiarmed Bandits

  • Markov Decision Processes

  • Eligibility Traces

  • Value Function Approximation

    • Policy Gradients
    • Actor-Critic
  • Tools

  • References

  • Domains

{"cards":[{"_id":"5fe4d175553d829d700002aa","treeId":"5fe4d3bb553d829d700002a7","seq":18132382,"position":1,"parentId":null,"content":"# Course References, Links and Random Notes \n### Probabilistic Reasoning and Reinforcement Learning\n### ECE 493 Technical Electives - Topic 25\n\n"},{"_id":"5fadc422b1ba66c81c000052","treeId":"5fe4d3bb553d829d700002a7","seq":18132383,"position":1.875,"parentId":null,"content":"# Course Resources\n- Notes - https://rateldajer.github.io/ECE493T25S19/\n- Slides - see LEARN\n- This list of links:\n - As a tree: https://gingkoapp.com/UWECE657C\n - As plain HTML: https://gingkoapp.com/UWECE657C.html"},{"_id":"5fae8e69b1ba66c81c000050","treeId":"5fe4d3bb553d829d700002a7","seq":17690070,"position":2.75,"parentId":null,"content":"# Primary References for Course"},{"_id":"5fdbb2c74031e94856000047","treeId":"5fe4d3bb553d829d700002a7","seq":17690073,"position":1,"parentId":"5fae8e69b1ba66c81c000050","content":"## Primary References for Probabilistic Reasoning"},{"_id":"5fdbb2814031e94856000048","treeId":"5fe4d3bb553d829d700002a7","seq":17690088,"position":1,"parentId":"5fdbb2c74031e94856000047","content":"**[Ermon2019]** - First half of notes are based on Stanford CS 228 (https://ermongroup.github.io/cs228-notes/) which goes even more into details on PGMs than we will.\n"},{"_id":"5fdbafec4031e9485600004b","treeId":"5fe4d3bb553d829d700002a7","seq":17690057,"position":1.25,"parentId":"5fdbb2c74031e94856000047","content":"**[Cam Davidson 2018]** - Bayesian Methods for Hackers - Probabilistic Programming textbook as set of python notebooks.\nhttps://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/#contents"},{"_id":"5f5269c8956a4cdb8d000054","treeId":"5fe4d3bb553d829d700002a7","seq":17786788,"position":3,"parentId":"5fdbb2c74031e94856000047","content":"**[Koller, Friedman, 2009]** Probabilistic Graphical Models : Principles and Techniques \nThe extensive theoretical book on PGMs.\nhttps://mitpress.mit.edu/books/probabilistic-graphical-models"},{"_id":"5fae95fcb1ba66c81c00004d","treeId":"5fe4d3bb553d829d700002a7","seq":17690074,"position":2,"parentId":"5fae8e69b1ba66c81c000050","content":"## Primary References for Decision Making Under Uncertainty"},{"_id":"5fdbb1a74031e9485600004a","treeId":"5fe4d3bb553d829d700002a7","seq":17690062,"position":1,"parentId":"5fae95fcb1ba66c81c00004d","content":"**[Dimitrakakis2019]** - Decision Making Under Uncertainty and Reinforcement Learning\n\nhttp://www.cse.chalmers.se/~chrdimi/downloads/book.pdf"},{"_id":"5fdbb1e14031e94856000049","treeId":"5fe4d3bb553d829d700002a7","seq":17690066,"position":2,"parentId":"5fae95fcb1ba66c81c00004d","content":"**[Ghavamzadeh2016]** - Bayesian Reinforcement Learning: A Survey. Ghavamzadeh et al. 2016.\nhttps://arxiv.org/abs/1609.04436"},{"_id":"5fae93a0b1ba66c81c00004e","treeId":"5fe4d3bb553d829d700002a7","seq":17690067,"position":3,"parentId":"5fae95fcb1ba66c81c00004d","content":"**[SuttonBarto2018]** - Reinforcement Learning: An Introduction. Book, free pdf of draft available.\nhttp://incompleteideas.net/book/the-book-2nd.html"},{"_id":"5c32fe007484cb6591000119","treeId":"5fe4d3bb553d829d700002a7","seq":18274559,"position":3.375,"parentId":null,"content":"# Other Useful Resources"},{"_id":"5c32fd867484cb659100011a","treeId":"5fe4d3bb553d829d700002a7","seq":18274560,"position":1,"parentId":"5c32fe007484cb6591000119","content":"## Reinforcement Learning Tutorial with Demo on GitHub\nThis is a thorough collection of slides from a few different texts and courses laid out with the essentials from basic decision making to Deep RL. There is also code examples for some of their own simple domains.\nhttps://github.com/omerbsezer/Reinforcement_learning_tutorial_with_demo#ExperienceReplay"},{"_id":"5c261b33456eb77c410000eb","treeId":"5fe4d3bb553d829d700002a7","seq":18278594,"position":2,"parentId":"5c32fe007484cb6591000119","content":"## Deep Q Network vs Policy Gradients - An Experiment on VizDoom with Keras\nA nice blog post on comparing DQN and Policy Gradient algorithms such A2C.\nhttps://flyyufelix.github.io/2017/10/12/dqn-vs-pg.html"},{"_id":"5fae8ea6b1ba66c81c00004f","treeId":"5fe4d3bb553d829d700002a7","seq":17690079,"position":4,"parentId":null,"content":"# Topics"},{"_id":"5fe4cec1553d829d700002ab","treeId":"5fe4d3bb553d829d700002a7","seq":17690076,"position":1,"parentId":"5fae8ea6b1ba66c81c00004f","content":"## Alpha Go"},{"_id":"5fddb2b3553d829d700002ad","treeId":"5fe4d3bb553d829d700002a7","seq":17661877,"position":0.5,"parentId":"5fe4cec1553d829d700002ab","content":"### Alpha Go Documentary\nhttps://youtu.be/jGyCsVhtW0M\n"},{"_id":"5fddb43c553d829d700002ac","treeId":"5fe4d3bb553d829d700002a7","seq":17661907,"position":1,"parentId":"5fe4cec1553d829d700002ab","content":"Timepoint: Jump straight to the part of the Alpha Go Documentary where they explain the learning process Alpha Go uses. It also is the start of the first moment where the program does a creative move that humans did not expect.\nhttps://youtu.be/jGyCsVhtW0M?t=2834"},{"_id":"5fdc5274553d829d700002ae","treeId":"5fe4d3bb553d829d700002a7","seq":17663437,"position":2,"parentId":"5fe4cec1553d829d700002ab","content":"Analysis of What Alpha Go was \"thinking\" when it played Sedol Lee\nhttps://www.wired.com/2016/03/googles-ai-viewed-move-no-human-understand/"},{"_id":"5fdbfef14031e94856000041","treeId":"5fe4d3bb553d829d700002a7","seq":17690078,"position":2,"parentId":"5fae8ea6b1ba66c81c00004f","content":"## Bayes Nets"},{"_id":"5fdbfe8e4031e94856000043","treeId":"5fe4d3bb553d829d700002a7","seq":17757718,"position":0.5,"parentId":"5fdbfef14031e94856000041","content":"### Tools"},{"_id":"5f526d57956a4cdb8d000053","treeId":"5fe4d3bb553d829d700002a7","seq":17789849,"position":0.5,"parentId":"5fdbfe8e4031e94856000043","content":"**SamIam Bayesian Network GUI Tool**\n- Java GUI tool for playing with BNs (its old but its good)\nhttp://reasoning.cs.ucla.edu/samiam/index.php?h=emodels\n\n**Other Tools**\n- Bayesian Belief Networks Python Package :\nAllows creation of Bayesian Belief Networks\nand other Graphical Models with pure Python\nfunctions. Where tractable exact inference\nis used.\nhttps://github.com/eBay/bayesian-belief-networks\n- Python library for conjugate exponential family BNs and variational inference only\nhttp://www.bayespy.org/intro.html\n- Open Markov\nhttp://www.openmarkov.org/\n- Open GM (C++ library)\nhttp://hciweb2.iwr.uni-heidelberg.de/opengm/"},{"_id":"5f7efed8e052795359000051","treeId":"5fe4d3bb553d829d700002a7","seq":17757725,"position":2,"parentId":"5fdbfef14031e94856000041","content":"### References\n"},{"_id":"5f7efe69e052795359000052","treeId":"5fe4d3bb553d829d700002a7","seq":17757726,"position":1,"parentId":"5f7efed8e052795359000051","content":"Some videos and resources on Bayes Nets, d-seperation, Bayes Ball Algorithm and more:\nhttps://metacademy.org/graphs/concepts/bayes_ball"},{"_id":"5e77880ec94dab71ec000089","treeId":"5fe4d3bb553d829d700002a7","seq":17869139,"position":3,"parentId":"5fae8ea6b1ba66c81c00004f","content":"## Likelihood, Loss and Risk"},{"_id":"5e77878ac94dab71ec00008a","treeId":"5fe4d3bb553d829d700002a7","seq":17869147,"position":1,"parentId":"5e77880ec94dab71ec000089","content":"A Good article summarizing how likelihood, loss functions, risk, KL divergence, MLE, MAP are all connected.\nhttps://quantivity.wordpress.com/2011/05/23/why-minimize-negative-log-likelihood/"},{"_id":"5e6a1a13a97a1a2ecb0002c6","treeId":"5fe4d3bb553d829d700002a7","seq":17873345,"position":4,"parentId":"5fae8ea6b1ba66c81c00004f","content":"## Conjugate Priors"},{"_id":"5e6a19c8a97a1a2ecb0002c7","treeId":"5fe4d3bb553d829d700002a7","seq":17873346,"position":1,"parentId":"5e6a1a13a97a1a2ecb0002c6","content":"https://en.wikipedia.org/wiki/Conjugate_prior#Table_of_conjugate_distributions"},{"_id":"5dbdc38da0d422009a0000bd","treeId":"5fe4d3bb553d829d700002a7","seq":17957054,"position":5,"parentId":"5fae8ea6b1ba66c81c00004f","content":"## Multiarmed Bandits\n"},{"_id":"5dbab04ba0d422009a0000bf","treeId":"5fe4d3bb553d829d700002a7","seq":17959418,"position":0.5,"parentId":"5dbdc38da0d422009a0000bd","content":"### Multiarmed Bandit : Solving it via Reinforcement Learning in Python\n- Quite a good blog post with all the concepts laid out in simple terms in order https://www.analyticsvidhya.com/blog/2018/09/reinforcement-multi-armed-bandit-scratch-python/"},{"_id":"5dbdc2dea0d422009a0000be","treeId":"5fe4d3bb553d829d700002a7","seq":17957055,"position":1,"parentId":"5dbdc38da0d422009a0000bd","content":"### Thompson Sampling\n- Long tutorial on TS: https://web.stanford.edu/~bvr/pubs/TS_Tutorial.pdf"},{"_id":"5d7d6b0f8206658a0a0000c1","treeId":"5fe4d3bb553d829d700002a7","seq":17994023,"position":7,"parentId":"5fae8ea6b1ba66c81c00004f","content":"## Markov Decision Processes"},{"_id":"5d7d6ab28206658a0a0000c2","treeId":"5fe4d3bb553d829d700002a7","seq":17994026,"position":1,"parentId":"5d7d6b0f8206658a0a0000c1","content":"### Domains"},{"_id":"5d7d6a708206658a0a0000c3","treeId":"5fe4d3bb553d829d700002a7","seq":17994027,"position":1,"parentId":"5d7d6ab28206658a0a0000c2","content":"#### Gridworld\n- https://www.youtube.com/watch?v=xtJAGjY3SWY"},{"_id":"5c8f6a314d10ff3e9c000098","treeId":"5fe4d3bb553d829d700002a7","seq":18171068,"position":8,"parentId":"5fae8ea6b1ba66c81c00004f","content":"## Eligibility Traces"},{"_id":"5c8f51374d10ff3e9c00009c","treeId":"5fe4d3bb553d829d700002a7","seq":18171158,"position":0.125,"parentId":"5c8f6a314d10ff3e9c000098","content":"Eligibility traces in tabular setting lead to a significant benefit in training time in additional to the Temporal Difference method. \n\nIn Deep RL it is very common to use **experience replay** to reduce overfitting and bias to recent experiences. However, experience replay makes it very hard to leverage eligibility traces which require a sequence of actions to distribute reward backwards."},{"_id":"5c8f67d74d10ff3e9c00009b","treeId":"5fe4d3bb553d829d700002a7","seq":18171073,"position":0.25,"parentId":"5c8f6a314d10ff3e9c000098","content":"[Discussion about Incompatibility of Eligibility Traces with Experience Replay](https://stats.stackexchange.com/questions/341027/eligibility-traces-vs-experience-replay/341038)"},{"_id":"5c8f68bf4d10ff3e9c00009a","treeId":"5fe4d3bb553d829d700002a7","seq":18171071,"position":0.5,"parentId":"5c8f6a314d10ff3e9c000098","content":"[Efficient Eligibility Traces for Deep Reinforcement Learning - \nBrett Daley, Christopher Amato](https://arxiv.org/abs/1810.09967)"},{"_id":"5c8f69eb4d10ff3e9c000099","treeId":"5fe4d3bb553d829d700002a7","seq":18171072,"position":1,"parentId":"5c8f6a314d10ff3e9c000098","content":"[Investigating Recurrence and Eligibility Traces in Deep Q-Networks -\nJean Harb, Doina Precup](https://arxiv.org/abs/1704.05495)"},{"_id":"5c0cd8c60bd10517f7000071","treeId":"5fe4d3bb553d829d700002a7","seq":18340652,"position":9,"parentId":"5fae8ea6b1ba66c81c00004f","content":"## Value Function Approximation"},{"_id":"5c0cd8180bd10517f7000072","treeId":"5fe4d3bb553d829d700002a7","seq":18301019,"position":1,"parentId":"5c0cd8c60bd10517f7000071","content":"### How to use a shallow, linear approximation for Atari\nhttps://www.amii.ca/the-success-of-dqn-explained-by-shallow-reinforcement-learning/\nThis post explains a paper showing how to achieve the same performance as the Deep RL DQN method for Atari using carefully constructed linear value function approximation."},{"_id":"5bad6d2b44bacb1e5f000074","treeId":"5fe4d3bb553d829d700002a7","seq":18340645,"position":10,"parentId":"5fae8ea6b1ba66c81c00004f","content":"## Direct Policy Search\n- Policy Gradients\n- Actor-Critic\n"},{"_id":"5c2619c2456eb77c410000ec","treeId":"5fe4d3bb553d829d700002a7","seq":18340692,"position":2,"parentId":"5bad6d2b44bacb1e5f000074","content":"## Policy Gradient Algorithms\nSome of the posts used for lecture on July 26.\n\n- A good post with all the fundamental math for policy gradients.\nhttps://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html#a3c\n- Also a good intro post about Policy gradients vs DQN by great ML blogger Andrej Karpathy (this is the one I showed in class with the Pong example):\nhttp://karpathy.github.io/2016/05/31/rl/\n- The Open-AI page on the PPO algorithm used on their simulator domains of humanoid robots:\nhttps://openai.com/blog/openai-baselines-ppo/\n- Good description of Actor-Critic approach using Sonic the Hedgehog game as example:\nhttps://www.freecodecamp.org/news/an-intro-to-advantage-actor-critic-methods-lets-play-sonic-the-hedgehog-86d6240171d/\n- Blog post about how the original Alpha Go solution worked using Policy Gradient RL and Monte-Carlo Tree Search:\nhttps://medium.com/@jonathan_hui/alphago-how-it-works-technically-26ddcc085319"},{"_id":"5c0d1f340bd10517f7000070","treeId":"5fe4d3bb553d829d700002a7","seq":18340691,"position":2.5,"parentId":"5bad6d2b44bacb1e5f000074","content":"## Actor-Critic Algorithm\nVery clear blog post on describing Actor-Critic Algorithms to improve Policy Gradients\nhttps://www.freecodecamp.org/news/an-intro-to-advantage-actor-critic-methods-lets-play-sonic-the-hedgehog-86d6240171d/"},{"_id":"5bad534e44bacb1e5f000075","treeId":"5fe4d3bb553d829d700002a7","seq":18340684,"position":3,"parentId":"5bad6d2b44bacb1e5f000074","content":"## Cutting Edge Algorithms\nGoing beyond what we covered in class, here are some exciting trends and new advances in RL research in the past few years to find out more about."},{"_id":"5bad51e744bacb1e5f000076","treeId":"5fe4d3bb553d829d700002a7","seq":18340687,"position":1,"parentId":"5bad534e44bacb1e5f000075","content":"### Policy Gradient Methods\nAs I said in class, PG methods are a fast changing area of RL research. This post has a number of the successful algorithms in this area as of a year ago:\nhttps://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html#actor-critic"}],"tree":{"_id":"5fe4d3bb553d829d700002a7","name":"Probabilistic Reasoning and Reinforcement Learning Links","publicUrl":"uwece657c"}}