  • What is Attention Anyways?

    I thought I knew what “attention” was, so I looked around and found that everyone is quite sure they understand it, and that the particular version they happen to use is the only one there is. But the idea of examining the attention, focus, or “internal thinking” of a DNN to understand, or improve, its learning has been around for a while.

  • Reading that Led me to These Articles

    This is a great overview article covering many of these topics; it guided me to many of the articles below:

    - Learn to Pay Attention! Trainable Visual Attention in CNNs - Rachel Draelos - https://towardsdatascience.com/learn-to-pay-attention-trainable-visual-attention-in-cnns-87e2869f89f1 (Aug 10, 2019)

  • New “Age” of Attention with Transformers

    - Attention is All you Need - Vaswani, NeurIPS 2017
      - arxiv paper - https://arxiv.org/pdf/1706.03762.pdf
      - discuss on hypothesis - https://hyp.is/go?url=https%3A%2F%2Farxiv.org%2Fpdf%2F1706.03762.pdf&group=xrxZM1b3
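
    To make the core mechanism of that paper concrete, here is a minimal sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, in plain NumPy. The function and variable names are mine, not the paper's, and it omits the multi-head projections and masking.

        import numpy as np

        def scaled_dot_product_attention(Q, K, V):
            # Q, K, V: (seq_len, d_k) query, key, and value matrices.
            d_k = Q.shape[-1]
            # How well each query matches each key, scaled to keep the softmax stable.
            scores = Q @ K.T / np.sqrt(d_k)                     # (seq_len, seq_len)
            # Normalise each row into attention weights that sum to 1.
            weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
            weights /= weights.sum(axis=-1, keepdims=True)
            # Each output position is a weighted mix of the value vectors.
            return weights @ V, weights

        # Toy self-attention over 4 tokens with 8-dimensional embeddings.
        x = np.random.randn(4, 8)
        out, attn = scaled_dot_product_attention(x, x, x)
        print(out.shape, attn.shape)                            # (4, 8) (4, 4)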

  • Learning with Attention for Image Classification and Labelling

    - Data Efficient and Weakly Supervised Computational Pathology on Whole Slide Images - Lu, Nature Biomedical Engineering, 2021
      - arxiv paper - https://arxiv.org/pdf/2004.09666.pdf
      - discuss on hypothesis - https://hyp.is/go?url=https%3A%2F%2Farxiv.org%2Fpdf%2F2004.09666.pdf&group=xrxZM1b3
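
    That paper aggregates patch-level features from a whole-slide image with an attention-weighted average (attention-based multiple-instance learning). Below is a minimal sketch of that style of gated attention pooling, assuming PyTorch; the layer sizes and names are illustrative and this is not the paper's full architecture.

        import torch
        import torch.nn as nn

        class GatedAttentionPooling(nn.Module):
            # Score each instance (patch embedding), softmax the scores,
            # and return the attention-weighted bag (slide-level) embedding.
            def __init__(self, dim=512, hidden=256):
                super().__init__()
                self.V = nn.Linear(dim, hidden)   # tanh branch
                self.U = nn.Linear(dim, hidden)   # sigmoid gate branch
                self.w = nn.Linear(hidden, 1)     # scalar score per instance

            def forward(self, H):                 # H: (num_patches, dim)
                scores = self.w(torch.tanh(self.V(H)) * torch.sigmoid(self.U(H)))
                attn = torch.softmax(scores, dim=0)          # weights over patches
                bag = (attn * H).sum(dim=0)                  # (dim,) slide embedding
                return bag, attn

        # Toy usage: 1000 patch embeddings of size 512 from one slide.
        patches = torch.randn(1000, 512)
        pool = GatedAttentionPooling()
        bag, attn = pool(patches)
        logits = nn.Linear(512, 2)(bag)           # slide-level classifier head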

  • The Original Use of Attention in CNNs

  • Torr, ICLR, 2018 - Learn to Pay Attention

    https://arxiv.org/pdf/1804.02391.pdf
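
    The core idea there is a trainable attention module between an intermediate CNN feature map and a global feature vector: a compatibility score at each spatial location is softmaxed into an attention map, which is used to pool the local features that feed the classifier. A rough sketch of that idea follows; the dot-product compatibility and the dimensions are my own simplification, not the paper's exact configuration.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class AttentionGate(nn.Module):
            # Compatibility between local features and a global descriptor g,
            # turned into a spatial attention map and an attention-pooled vector.
            def __init__(self, local_ch, global_dim):
                super().__init__()
                self.project_g = nn.Linear(global_dim, local_ch)  # map g into the local feature space

            def forward(self, local_feats, g):
                # local_feats: (B, C, H, W) intermediate feature map; g: (B, global_dim)
                B, C, H, W = local_feats.shape
                g_proj = self.project_g(g).view(B, C, 1, 1)
                scores = (local_feats * g_proj).sum(dim=1).view(B, H * W)  # dot-product compatibility
                attn = F.softmax(scores, dim=1).view(B, 1, H, W)           # spatial attention map
                pooled = (attn * local_feats).flatten(2).sum(dim=2)        # (B, C) attention-weighted features
                return pooled, attn

        # Toy usage: pool a 14x14, 256-channel feature map under a 512-d global descriptor.
        gate = AttentionGate(local_ch=256, global_dim=512)
        pooled, attn = gate(torch.randn(2, 256, 14, 14), torch.randn(2, 512))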

  • Bengio, ICML, 2015 - Show, Attend and Tell

    http://proceedings.mlr.press/v37/xuc15.pdf

    It uses an attention calculation on top of an already trained CNN to show the user which features the network relies on when making its classification, and it ties these attention scores to descriptive word labels.
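
    A minimal sketch of the soft-attention step behind that: at each decoding step the decoder state scores every spatial CNN feature vector, the scores are softmaxed, and the weighted sum becomes the context used to predict the next word. Assumes PyTorch; names and layer sizes are illustrative only.

        import torch
        import torch.nn as nn

        class SoftAttention(nn.Module):
            # Score each CNN annotation vector against the decoder state h,
            # softmax the scores, and return the weighted context vector.
            def __init__(self, feat_dim=512, hidden_dim=512, attn_dim=256):
                super().__init__()
                self.feat_proj = nn.Linear(feat_dim, attn_dim)
                self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
                self.score = nn.Linear(attn_dim, 1)

            def forward(self, feats, h):
                # feats: (B, L, feat_dim) features at L spatial locations; h: (B, hidden_dim)
                e = self.score(torch.tanh(self.feat_proj(feats) + self.hidden_proj(h).unsqueeze(1)))
                alpha = torch.softmax(e, dim=1)            # where to look for this word
                context = (alpha * feats).sum(dim=1)       # (B, feat_dim) context vector
                return context, alpha.squeeze(-1)          # alpha reshapes into a heat map

        # Toy usage: 14x14 = 196 conv feature vectors, one decoding step.
        attend = SoftAttention()
        context, alpha = attend(torch.randn(2, 196, 512), torch.randn(2, 512))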

  • Using Attention to make CNNs Interpretable was all the rage for a while

    Grad-CAM and other algorithms create cool images that show us what CNNs are “thinking”.
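
    As a rough sketch of how Grad-CAM produces these images (assuming PyTorch and torchvision; the specific model and target layer are placeholders): take the gradient of the target class score with respect to the last convolutional feature maps, average it spatially to get one weight per channel, and form a ReLU'd weighted sum of the maps.

        import torch
        import torch.nn.functional as F
        from torchvision import models

        # Placeholder setup: a pretrained ResNet with its final conv block as the target layer.
        model = models.resnet18(weights="IMAGENET1K_V1").eval()
        feats = {}
        model.layer4.register_forward_hook(lambda m, i, o: feats.update(out=o))

        def grad_cam(image, class_idx):
            # image: (1, 3, H, W) normalised input tensor.
            logits = model(image)
            score = logits[0, class_idx]
            grads = torch.autograd.grad(score, feats["out"])[0]      # d(score)/d(feature maps)
            weights = grads.mean(dim=(2, 3), keepdim=True)           # one weight per channel
            cam = F.relu((weights * feats["out"]).sum(dim=1))        # weighted sum of maps, then ReLU
            cam = F.interpolate(cam.unsqueeze(1), size=image.shape[-2:], mode="bilinear")
            return cam / (cam.max() + 1e-8)                          # normalised heat map

        heatmap = grad_cam(torch.randn(1, 3, 224, 224), class_idx=243)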

    But these maps are computed after the CNN has been trained, so we don’t really know how they relate to the learning process.

  • Also, some people have pointed out that even these methods can highlight things that aren’t really there:

    - Use HiResCAM instead of Grad-CAM for faithful explanations of convolutional neural networks - Rachel Lea Draelos, Lawrence Carin - https://arxiv.org/abs/2011.08891
    - CNN Heat Maps: Class Activation Mapping (CAM) - Rachel Draelos - https://glassboxmedicine.com/2019/06/11/cnn-heat-maps-class-activation-mapping-cam/ (June 11, 2019)


{"cards":[{"_id":"29c32381db00c19a6a0001b6","treeId":"29c3cd1adb00c19a6a0001a7","seq":22964382,"position":0.125,"parentId":null,"content":"# What is Attention Anyways?\nI thought I knew what \"attention\" was, so I looked around and found out that everyone is very sure they understand it and that the latest version they are using is the only one. But the idea of looking at the attention, or focus, or \"internal thinking\", of a DNN to understand, or improve, its learning has been around for a while."},{"_id":"29be1689db00c19a6a0001d5","treeId":"29c3cd1adb00c19a6a0001a7","seq":22965178,"position":1,"parentId":"29c32381db00c19a6a0001b6","content":"## Sidebar: Links to these Notes\n- original full gingko tree: https://gingkoapp.com/lecture-what-is-attention-anyways\n- html version: https://gingkoapp.com/lecture-what-is-attention-anyways.html\n- impress presentation (in browser) version - https://gingkoapp.com/lecture-what-is-attention-anyways.impress#/step-1\n- plain text markdown version - https://gingkoapp.com/lecture-what-is-attention-anyways.txt"},{"_id":"29c33df5db00c19a6a0001b0","treeId":"29c3cd1adb00c19a6a0001a7","seq":22964378,"position":0.25,"parentId":null,"content":"## Reading that Led me to These Articles"},{"_id":"29c33dd5db00c19a6a0001b1","treeId":"29c3cd1adb00c19a6a0001a7","seq":22965179,"position":1,"parentId":"29c33df5db00c19a6a0001b0","content":"This is a great article overviewing many of these topics that guided me to many of these articles:\n- **Learn to Pay Attention! Trainable Visual Attention in CNNs** - Rachel Draelos - \nhttps://towardsdatascience.com/learn-to-pay-attention-trainable-visual-attention-in-cnns-87e2869f89f1 (Aug 10, 2019)\n"},{"_id":"29c33f36db00c19a6a0001af","treeId":"29c3cd1adb00c19a6a0001a7","seq":22964364,"position":0.5,"parentId":null,"content":"## New \"Age\" of Attention with Transformers"},{"_id":"29c33077db00c19a6a0001b2","treeId":"29c3cd1adb00c19a6a0001a7","seq":22964369,"position":1,"parentId":"29c33f36db00c19a6a0001af","content":"- **Attention is All you Need** - Vaswani, NeurIPS 2017 \n - arxiv paper - https://arxiv.org/pdf/1706.03762.pdf\n - discuss on hypothesis - https://hyp.is/go?url=https%3A%2F%2Farxiv.org%2Fpdf%2F1706.03762.pdf&group=xrxZM1b3"},{"_id":"29c3301edb00c19a6a0001b3","treeId":"29c3cd1adb00c19a6a0001a7","seq":22964374,"position":0.75,"parentId":null,"content":"## Learning with Attention for Images Classification and Labelling"},{"_id":"29c323b7db00c19a6a0001b5","treeId":"29c3cd1adb00c19a6a0001a7","seq":22964376,"position":1,"parentId":"29c3301edb00c19a6a0001b3","content":"- **Data Efficient and Weakly Supervised Computational Pathology\non Whole Slide Images** - Lu, Nature Biomedical Engineering, 2021 - \n - arxiv paper - https://arxiv.org/pdf/2004.09666.pdf\n - discuss on hypothesis - https://hyp.is/go?url=https%3A%2F%2Farxiv.org%2Fpdf%2F2004.09666.pdf&group=xrxZM1b3"},{"_id":"29c3bd32db00c19a6a0001aa","treeId":"29c3cd1adb00c19a6a0001a7","seq":22965048,"position":1,"parentId":null,"content":"## The Original Use of Attention in CNNs\n"},{"_id":"29c35b16db00c19a6a0001ac","treeId":"29c3cd1adb00c19a6a0001a7","seq":22964349,"position":0.5,"parentId":"29c3bd32db00c19a6a0001aa","content":"### Torr, ICLR, 2018 - Learn to Pay Attention\nhttps://arxiv.org/pdf/1804.02391.pdf"},{"_id":"29c35f6fdb00c19a6a0001ab","treeId":"29c3cd1adb00c19a6a0001a7","seq":22964353,"position":1,"parentId":"29c3bd32db00c19a6a0001aa","content":"### Bengio, ICML, 2015 - Show, Attend and Tell\nhttp://proceedings.mlr.press/v37/xuc15.pdf\n\nUsing an attention 
calculation on an already trained CNN to indicate to the user what features the CNN is using to make its classification. It also ties these attention scores to descriptive word labels."},{"_id":"29c34abedb00c19a6a0001ad","treeId":"29c3cd1adb00c19a6a0001a7","seq":22965052,"position":2,"parentId":"29c3bd32db00c19a6a0001aa","content":"### Using Attention to make CNNs Interpretable was all the craze for a while\nGrad-CAM and other algorithms created cool images shows us what CNNs are \"thinking\"."},{"_id":"29be2f34db00c19a6a0001b8","treeId":"29c3cd1adb00c19a6a0001a7","seq":22965053,"position":3,"parentId":"29c3bd32db00c19a6a0001aa","content":"But they are computed after the CNN is learned, so we don't really know how it relates to the learning process."},{"_id":"29be2f19db00c19a6a0001b9","treeId":"29c3cd1adb00c19a6a0001a7","seq":22965058,"position":4,"parentId":"29c3bd32db00c19a6a0001aa","content":"Also, some People have pointed out that even these methods can highlight things that aren't really there:\n- **Use HiResCAM instead of Grad-CAM for faithful explanations of convolutional neural networks** - Rachel Lea Draelos, Lawrence Carin - https://arxiv.org/abs/2011.08891\n- **CNN Heat Maps: Class Activation Mapping (CAM)** - Rachel Draelos - https://glassboxmedicine.com/2019/06/11/cnn-heat-maps-class-activation-mapping-cam/ (June 11, 2019)\n"},{"_id":"6239c91a410c0a03aab33f44","treeId":"29c3cd1adb00c19a6a0001a7","seq":22965111,"position":2,"parentId":null,"content":"\n###### Style Sheet (leave this alone)\n##### Hidden Gingko Style Sheet Code\n- *don't mess with it*\n- Style Sheet - see saved main version : https://gingkoapp.com/app#3bf9513db6a011c9e8000239\n\n<style>\nh1 {\n font-size: 3em;\n color: #d55;\n text-align: center;\n border-bottom: 5px solid #eee;\n}\n\nh2 {\n font-size: 2em;\n color: #a55;\n border-bottom: 2px solid #999;\n\n}\n\nh3 {\n font-size: 1.75em;\n color: #722;\n border-bottom: 1px dashed #999;\n\n}\n\nh4 {\n font-size: 1.25em;\n border-bottom: 1px dashed #aaa;\n}\nh5 {\n font-size: 1.25em;\n border-bottom: 1px dashed #aaa;\n}\nh5, h5 + * {\n display: none;\n}\n\nh6, h6 + * {\n font-size: 1em;\n color: #77b;\n}\n\n.fullscreen-overlay .fullscreen-container {\n max-width: 2500px;\n \n height:100%;\n margin:1 auto;\n padding:40px 0;\n}\n\n#migration-notice {\n display: none;\nvisible: false;\n}\n</style>\n\n"},{"_id":"6239c91a410c0a03aab33f45","treeId":"29c3cd1adb00c19a6a0001a7","seq":22965102,"position":1,"parentId":"6239c91a410c0a03aab33f44","content":"# H1 Are Centred"},{"_id":"6239c91a410c0a03aab33f46","treeId":"29c3cd1adb00c19a6a0001a7","seq":22965103,"position":2,"parentId":"6239c91a410c0a03aab33f44","content":"## H2 Have A Strong Line\nAnd the text that follows is **normal**, whatever *that* means.\n\nParagraphs are no different."},{"_id":"6239c91a410c0a03aab33f47","treeId":"29c3cd1adb00c19a6a0001a7","seq":22965104,"position":3,"parentId":"6239c91a410c0a03aab33f44","content":"### H3 Has A Dashed Line\nThe text under it is normal"},{"_id":"6239c91a410c0a03aab33f48","treeId":"29c3cd1adb00c19a6a0001a7","seq":22965105,"position":4,"parentId":"6239c91a410c0a03aab33f44","content":"### H3 Has A Dashed Line ... even if there are multiple ones\nThe text under it is normal"},{"_id":"6239c91a410c0a03aab33f49","treeId":"29c3cd1adb00c19a6a0001a7","seq":22965106,"position":5,"parentId":"6239c91a410c0a03aab33f44","content":"#### H4 Does Something Different\nI don't remember what it is. 
*It's just smaller.*"},{"_id":"6239c91a410c0a03aab33f4a","treeId":"29c3cd1adb00c19a6a0001a7","seq":22965107,"position":6,"parentId":"6239c91a410c0a03aab33f44","content":"##### H5 Is *Invisible*\nThat's like magic. It even applies to the text that follows.\n\n*(psst! H5 is invisible ^^^)* But the question is...does it apply to text the next paragraph down? `The answer is no`\n\n#### Or the next section down?\nWho knows? `I know, it doesn't`\n"},{"_id":"6239c91a410c0a03aab33f4b","treeId":"29c3cd1adb00c19a6a0001a7","seq":22965108,"position":7,"parentId":"6239c91a410c0a03aab33f44","content":"###### H6 Even Exists\nWhat does it do?\n\nNo one knows."},{"_id":"6239c91a410c0a03aab33f4c","treeId":"29c3cd1adb00c19a6a0001a7","seq":22965109,"position":8,"parentId":"6239c91a410c0a03aab33f44","content":"####### H7 Does not Exist\n\nSo it says.\n\nor does it?"},{"_id":"6239c91a410c0a03aab33f4d","treeId":"29c3cd1adb00c19a6a0001a7","seq":22965110,"position":9,"parentId":"6239c91a410c0a03aab33f44","content":"######## H8+ Does not Exist\n\nSo it says.\n\nor does it?"}],"tree":{"_id":"29c3cd1adb00c19a6a0001a7","name":"Attention Lecture (Draft)","publicUrl":"lecture-what-is-attention-anyways"}}