I thought I knew what “attention” was, so I looked around and found out that everyone is very sure they understand it and that the latest version they are using is the only one. But the idea of looking at the attention, or focus, or “internal thinking”, of a DNN to understand, or improve, its learning has been around for a while.
This is a great article that overviews many of these topics and guided me to much of the work mentioned here:
http://proceedings.mlr.press/v37/xuc15.pdf
It uses an attention calculation on an already-trained CNN to show the user which features the network relies on to make its classification, and it ties those attention scores to descriptive word labels.
Grad-CAM and similar algorithms produce cool images that show us what CNNs are “thinking”.
But these maps are computed after the CNN has been trained, so we don’t really know how they relate to the learning process itself.
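To make the “computed after training” point concrete, here is a minimal NumPy sketch of the core Grad-CAM computation. It assumes you have already extracted the last conv layer’s feature maps and the gradients of the class score with respect to them (here, random toy arrays stand in for both); the function name and array shapes are illustrative, not from any particular library.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from conv feature maps and class-score gradients.

    feature_maps: (K, H, W) activations from the last conv layer
    gradients:    (K, H, W) d(class score)/d(feature_maps)
    """
    # Channel weights: global-average-pool the gradients over each map.
    weights = gradients.mean(axis=(1, 2))              # shape (K,)
    # Weighted sum of the feature maps, then ReLU to keep positive evidence.
    cam = np.tensordot(weights, feature_maps, axes=1)  # shape (H, W)
    cam = np.maximum(cam, 0.0)
    # Normalize to [0, 1] for display (guard against an all-zero map).
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example: 4 feature maps of size 8x8 with random activations/gradients.
rng = np.random.default_rng(0)
maps = rng.random((4, 8, 8))
grads = rng.standard_normal((4, 8, 8))
heatmap = grad_cam(maps, grads)
print(heatmap.shape)  # (8, 8)
```

Note that nothing here touches the network’s weights or training loop: the heatmap is a post-hoc readout of a frozen model, which is exactly why it can’t tell us much about how the attention pattern emerged during learning.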
Also, some people have pointed out that even these methods can highlight things that aren’t really there.