[D] ViT16 – Should I use all layers or only the final MHA layer to generate an attention heatmap?
Hello, I’m currently extracting attention heatmaps from pretrained ViT16 models (which I then fine-tune) to see which regions of the image the model used to make its prediction. Many research papers and other sources suggest that I should extract attention scores only from the final layer, but in my experiments so far, averaging the MHA scores across all layers actually gave a “better” heatmap than using just the final layer (image attached). Additionally, I am a bit confused as to why there is consistent attention on the image padding (black border). The two methods give very different results, and I’m not sure whether I should trust the attention heatmap.

submitted by /u/PositiveInformal9512
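For anyone wanting to reproduce the comparison, here is a minimal NumPy sketch of the two heatmaps being discussed: CLS-to-patch attention from the final layer only, versus the same quantity averaged over all layers. It is not the poster's actual code. It assumes the attention weights are already stacked into an array of shape `(layers, heads, tokens, tokens)` with the CLS token at index 0, and it uses random softmax-normalized values on a small 4×4 patch grid in place of a real model. The function names are hypothetical.

```python
import numpy as np


def cls_heatmap(layer_attn, grid):
    """Heatmap from one layer's attention, shape (heads, tokens, tokens).

    Assumes token 0 is the CLS token and the remaining tokens are the
    grid * grid image patches in row-major order.
    """
    # Attention paid by the CLS query to each patch, averaged over heads.
    cls_to_patches = layer_attn[:, 0, 1:].mean(axis=0)
    heat = cls_to_patches.reshape(grid, grid)
    # Normalize so the map sums to 1 and maps are comparable.
    return heat / heat.sum()


def final_layer_heatmap(attns, grid):
    """Use only the last layer of attns, shape (layers, heads, tokens, tokens)."""
    return cls_heatmap(attns[-1], grid)


def layer_averaged_heatmap(attns, grid):
    """Average the attention over all layers before building the heatmap."""
    return cls_heatmap(attns.mean(axis=0), grid)


# Synthetic stand-in for real attention weights (assumption: 4 layers,
# 3 heads, 4x4 patch grid plus one CLS token).
rng = np.random.default_rng(0)
layers, heads, grid = 4, 3, 4
tokens = grid * grid + 1
logits = rng.normal(size=(layers, heads, tokens, tokens))
attns = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

final_map = final_layer_heatmap(attns, grid)
mean_map = layer_averaged_heatmap(attns, grid)
```

With a real model, `attns` would be replaced by the per-layer attention weights pulled out of the network, and each map would be upsampled to the input resolution before being overlaid on the image.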