Hi, thank you for sharing your work. I was going through issue #3. Can you explain why taking the mean over the channel dimension (e.g., the 768-dim axis) of the transformer output gives an attention heat map? Am I missing something here?
When I compute q @ q.T instead of q.mean(), I get a somewhat different heat map.
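For reference, here is a minimal sketch of the two variants I am comparing. It assumes a ViT-style output of shape (1, 1 + 14*14, 768) with a [CLS] token at index 0; the tensor names, shapes, and the choice of the [CLS] row for the similarity variant are my own assumptions, not your code:

```python
import torch

# Placeholder for the transformer output: (batch, 1 + 14*14 tokens, 768 channels)
tokens = torch.randn(1, 197, 768)
patch_tokens = tokens[:, 1:, :]  # drop the [CLS] token, keep the 14x14 patch grid

# Variant 1: mean over the channel dimension -> one scalar per patch token
heatmap_mean = patch_tokens.mean(dim=-1).reshape(1, 14, 14)

# Variant 2: token-to-token similarity (q @ q^T), then read off the [CLS] row
sim = tokens @ tokens.transpose(-2, -1)          # (1, 197, 197) similarity matrix
heatmap_sim = sim[:, 0, 1:].reshape(1, 14, 14)   # [CLS]-to-patch similarities as a map
```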
Also, can you share the code you used for the visualization?
Thanks