How were the visualizations in Figures 4 and 5 made? Taking Figure 4 as an example, how is the missing-modality bank, with shape [$2^{|M|}-1$, num_modality, num_patch, hidden_dim], used to visualize cosine similarity?
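To make the question concrete, here is a minimal sketch of one *possible* reading. It assumes the figure shows, for each modality, the cosine similarity between the $2^{|M|}-1$ bank entries after averaging over the patch axis. That averaging step and the function name `bank_cosine_similarity` are hypothetical illustrations, not the authors' confirmed procedure.

```python
import numpy as np


def bank_cosine_similarity(bank: np.ndarray) -> np.ndarray:
    """Per-modality cosine-similarity matrices between bank entries.

    bank: array of shape [K, num_modality, num_patch, hidden_dim], where
    K = 2**|M| - 1 (layout taken from the question above).
    Returns an array of shape [num_modality, K, K].
    """
    # Assumption (hypothetical): average over patches to get one vector
    # per (bank entry, modality) pair.
    pooled = bank.mean(axis=2)                    # [K, num_modality, hidden_dim]
    # Put modality first so each modality gets its own K x K matrix.
    pooled = np.transpose(pooled, (1, 0, 2))      # [num_modality, K, hidden_dim]
    # L2-normalise each vector, guarding against division by zero.
    norms = np.linalg.norm(pooled, axis=-1, keepdims=True)
    unit = pooled / np.clip(norms, 1e-12, None)
    # Batched dot products: sim[m, i, j] = cos(entry_i, entry_j) for modality m.
    return unit @ np.transpose(unit, (0, 2, 1))


# Usage with random data: |M| = 3 modalities gives K = 7 bank entries.
rng = np.random.default_rng(0)
num_mod = 3
bank = rng.standard_normal((2 ** num_mod - 1, num_mod, 16, 32))
sim = bank_cosine_similarity(bank)
print(sim.shape)  # (3, 7, 7)
```

Each `sim[m]` slice is a symmetric K x K matrix that could then be drawn as a heatmap (for example with Matplotlib's `imshow`). Whether the paper pools over patches, flattens them, or compares individual patch tokens is exactly what the question asks.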
Is the only difference between G-Router and S-Router whether the proposed loss $L_{ce}$ is used? That is, does G-Router use only $L_{balance}$, while S-Router uses both $L_{ce}$ and $L_{balance}$? Are their structures identical, with only the loss functions differing across training stages?
Great work!
Looking forward to your answer!