
Is this code really based on what the paper says? #36

Open
basbaba opened this issue Jul 3, 2023 · 0 comments

Comments


basbaba commented Jul 3, 2023

We are studying TIIM and have found several problems. The two most critical points of confusion are:

  1. The code uses model/transformer/Transformer rather than model/transformer/TransformerMonotonic, even though the monotonic (MoChA) attention is presented as the main idea in the paper.
  2. In TransformerMonotonic, the image features are organized as HxNWxC, which means the features are scanned row by row rather than column by column, yet the paper emphasizes columns and explains why vertical features help the translation (see the sketch after this list for what the two scan orders look like).
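To make the distinction concrete, here is a minimal sketch of how the flattening order decides whether a transformer sees the image row by row or column by column. This is not code from the TIIM repo; the tensor names and shapes are made up for illustration, assuming PyTorch-style (N, C, H, W) features:

```python
import torch

# Toy feature map: batch N=1, channels C=2, height H=3, width W=4.
N, C, H, W = 1, 2, 3, 4
feats = torch.arange(N * C * H * W, dtype=torch.float32).view(N, C, H, W)

# Row-by-row scan: flatten so that consecutive tokens walk along each
# image row. Shape becomes (N, H*W, C).
row_major = feats.permute(0, 2, 3, 1).reshape(N, H * W, C)

# Column-by-column scan: swap H and W before flattening, so consecutive
# tokens walk down each image column. Shape becomes (N, W*H, C).
col_major = feats.permute(0, 3, 2, 1).reshape(N, W * H, C)

# First W tokens of row_major come from a single image row; first H
# tokens of col_major come from a single image column (the "vertical
# features" the paper emphasizes).
print(row_major[0, :W, 0])  # tensor([0., 1., 2., 3.])
print(col_major[0, :H, 0])  # tensor([0., 4., 8.])
```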

Did I misunderstand the paper and the code? Please correct me if I'm wrong.

Thanks!
