
fix coca training #710

Closed · wants to merge 1 commit
Conversation

@gpucce (Contributor) commented Oct 26, 2023

@rwightman I was too hasty with the embed_cls thing; I think the model would not train as it is now.
This should make it work with the tokenizer and train OK; fixes #715.

@rom1504 (Collaborator) commented Oct 28, 2023 via email

@gpucce (Contributor, Author) commented Oct 28, 2023

I think coca is in the training tests already, but this does not produce an error: the tokens are not shifted by one, so the model learns to copy the current token instead of predicting the next one.
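
For illustration, a minimal PyTorch sketch of the shift being described here; the function and variable names are hypothetical, not open_clip's actual code:

```python
import torch
import torch.nn.functional as F

def caption_loss(logits: torch.Tensor, text: torch.Tensor, pad_id: int = 0) -> torch.Tensor:
    """Next-token loss sketch: logits [B, L, V] from the decoder, text [B, L] token ids."""
    # Shift so that position i is supervised with token i+1:
    # drop the last logit (nothing follows it) and the first label token.
    shifted_logits = logits[:, :-1, :]
    shifted_labels = text[:, 1:]
    return F.cross_entropy(
        shifted_logits.reshape(-1, shifted_logits.size(-1)),
        shifted_labels.reshape(-1),
        ignore_index=pad_id,
    )

# Without the shift (labels == input tokens, positions aligned), the objective becomes
# "output the token you were just given", which the model can satisfy by copying.
```

A bug like this raises no error, so shape-level training tests still pass while the loss optimizes the wrong objective.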

@rwightman (Collaborator) commented:

So, I've been thinking about this one. I really don't like the is_training flag; it's not done this way elsewhere. The label shift is standard, but why do we need to truncate the text encoder output like that only for training?

@gpucce (Contributor, Author) commented Nov 1, 2023

So, I've been thinking about this one. I really don't like the is_training flag; it's not done this way elsewhere. The label shift is standard, but why do we need to truncate the text encoder output like that only for training?

@rwightman

After shortening the labels, one needs to drop the last token from the encoder output, otherwise there is a length mismatch; besides, the last token does not have a next token to use as a label. In generation, however, one wants to keep the last token too, otherwise decoding cannot move forward.

One could also do something like this: self.encode_text(text[:, :-1]) (sketched below).

About is_training, maybe we could also go back to embed_cls, though maybe that wasn't the best option either.
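
As a rough sketch of the two placements of the truncation being weighed in this comment (hypothetical helper names; not the actual open_clip CoCa forward):

```python
import torch

def forward_with_flag(model, text: torch.Tensor, is_training: bool):
    # Option in this PR: truncate inside the forward, only during training,
    # so generation still sees the full sequence.
    text_in = text[:, :-1] if is_training else text
    return model.encode_text(text_in)  # hypothetical encoder call

def forward_plain(model, text: torch.Tensor):
    # The 'normal' approach: the caller truncates before encoding,
    # i.e. self.encode_text(text[:, :-1]); generation code passes the full text.
    return model.encode_text(text[:, :-1])
```

The trade-off discussed in the following comments is whether truncating before encode_text also changes the sequence the contrastive latent is computed from.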

@rwightman (Collaborator) commented:

Yeah, I don't like embed_cls either. Truncating the text input first, outside of the forward, a la self.encode_text(text[:, :-1]), is the 'normal' approach, but I wasn't sure whether that would impact the contrastive latent.

@gpucce (Contributor, Author) commented Nov 1, 2023

The reason for this is that it was meant to keep the behaviour identical to how it was before (assuming I did it right): since, compared to before, the tokenizer now has a hidden text[:, :-1], this way the change would not show up in the contrastive latent but would still be there in the generative logits.

However, it probably makes very little difference and the 'normal' way is better.

@rwightman (Collaborator) commented:

merged through #877 with minor changes

@rwightman closed this on May 9, 2024

Successfully merging this pull request may close these issues.

coca training doesn't work