[WIP v1 - deprecated] Add learning of target features #1710

Open · wants to merge 16 commits into base: master
Conversation

@francoishernandez (Member) commented Jan 27, 2020

This feature has been asked about quite a lot, and might be useful in several tasks, so here it is!

Done

  • create a Generator class to handle multiple generators (see the sketch after this list)
  • create an OnmtBatch class inheriting from torchtext.data.Batch to allow feature shifting
  • create an additional criterion for loss computation (we may add others later for other feature types)
  • adapt greedy decoding
  • adapt the building of translation predictions
  • adapt beam search
  • make the time shift optional
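
For reference, a minimal sketch of the multi-generator idea, assuming one log-softmax projection per target stream (class and attribute names here are illustrative, not necessarily those in this branch):

```python
import torch.nn as nn

class Generator(nn.Module):
    """One output projection per target stream: the word vocabulary
    plus one projection per feature vocabulary (illustrative sketch)."""

    def __init__(self, dec_dim, word_vocab_size, feat_vocab_sizes):
        super().__init__()
        self.word_generator = nn.Sequential(
            nn.Linear(dec_dim, word_vocab_size),
            nn.LogSoftmax(dim=-1),
        )
        self.feat_generators = nn.ModuleList(
            nn.Sequential(nn.Linear(dec_dim, n), nn.LogSoftmax(dim=-1))
            for n in feat_vocab_sizes
        )

    def forward(self, dec_out):
        # Every stream is predicted from the same decoder output.
        word_scores = self.word_generator(dec_out)
        feat_scores = [gen(dec_out) for gen in self.feat_generators]
        return word_scores, feat_scores
```

Training can then use one NLL criterion per stream and sum the losses; decoding applies each generator to the shared decoder state at every step.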

TODO

@guillaumekln (Contributor)

Great! Do I read correctly that you implemented the target shifting as in OpenNMT-lua? I no longer think this was a good idea and I don't think it was proven to be any better than just keeping the features aligned. What do you think?

@francoishernandez (Member, Author)

> Do I read correctly that you implemented the target shifting as in OpenNMT-lua?

True

> I no longer think this was a good idea and I don't think it was proven to be any better than just keeping the features aligned. What do you think?

This implementation seemed OK and fit for the task, at least at first glance. I'm only beginning to run some 'realistic' tests, so I don't have enough feedback yet. Do you have specific results in mind?

I agree that for greedy search the shift should not be necessary. But what would you propose for beam search, where the feature depends heavily on the predicted token?
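
For reference, the OpenNMT-lua-style shift delays each feature stream by one step, so a token's feature is only predicted once the token itself has been fed back into the decoder. A minimal sketch of the operation on a time-major target tensor (function and tensor names are illustrative, not the actual OnmtBatch code):

```python
import torch

def shift_features(feats, pad_idx):
    """Delay a feature stream by one decoding step.

    feats: LongTensor of shape [tgt_len, batch] (time-major).
    After the shift, shifted[t] is the feature of the word emitted
    at step t - 1, so the word is already known when its feature is
    predicted. Illustrative sketch only.
    """
    shifted = torch.full_like(feats, pad_idx)
    shifted[1:] = feats[:-1]
    return shifted
```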

@guillaumekln (Contributor) commented Jan 27, 2020

The concern is that, yes, target features do depend on the predicted token, but in many cases they also depend on the attended source context. For the latter, it seems better to keep the features aligned, so that the attention values are valid for the predictions of both the word and its features. How important the context is versus the predicted token may depend on the type of feature.
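
To make the two layouts concrete, here is an illustrative example with a made-up casing feature (values are for illustration only):

```
Aligned: word and feature are emitted from the same decoder state,
so a single attention context serves both predictions.
  words: They   are   here   </s>
  feats: CAP    low   low    low

Shifted (OpenNMT-lua style): the feature trails its word by one step,
so the predicted word is already known when its feature is decided.
  words: They   are   here   </s>
  feats: <bos>  CAP   low    low
```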

I don't have specific results in mind; this is just something that came to mind recently and I'm raising it here. Of course, if you have the time and resources, it could be interesting to compare shift vs. no shift.

@francoishernandez (Member, Author)

That's a valid concern, though not an easy one to wrap one's head around. The best approach would indeed be to experiment.

Also, I agree it might depend on the type of features, so I'll add a flag to make the shift optional and facilitate such tests (a sketch of what that could look like is below).
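
Something along these lines, assuming an argparse-style option (the flag name is hypothetical; the actual option in the branch may differ):

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical flag name, for illustration only.
parser.add_argument(
    "--shift_target_features", action="store_true",
    help="Shift target feature streams one step (OpenNMT-lua style) "
         "instead of keeping them aligned with the words.",
)
```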

@eduamf commented Apr 20, 2020

Could someone with write access check the solution?

@francoishernandez (Member, Author) commented Apr 20, 2020

Hey @eduamf, thanks for your interest in this topic.
This PR was deliberately put on standby. It basically works, but some strange behaviors make training far too slow.
Feel free to check out my branch to test, and to PR on it if you see some useful additions.

@vince62s vince62s changed the title [WIP] Add learning of target features [WIP v1 - deprecated] Add learning of target features Jan 19, 2023