training_data_fraction is slow. #180

Closed
sleepinyourhat opened this issue Jul 17, 2018 · 6 comments
Labels
jiant-v1-legacy Relevant to versions <= v1.3.2 low-priority Only if you're bored. Ask Sam/Ian/Alex before starting.

Comments

@sleepinyourhat
Contributor

If you're using a small fraction (10% or less), you spend most of your time hashing examples, GPU usage drops, and samples per second drop dramatically. The lazy solution for now is a cumbersome pair of setups: training_data_fraction when training on >1% of the training data, and a custom data file and new task when training on less.
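
To illustrate why small fractions are costly, here is a minimal sketch of hash-based subsampling. It is not jiant's actual code: the function names, the use of MD5, and the choice to hash the raw example text are all assumptions made for illustration. The point is that every example is hashed on every pass, even though only a small share survives the filter.

```python
import hashlib


def keep_example(text: str, fraction: float) -> bool:
    """Hypothetical helper: deterministically keep roughly `fraction` of examples."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64) and compare
    # it against the fraction scaled to the same range.
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < fraction * 2**64


def filtered_stream(examples, fraction):
    """Yield kept examples. Every example is hashed, kept or not."""
    for text in examples:
        if keep_example(text, fraction):
            yield text


if __name__ == "__main__":
    data = [f"example {i}" for i in range(1000)]
    kept = list(filtered_stream(data, 0.01))
    # All 1000 examples were hashed to produce this small subset.
    print(f"hashed {len(data)} examples, kept {len(kept)}")
```

With a fraction of 0.01, roughly 100 examples are hashed for each one that reaches the model, which is consistent with the drop in GPU usage described above.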

@iftenney
Collaborator

Can we pre-generate the hashes during indexing for tasks where we want to do this?
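
A rough sketch of what pre-generating the hashes could look like, assuming a one-time indexing pass that stores an integer bucket next to each example. The names `hash_bucket`, `index_examples`, and `select_fraction` are hypothetical and do not come from jiant.

```python
import hashlib


def hash_bucket(text: str) -> int:
    """Hypothetical helper: map an example to an integer in [0, 2**64)."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def index_examples(examples):
    """One-time indexing pass: pair each example with its precomputed bucket."""
    return [(hash_bucket(text), text) for text in examples]


def select_fraction(indexed, fraction):
    """Cheap filter at training time: one comparison per example, no hashing."""
    threshold = fraction * 2**64
    return [text for bucket, text in indexed if bucket < threshold]


if __name__ == "__main__":
    indexed = index_examples(f"example {i}" for i in range(1000))
    small = select_fraction(indexed, 0.01)
    large = select_fraction(indexed, 0.10)
    # Because the buckets are fixed, the 1% subset is contained in the 10% subset.
    print(len(small), len(large), set(small) <= set(large))
```

Under this scheme the hashing cost is paid once at indexing time, and the selected subset stays the same across runs as long as the example text does not change.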

@sleepinyourhat
Contributor Author

That might be the cleanest solution, yes.

@iftenney
Collaborator

Another solution is sharded training files - @pappagari is working on this for Reddit I believe. For these experiments we could generate, say, 10 files and use only some of them.
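
A minimal sketch of the sharding idea, assuming one example per line and round-robin assignment to shards. The file naming scheme and the function names are hypothetical, not taken from jiant or the Reddit work mentioned above.

```python
import os


def write_shards(examples, out_dir, num_shards=10):
    """Split examples round-robin into `num_shards` text files, one example per line."""
    os.makedirs(out_dir, exist_ok=True)
    paths = [
        os.path.join(out_dir, f"train-{i:02d}-of-{num_shards:02d}.txt")
        for i in range(num_shards)
    ]
    handles = [open(path, "w", encoding="utf-8") for path in paths]
    try:
        for i, text in enumerate(examples):
            handles[i % num_shards].write(text + "\n")
    finally:
        for handle in handles:
            handle.close()
    return paths


def read_shards(paths, num_to_use):
    """Read only the first `num_to_use` shards; the rest are never opened."""
    for path in paths[:num_to_use]:
        with open(path, encoding="utf-8") as f:
            for line in f:
                yield line.rstrip("\n")
```

Reading 1 of 10 shards gives roughly 10% of the data without touching the other files. Round-robin assignment is deterministic for a fixed input order, but the resulting subset depends on that order and on the shard count, so both would need to be recorded to reproduce an experiment.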

@sleepinyourhat
Contributor Author

Relevant to @Jan21

@sleepinyourhat
Contributor Author

Shard-based solution sounds more efficient, but probably messier/less reproducible. May be wrong, though.

@sleepinyourhat sleepinyourhat added the low-priority Only if you're bored. Ask Sam/Ian/Alex before starting. label Jul 27, 2018
@jeswan jeswan added the jiant-v1-legacy Relevant to versions <= v1.3.2 label Sep 17, 2020
@zphang
Collaborator

zphang commented Oct 16, 2020

This is an automatically generated comment.

As we update jiant to v2.x, jiant v1.x has been migrated to https://github.com/nyu-mll/jiant-v1-legacy. As such, we are closing all issues relating to jiant v1.x in this repository.

If this issue is still affecting you in jiant v1.x, please follow up at nyu-mll/jiant-v1-legacy#180.

If this issue is still affecting you in jiant v2.x, reopen this issue or create a new one.

@zphang zphang closed this as completed Oct 16, 2020

4 participants