training_data_fraction is slow. #180
Comments
Can we pre-generate the hashes during indexing for tasks where we want to do this?
That might be the cleanest solution, yes.
Another solution is sharded training files - @pappagari is working on this for Reddit I believe. For these experiments we could generate, say, 10 files and use only some of them.
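A minimal sketch of the sharding idea: assign each example to one of N shards via a stable hash of its id, so that training on, say, 30% of the data means reading only 3 of 10 shard files. The function names (`shard_of`, `write_shards`) are hypothetical, not from the actual codebase, and a real implementation would stream each bucket to its own file rather than hold them in memory.

```python
import hashlib

def shard_of(example_id: str, num_shards: int = 10) -> int:
    # Deterministically assign an example to one of `num_shards` shards
    # based on a stable hash of its id (md5 is stable across runs,
    # unlike Python's built-in hash()).
    digest = hashlib.md5(example_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def write_shards(examples, num_shards: int = 10):
    # Group (id, example) pairs into shard buckets; in practice each
    # bucket would be written to its own file (e.g. train.00-of-10).
    shards = {i: [] for i in range(num_shards)}
    for ex_id, ex in examples:
        shards[shard_of(ex_id, num_shards)].append(ex)
    return shards
```

Because the shard assignment is deterministic, the same examples land in the same files on every run, which partly addresses the reproducibility concern (the remaining risk is that a fixed fraction always sees the same fixed subset, never a fresh random one).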
Relevant to @Jan21
The shard-based solution sounds more efficient, but is probably messier/less reproducible. May be wrong, though.
If you're using a small fraction (10% or less), you spend most of your time hashing examples, GPU usage drops, and samples per second drops dramatically. The lazy workaround for now is to maintain two cumbersome setups: training_data_fraction when training on more than 1% of the training data, and a custom data file plus a new task when training on less.
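To make the cost concrete, here is a generic sketch of hash-based fraction selection (not the project's actual implementation, and `keep_example` / `precompute_mask` are hypothetical names): each example is hashed and kept iff its hash falls in the first `fraction` of the hash space. Doing this once at indexing time and caching the boolean mask, as suggested above, avoids re-hashing every example on each run.

```python
import hashlib

def keep_example(text: str, fraction: float) -> bool:
    # Hash the example text and keep it iff the hash lands in the
    # first `fraction` of the hash space; deterministic across runs.
    h = int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)
    return (h % 10_000) < int(fraction * 10_000)

def precompute_mask(texts, fraction: float):
    # The expensive part: one hash per example. Computing this once
    # during indexing and storing the mask makes later runs cheap.
    return [keep_example(t, fraction) for t in texts]
```

The hashing cost is the same regardless of `fraction`, which is why small fractions are the worst case: the per-example hashing dominates while only a handful of examples actually reach the GPU.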