-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added ElasticSampler and PyTorch Elastic ImageNet example #2297
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
Signed-off-by: Travis Addair <taddair@uber.com>
a032e10
to
672862b
Compare
Signed-off-by: Travis Addair <taddair@uber.com>
Unit Test Results0 files 0 suites 0s ⏱️ results for commit 672862b |
Signed-off-by: Travis Addair <taddair@uber.com>
Unit Test Results0 files 0 suites 0s ⏱️ results for commit 61582f6 |
Signed-off-by: Travis Addair <taddair@uber.com>
Unit Test Results0 files 0 suites 0s ⏱️ results for commit 313d604 |
Signed-off-by: Travis Addair <taddair@uber.com>
Unit Test Results0 files 0 suites 0s ⏱️ results for commit 4d82fda |
# Training settings | ||
parser = argparse.ArgumentParser(description='Elastic PyTorch ImageNet Example', | ||
formatter_class=argparse.ArgumentDefaultsHelpFormatter) | ||
parser.add_argument('--train-dir', default=os.path.expanduser('~/imagenet/train'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general improvement for future: I think it would be nice to have a script to download and prepare imagenet data for the examples. From personal experience of trying to run example on other repos, it is sometimes painful to get the data the way the example expects it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. At least instructions to do so. Finding and downloading ImageNet is a pain.
self.rank = rank() | ||
|
||
# Exclude any samples we have already processed this epoch | ||
self.remaining_indices = [idx for idx in range(len(self.dataset)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: It might be more efficient to do a Set of remaining indices - Set of processed indices instead of iterating over the entire list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case we need to preserve the order of remaining_indices
so that it is deterministic across order, which we could not guarantee using a set.
Generally, this looks good to me. So, ElasticSampler works only with datasets that are entirely on the disk and not with packages like Petastorm where we are streaming the data in. Is that correct? |
Any dataset that can be randomly indexed. Parquet is not particularly well-suited to this approach because of its row group storage format, but it could work in theory. Though in practice, we will need to do something else for Petastorm. |
Fixes #2252.