Fix for low accuracy in Tensorflow training #167
Conversation
indexed_file_loader.h performs a sharded read of the index files supplied alongside the TFRecords used for training; ReadSample() is therefore called from multiple threads, each with a different shard_id_. When all indices have been read, current_index_ is reset to start again at the beginning of the shard. This commit fixes a defect in detecting that a shard has reached its end: the code compared current_index_ against the size of all the indices. The fix compares against the end of the shard instead. Before the fix, the code worked correctly with 1 GPU, where there is only one shard, but with more than one GPU it caused a loss in accuracy, most likely due to redundant reads. Signed-off-by: Pablo Ribalta Lorenzo <pribalta@nvidia.com>
@@ -94,6 +94,9 @@ class IndexedFileLoader : public Loader<CPUBackend> {
     ReadIndexFile(index_uris);
     size_t num_indices = indices_.size();
     current_index_ = num_indices/num_shards_ * shard_id_;
+    max_index_ = (shard_id_ != num_shards_ - 1)
+        ? num_indices/num_shards_ * (shard_id_ + 1)
+        : num_indices;
How about just:
max_index_ = num_indices * (shard_id_ + 1) / num_shards_
accepted
Force-pushed from 6875724 to 0e27633. Signed-off-by: Pablo Ribalta Lorenzo <pribalta@nvidia.com>
Waiting until another run finishes and reports good results, then I'll be merging this guy.
I agree that it is a bug (good job finding it :-) ) but I do not fully agree with the fix, because it is not consistent with the rest of the readers. You should instead change the reset function to set the current index to 0.
Another way to fix this issue is to make the different threads wrap around the whole set of indices, only initializing them to their corresponding shard offset. This solution is consistent with the other readers. Signed-off-by: Pablo Ribalta Lorenzo <pribalta@nvidia.com>
* Fix for low accuracy in Tensorflow training. indexed_file_loader.h performs a sharded read of the index files supplied alongside the TFRecords used for training; ReadSample() is therefore called from multiple threads, each with a different shard_id_. When all indices have been read, current_index_ is reset to start again at the beginning. This commit fixes a defect in detecting that a shard has reached its end: the code compared current_index_ against the size of all the indices. The fix lets each shard wrap around the indices by making the reset set current_index_ to zero. Before the fix, the code worked correctly with 1 GPU, where there is only one shard, but with more than one GPU it caused a loss in accuracy because the shards loaded the dataset unevenly. Signed-off-by: Pablo Ribalta Lorenzo <pribalta@nvidia.com>