
Fix for low accuracy in Tensorflow training #167

Merged (3 commits) on Sep 14, 2018

Conversation

@pribalta (Contributor) commented Sep 13, 2018

Indexed_file_loader.h performs a sharded read of the index files supplied alongside the tfrecords for training. Thus, ReadSample() is called from multiple threads with different shard_id_. When all indexes have been read, we reset current_index_ to start again at the beginning of the shard.

This commit fixes a defect in the detection that a shard has reached its end: the code compared the value of current_index_ against the size of all the indices. The fix consists of comparing against the end of the shard instead of the end of the indices.

Before the fix, the code worked correctly for 1 GPU, where there is only 1 shard, but with >1 GPUs it resulted in a loss of accuracy, most likely due to redundant reads.

Signed-off-by: Pablo Ribalta Lorenzo <pribalta@nvidia.com>

@pribalta pribalta requested review from ptrendx and JanuszL September 13, 2018 07:51
@@ -94,6 +94,9 @@ class IndexedFileLoader : public Loader<CPUBackend> {
ReadIndexFile(index_uris);
size_t num_indices = indices_.size();
current_index_ = num_indices/num_shards_ * shard_id_;
max_index_ = (shard_id_ != num_shards_ - 1)
? num_indices/num_shards_ * (shard_id_ + 1)
Reviewer (Contributor):
How about just:

max_index_ = num_indices * (shard_id_ + 1) / num_shards_ 

@pribalta (Contributor, Author):

accepted

Signed-off-by: Pablo Ribalta Lorenzo <pribalta@nvidia.com>
@pribalta pribalta force-pushed the pribalta_fix_sharding_indexed_reader branch from 6875724 to 0e27633 Compare September 13, 2018 09:42
@pribalta (Contributor, Author)

Waiting till another run finishes and reports good results and I'll be merging this guy

@ptrendx (Member) commented Sep 13, 2018

I agree that it is a bug (good job finding it :-) ), but I do not fully agree with the fix, because it is not consistent with the rest of the readers. You should instead change the reset function to set the current index to 0.

Another way to fix this issue is to make the different threads wrap around the indices, only initializing them to their corresponding shard offset.

This solution is consistent with the other readers.

Signed-off-by: Pablo Ribalta Lorenzo <pribalta@nvidia.com>
@pribalta pribalta merged commit 9579eba into master Sep 14, 2018
@pribalta pribalta deleted the pribalta_fix_sharding_indexed_reader branch September 14, 2018 17:59
JanuszL pushed a commit that referenced this pull request Sep 17, 2018
* Fix for low accuracy in Tensorflow training

Indexed_file_loader.h performs a sharded read of the index files supplied alongside the tfrecords for training. Thus, ReadSample() is called from multiple threads with different shard_id_. When all indexes have been read, we reset current_index_ to start again at the beginning of the shard.

This commit fixes a defect in the detection that a shard has reached its end: the code compared the value of current_index_ against the size of all the indices. The fix consists of letting each shard wrap around the indices by making reset set current_index_ to zero.

Before the fix, the code worked correctly for 1 GPU, where there is only 1 shard, but with >1 GPUs it resulted in a loss of accuracy due to shards unevenly loading the dataset.

Signed-off-by: Pablo Ribalta Lorenzo <pribalta@nvidia.com>
pribalta added a commit to pribalta/DALI that referenced this pull request Oct 1, 2018
3 participants