This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

wait for non-master workers to finish reading vocab before master worker saves it (#4274)

* wait for non-master workers to finish reading

* update CHANGELOG

* typo
epwalsh authored May 21, 2020
1 parent f27475a commit dacbb75
Showing 2 changed files with 7 additions and 1 deletion.
CHANGELOG.md: 1 change (1 addition, 0 deletions)
@@ -13,6 +13,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Tons of docstring inconsistencies resolved.
 - Nightly builds no longer run on forks.
 - Distributed training now automatically figures out which worker should see which instances
+- A race condition bug in distributed training caused by saving the vocab to file from the master process while other processes might be reading those files.

 ### Added

allennlp/commands/train.py: 7 changes (6 additions, 1 deletion)
@@ -632,7 +632,12 @@ def from_partial_objects(

         # Initializing the model can have side effect of expanding the vocabulary.
         # Save the vocab only in the master. In the degenerate non-distributed
-        # case, we're trivially the master.
+        # case, we're trivially the master. But in the distributed case we need to be careful
+        # to avoid a race condition where we might be writing the vocab from the master
+        # process while another process is reading it. So we use a barrier here
+        # to wait for the other processes to finish reading.
+        if common_util.is_distributed():
+            dist.barrier()
         if common_util.is_master():
             vocabulary_path = os.path.join(serialization_dir, "vocabulary")
             vocabulary_.save_to_files(vocabulary_path)
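The ordering this change enforces (every worker finishes reading, all of them meet at a barrier, and only then does the master write) can be illustrated on a single machine. The sketch below is a hypothetical stand-in, not AllenNLP code: Python threads play the distributed workers, `threading.Barrier` plays the role of `dist.barrier()`, and rank 0 plays the master. The names `run_worker`, `main`, `WORLD_SIZE`, `MASTER_RANK`, and the `tokens.txt` file are illustrative assumptions.

```python
import os
import tempfile
import threading

# Hypothetical setup (not AllenNLP code): four workers, rank 0 acts as the "master".
WORLD_SIZE = 4
MASTER_RANK = 0


def run_worker(rank, barrier, vocab_dir, results):
    """Read the shared vocab file, sync at the barrier, then let the master rewrite it."""
    vocab_path = os.path.join(vocab_dir, "tokens.txt")

    # Every worker, master included, reads the vocabulary currently on disk.
    with open(vocab_path) as f:
        results[rank] = f.read().splitlines()

    # Stand-in for dist.barrier(): each worker blocks here until all
    # WORLD_SIZE workers have arrived, i.e. until every read has finished.
    barrier.wait()

    # Only the master writes, and only after the barrier, so no worker can
    # be reading the file while it is being overwritten.
    if rank == MASTER_RANK:
        expanded = results[rank] + ["<expanded>"]
        with open(vocab_path, "w") as f:
            f.write("\n".join(expanded))


def main():
    with tempfile.TemporaryDirectory() as vocab_dir:
        vocab_path = os.path.join(vocab_dir, "tokens.txt")
        with open(vocab_path, "w") as f:
            f.write("hello\nworld")

        barrier = threading.Barrier(WORLD_SIZE)
        results = [None] * WORLD_SIZE
        threads = [
            threading.Thread(target=run_worker, args=(rank, barrier, vocab_dir, results))
            for rank in range(WORLD_SIZE)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        # Read back what the master saved after the barrier.
        with open(vocab_path) as f:
            saved = f.read().splitlines()
    return results, saved


if __name__ == "__main__":
    results, saved = main()
    print("read by workers:", results)
    print("saved by master:", saved)
```

As in the diff, every participant (master included) calls the barrier, and only the write is guarded by the master check; a barrier that some ranks never reach would block forever. Removing `barrier.wait()` reintroduces the race: a slow worker could then read the file after, or while, the master rewrites it.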
