Py1br algorithm implementation #373

Merged
merged 8 commits into from
Aug 29, 2023

snarayan21
Collaborator

Description of changes:

Added py1br algorithm, which randomizes and staggers shuffle blocks in order to have more balanced shard downloads over the course of training.

Shuffle block sizes are sampled uniformly from the range [0.75*shuffle_block_size, 1.25*shuffle_block_size), and the blocks are staggered by an offset in [0, 0.75*shuffle_block_size) samples.
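
To make the block layout concrete, here is a minimal sketch of the idea for a single node. It is not the code from this PR. The function names `sketch_block_ranges` and `sketch_shuffle_node` are hypothetical, it uses only the standard library, and it assumes the stagger is applied by shortening the first block.

```python
import random


def sketch_block_ranges(node_size, shuffle_block_size, rng):
    """Split [0, node_size) into randomized, staggered (start, stop) blocks."""
    if shuffle_block_size < 2:
        raise ValueError("this sketch needs shuffle_block_size >= 2")
    low = int(0.75 * shuffle_block_size)   # smallest block size (inclusive)
    high = int(1.25 * shuffle_block_size)  # largest block size (exclusive)
    # Stagger drawn from [0, 0.75 * shuffle_block_size).
    stagger = rng.randrange(0, low)
    ranges = []
    start = 0
    while start < node_size:
        size = rng.randrange(low, high)
        if not ranges:
            # Assumption: the stagger shortens the first block, which shifts
            # every later block boundary in this node by the same amount.
            size = max(size - stagger, 1)
        # Never let a block run past the end of the node.
        stop = min(start + size, node_size)
        ranges.append((start, stop))
        start = stop
    return ranges


def sketch_shuffle_node(node_size, shuffle_block_size, seed):
    """Shuffle sample IDs only within each randomized, staggered block."""
    rng = random.Random(seed)
    ids = list(range(node_size))
    for start, stop in sketch_block_ranges(node_size, shuffle_block_size, rng):
        block = ids[start:stop]
        rng.shuffle(block)
        ids[start:stop] = block
    return ids
```

Because each node would draw its own stagger and block sizes, block boundaries in different nodes stop lining up, which is the intended source of the more balanced shard downloads.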

Issue #, if available:

https://mosaicml.atlassian.net/browse/STR-105

Merge Checklist:

Put an x without space in the boxes that apply. If you are unsure about any checklist, please don't hesitate to ask. We are here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the contributor guidelines
  • This is a documentation change or typo fix. If so, skip the rest of this checklist.
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the MosaicML team.
  • I have updated any necessary documentation, including README and API docs (if appropriate).

Tests

  • I ran pre-commit on my change. (check out the pre-commit section of prerequisites)
  • I have added tests that prove my fix is effective or that my feature works (if appropriate).
  • I ran the tests locally to make sure they pass. (check out testing)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes.

Collaborator

@karan6181 karan6181 left a comment


  • Can you also make the shuffle_block_size comment here more descriptive?
  • Does py1br always outperform py1b? If so, should we add a deprecation warning to py1b?

docs/source/fundamentals/shuffling.md (outdated, resolved)
streaming/base/shuffle/__init__.py (resolved)
@snarayan21
Collaborator Author

Here's an experimental comparison showing how py1br (in blue) has more balanced downloads over time than py1b (in purple):
8CN_py1br_vs_py1b

And here's an experiment showing that py1br with an expanded shuffle block size range of 0.5x to 1.5x shuffle_block_size (in green) does not result in more balanced downloads than py1br with the current range of 0.75x to 1.25x shuffle_block_size (in blue):
8CN_py1br_increase_range_vs_py1b

py1br has little noticeable effect with a low number of physical/canonical nodes, because shifting and randomizing shuffle blocks within nodes balances downloads only weakly when there are few shuffle blocks. Here's py1b (blue) vs py1br (orange) for 2 canonical nodes and 2 physical nodes:
2CN_py1br_vs_py1b

Collaborator

@karan6181 karan6181 left a comment


Overall, looks good to me. Can you please remove the .DS_Store file?

@snarayan21 snarayan21 enabled auto-merge (squash) August 29, 2023 16:47
@snarayan21 snarayan21 merged commit e58984e into mosaicml:main Aug 29, 2023
5 checks passed

2 participants