Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fix GPT-NeoX example copy collision during container startup inspired by Liam #6668

Closed
wants to merge 1 commit into from

Conversation

daranday
Copy link
Contributor

@daranday daranday commented Apr 28, 2023

Stack from ghstack (oldest at bottom):

As Liam investigated, there is a copy collision with the GPT-NeoX demo currently. This change mitigates that. We might still need to convert to a shared_fs approach in the future to take advantage of larger scale data.

…pired by Liam

As Liam investigated, there is a copy collision with the GPT-NeoX demo currently. This change mitigates that. We might still need to convert to a shared_fs approach in the future to take advantage of larger scale data.

[ghstack-poisoned]
@liamcli
Copy link
Contributor

liamcli commented May 2, 2023

LGTM for now since merging to your own fork.

We'll also need to update the docker image for all experiment configs once my PR to update the gpt-neox image is merged to the environments repo.
determined-ai/environments#205

In particular, we might break still when not using shared fs for multinode dtrain if the logic to build index is not updated such that we build index on local rank 0 when not using shared fs:
https://github.com/determined-ai/gpt-neox/blob/determined/megatron/data/gpt2_dataset.py#L139

@daranday
Copy link
Contributor Author

daranday commented May 9, 2023

Hey @liamcli, thanks for the review!

About code. It looks like it's using local index now on the gp2_dataset.py, so we should be good now?

About PR format. This is actually targeted for the main branch. The weird base branch (gh/daranday/3/head) is just the Sapling's way of creating code reviews. When landing using sl land ghstack <PR url>, it knows to rebase into main.

@sandrews-hpe sandrews-hpe deleted the branch gh/daranday/3/base March 7, 2024 19:11
@sandrews-hpe sandrews-hpe deleted the gh/daranday/3/head branch March 7, 2024 19:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants