Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DataLoader2] Saving and restoring initial seed generator #998

Closed
wants to merge 19 commits into from

Conversation

NivekT
Copy link
Contributor

@NivekT NivekT commented Feb 8, 2023

Stack from ghstack:

Changes to DataLoader2:

  • Modifying state_dict to store randomness_state, which includes:
    • _seed: int
    • _reset_seed: bool - flag indicating whether _seed needs to be set
    • _seed_generator - the latest version at the time when state_dict is called
    • _initial_seed_generator - the versopm that is saved at the beginning of very epoch
  • Modifying from_state and load_state_dict to restore randomness_state
  • Adding a method _restore_checkpoint_beginning_of_epoch
    • This sets self._seed_generator = self._initial_seed_generator, allowing users to re-create an epoch from the beginning.

Considerations

Storing the randomness states provide more flexibility for users to restore as they see fit. The decision to do that should not be controversial.

I decided to make add a new method for checkpointing at the beginning of the epoch, ensure that users are not confused about what randomness is restored by default.

The basic idea is that we want to allow users to restore dl2._seed_generator to the previously saved version. From that point on, they can create a new __iter__ and continue from the beginning of the epoch.

  • Note that since _seed and _reset_seed are also saved, if the users were planning to use a different seed or if there was a need to re-seed, those remain valid after restoring the checkpoint.
  • Finally, if users change their mind at any point (after restoring) and want to manual set seed. That seed will override any other behavior and the seed will be used.

Differential Revision: D44390519

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 8, 2023
@NivekT NivekT requested a review from ejguan February 8, 2023 23:18
Copy link
Contributor Author

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See PR description. I will add some test if the API design makes sense

torchdata/dataloader2/dataloader2.py Outdated Show resolved Hide resolved
Copy link
Contributor

@ejguan ejguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two main comments:

  • SeedGenerator is pickable. No need to add those supporting functions.
  • IIUC, initial state should be the state at the beginning of each epoch but not the beginning of all epochs

@NivekT
Copy link
Contributor Author

NivekT commented Feb 9, 2023

  • SeedGenerator is pickable. No need to add those supporting functions.

Yea, that makes sense. I can take those out.

  • IIUC, initial state should be the state at the beginning of each epoch but not the beginning of all epochs

I'm expecting __iter__ to be called at the beginning of each epoch, such that this line will be called every time.
self._initial_seed_generator = clone(self._seed_generator)

Does that look right to you?

@ejguan
Copy link
Contributor

ejguan commented Feb 9, 2023

I am not sure if it's needed to differentiate initial seed and current state of seed generator. After SG seeds the graph, the state of SG is not going to evolve during iteration. All random states should be self-preserved by random DataPipes.
-> It means I don't know if we need to add restore_initial_seed_generator to the argument.

During __iter__, like what we offers for reading_service, whenever seed_generator_state is provided, we can just restore it. Otherwise, we should try to seed the seed_generator either by self._seed or None.

Changes to `DataLoader2`:
- Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch.
- Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True`
- Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored.


---
### Consideration

I decided to make modification to the existing APIs. Alternatively, we can create a new method.

The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed.

I see 2 main scenarios:
1. Users want to restore DataPipe and ReadingService but not the initial state of RNG
    - I think lots of current users (including some internals) are in this category.
    - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it
2. Users actively want to restore DP, RS, and initial state of RNG
    - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__`

Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used.


[ghstack-poisoned]
Changes to `DataLoader2`:
- Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch.
- Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True`
- Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored.


---
### Consideration

I decided to make modification to the existing APIs. Alternatively, we can create a new method.

The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed.

I see 2 main scenarios:
1. Users want to restore DataPipe and ReadingService but not the initial state of RNG
    - I think lots of current users (including some internals) are in this category.
    - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it
2. Users actively want to restore DP, RS, and initial state of RNG
    - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__`

Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used.


[ghstack-poisoned]
Changes to `DataLoader2`:
- Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch.
- Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True`
- Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored.


---
### Consideration

I decided to make modification to the existing APIs. Alternatively, we can create a new method.

The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed.

I see 2 main scenarios:
1. Users want to restore DataPipe and ReadingService but not the initial state of RNG
    - I think lots of current users (including some internals) are in this category.
    - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it
2. Users actively want to restore DP, RS, and initial state of RNG
    - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__`

Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used.


[ghstack-poisoned]
Changes to `DataLoader2`:
- Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch.
- Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True`
- Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored.


---
### Consideration

I decided to make modification to the existing APIs. Alternatively, we can create a new method.

The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed.

I see 2 main scenarios:
1. Users want to restore DataPipe and ReadingService but not the initial state of RNG
    - I think lots of current users (including some internals) are in this category.
    - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it
2. Users actively want to restore DP, RS, and initial state of RNG
    - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__`

Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used.


[ghstack-poisoned]
Changes to `DataLoader2`:
- Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch.
- Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True`
- Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored.


---
### Consideration

I decided to make modification to the existing APIs. Alternatively, we can create a new method.

The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed.

I see 2 main scenarios:
1. Users want to restore DataPipe and ReadingService but not the initial state of RNG
    - I think lots of current users (including some internals) are in this category.
    - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it
2. Users actively want to restore DP, RS, and initial state of RNG
    - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__`

Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used.


[ghstack-poisoned]
Changes to `DataLoader2`:
- Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch.
- Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True`
- Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored.


---
### Consideration

I decided to make modification to the existing APIs. Alternatively, we can create a new method.

The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed.

I see 2 main scenarios:
1. Users want to restore DataPipe and ReadingService but not the initial state of RNG
    - I think lots of current users (including some internals) are in this category.
    - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it
2. Users actively want to restore DP, RS, and initial state of RNG
    - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__`

Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used.


[ghstack-poisoned]
Changes to `DataLoader2`:
- Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch.
- Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True`
- Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored.


---
### Consideration

I decided to make modification to the existing APIs. Alternatively, we can create a new method.

The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed.

I see 2 main scenarios:
1. Users want to restore DataPipe and ReadingService but not the initial state of RNG
    - I think lots of current users (including some internals) are in this category.
    - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it
2. Users actively want to restore DP, RS, and initial state of RNG
    - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__`

Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used.


[ghstack-poisoned]
Changes to `DataLoader2`:
- Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch.
- Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True`
- Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored.


---
### Consideration

I decided to make modification to the existing APIs. Alternatively, we can create a new method.

The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed.

I see 2 main scenarios:
1. Users want to restore DataPipe and ReadingService but not the initial state of RNG
    - I think lots of current users (including some internals) are in this category.
    - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it
2. Users actively want to restore DP, RS, and initial state of RNG
    - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__`

Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used.


[ghstack-poisoned]
Changes to `DataLoader2`:
- Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch.
- Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True`
- Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored.


---
### Consideration

I decided to make modification to the existing APIs. Alternatively, we can create a new method.

The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed.

I see 2 main scenarios:
1. Users want to restore DataPipe and ReadingService but not the initial state of RNG
    - I think lots of current users (including some internals) are in this category.
    - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it
2. Users actively want to restore DP, RS, and initial state of RNG
    - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__`

Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used.


[ghstack-poisoned]
Changes to `DataLoader2`:
- Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch.
- Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True`
- Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored.


---
### Consideration

I decided to make modification to the existing APIs. Alternatively, we can create a new method.

The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed.

I see 2 main scenarios:
1. Users want to restore DataPipe and ReadingService but not the initial state of RNG
    - I think lots of current users (including some internals) are in this category.
    - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it
2. Users actively want to restore DP, RS, and initial state of RNG
    - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__`

Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used.


[ghstack-poisoned]
Changes to `DataLoader2`:
- Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch.
- Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True`
- Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored.


---
### Consideration

I decided to make modification to the existing APIs. Alternatively, we can create a new method.

The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed.

I see 2 main scenarios:
1. Users want to restore DataPipe and ReadingService but not the initial state of RNG
    - I think lots of current users (including some internals) are in this category.
    - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it
2. Users actively want to restore DP, RS, and initial state of RNG
    - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__`

Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used.


[ghstack-poisoned]
@NivekT
Copy link
Contributor Author

NivekT commented Feb 28, 2023

@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Changes to `DataLoader2`:
- Modifying `state_dict` to store `randomness_state`, which includes:
    - `_seed: int`
    - `_reset_seed: bool` - flag indicating whether `_seed` needs to be set
    - `_seed_generator` - the latest version at the time when `state_dict` is called
    - `_initial_seed_generator` - the versopm that is saved at the beginning of very epoch
- Modifying `from_state` and `load_state_dict` to restore `randomness_state`
- Adding a method `_restore_checkpoint_beginning_of_epoch`
    -  This sets `self._seed_generator = self._initial_seed_generator`, allowing users to re-create an epoch from the beginning.



---
### Considerations

Storing the randomness states provide more flexibility for users to restore as they see fit. The decision to do that should not be controversial.

I decided to make add a new method for checkpointing at the beginning of the epoch, ensure that users are not confused about what randomness is restored by default.

The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version. From that point on, they can create a new `__iter__` and continue from the beginning of the epoch.
- Note that since `_seed` and `_reset_seed` are also saved, if the users were planning to use a different seed or if there was a need to re-seed, those remain valid after restoring the checkpoint. 
- Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used.


[ghstack-poisoned]
Changes to `DataLoader2`:
- Modifying `state_dict` to store `randomness_state`, which includes:
    - `_seed: int`
    - `_reset_seed: bool` - flag indicating whether `_seed` needs to be set
    - `_seed_generator` - the latest version at the time when `state_dict` is called
    - `_initial_seed_generator` - the versopm that is saved at the beginning of very epoch
- Modifying `from_state` and `load_state_dict` to restore `randomness_state`
- Adding a method `_restore_checkpoint_beginning_of_epoch`
    -  This sets `self._seed_generator = self._initial_seed_generator`, allowing users to re-create an epoch from the beginning.



---
### Considerations

Storing the randomness states provide more flexibility for users to restore as they see fit. The decision to do that should not be controversial.

I decided to make add a new method for checkpointing at the beginning of the epoch, ensure that users are not confused about what randomness is restored by default.

The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version. From that point on, they can create a new `__iter__` and continue from the beginning of the epoch.
- Note that since `_seed` and `_reset_seed` are also saved, if the users were planning to use a different seed or if there was a need to re-seed, those remain valid after restoring the checkpoint. 
- Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used.


[ghstack-poisoned]
Changes to `DataLoader2`:
- Modifying `state_dict` to store `randomness_state`, which includes:
    - `_seed: int`
    - `_reset_seed: bool` - flag indicating whether `_seed` needs to be set
    - `_seed_generator` - the latest version at the time when `state_dict` is called
    - `_initial_seed_generator` - the versopm that is saved at the beginning of very epoch
- Modifying `from_state` and `load_state_dict` to restore `randomness_state`
- Adding a method `_restore_checkpoint_beginning_of_epoch`
    -  This sets `self._seed_generator = self._initial_seed_generator`, allowing users to re-create an epoch from the beginning.



---
### Considerations

Storing the randomness states provide more flexibility for users to restore as they see fit. The decision to do that should not be controversial.

I decided to make add a new method for checkpointing at the beginning of the epoch, ensure that users are not confused about what randomness is restored by default.

The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version. From that point on, they can create a new `__iter__` and continue from the beginning of the epoch.
- Note that since `_seed` and `_reset_seed` are also saved, if the users were planning to use a different seed or if there was a need to re-seed, those remain valid after restoring the checkpoint. 
- Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used.


[ghstack-poisoned]
NivekT added a commit that referenced this pull request Mar 17, 2023
ghstack-source-id: 9aa77a0f2de7b83ba406552228880f720e20a3b2
Pull Request resolved: #998
@NivekT
Copy link
Contributor Author

NivekT commented Mar 17, 2023

I looked into whether we can refactor ._reset_seed: from DataLoader2 into SeedGenerator, I concluded that is not possible, mainly because the current DL2 behavior wants to use the seed once and don't re-seed until users call the seed method again. Whereas SeedGenerator would re-seed every time its seed method is called.

Copy link
Contributor

@ejguan ejguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM with a few comments below

torchdata/dataloader2/dataloader2.py Show resolved Hide resolved
test/dataloader2/test_mprs.py Outdated Show resolved Hide resolved
test/dataloader2/test_mprs.py Outdated Show resolved Hide resolved
test/dataloader2/test_mprs.py Outdated Show resolved Hide resolved
Changes to `DataLoader2`:
- Modifying `state_dict` to store `randomness_state`, which includes:
    - `_seed: int`
    - `_reset_seed: bool` - flag indicating whether `_seed` needs to be set
    - `_seed_generator` - the latest version at the time when `state_dict` is called
    - `_initial_seed_generator` - the versopm that is saved at the beginning of very epoch
- Modifying `from_state` and `load_state_dict` to restore `randomness_state`
- Adding a method `_restore_checkpoint_beginning_of_epoch`
    -  This sets `self._seed_generator = self._initial_seed_generator`, allowing users to re-create an epoch from the beginning.



---
### Considerations

Storing the randomness states provide more flexibility for users to restore as they see fit. The decision to do that should not be controversial.

I decided to make add a new method for checkpointing at the beginning of the epoch, ensure that users are not confused about what randomness is restored by default.

The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version. From that point on, they can create a new `__iter__` and continue from the beginning of the epoch.
- Note that since `_seed` and `_reset_seed` are also saved, if the users were planning to use a different seed or if there was a need to re-seed, those remain valid after restoring the checkpoint. 
- Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used.


[ghstack-poisoned]
@NivekT
Copy link
Contributor Author

NivekT commented Mar 24, 2023

@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Changes to `DataLoader2`:
- Modifying `state_dict` to store `randomness_state`, which includes:
    - `_seed: int`
    - `_reset_seed: bool` - flag indicating whether `_seed` needs to be set
    - `_seed_generator` - the latest version at the time when `state_dict` is called
    - `_initial_seed_generator` - the versopm that is saved at the beginning of very epoch
- Modifying `from_state` and `load_state_dict` to restore `randomness_state`
- Adding a method `_restore_checkpoint_beginning_of_epoch`
    -  This sets `self._seed_generator = self._initial_seed_generator`, allowing users to re-create an epoch from the beginning.



---
### Considerations

Storing the randomness states provide more flexibility for users to restore as they see fit. The decision to do that should not be controversial.

I decided to make add a new method for checkpointing at the beginning of the epoch, ensure that users are not confused about what randomness is restored by default.

The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version. From that point on, they can create a new `__iter__` and continue from the beginning of the epoch.
- Note that since `_seed` and `_reset_seed` are also saved, if the users were planning to use a different seed or if there was a need to re-seed, those remain valid after restoring the checkpoint. 
- Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used.

Differential Revision: [D44390519](https://our.internmc.facebook.com/intern/diff/D44390519)

[ghstack-poisoned]
@NivekT
Copy link
Contributor Author

NivekT commented Mar 24, 2023

@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

1 similar comment
@NivekT
Copy link
Contributor Author

NivekT commented Mar 27, 2023

@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Changes to `DataLoader2`:
- Modifying `state_dict` to store `randomness_state`, which includes:
    - `_seed: int`
    - `_reset_seed: bool` - flag indicating whether `_seed` needs to be set
    - `_seed_generator` - the latest version at the time when `state_dict` is called
    - `_initial_seed_generator` - the versopm that is saved at the beginning of very epoch
- Modifying `from_state` and `load_state_dict` to restore `randomness_state`
- Adding a method `_restore_checkpoint_beginning_of_epoch`
    -  This sets `self._seed_generator = self._initial_seed_generator`, allowing users to re-create an epoch from the beginning.



---
### Considerations

Storing the randomness states provide more flexibility for users to restore as they see fit. The decision to do that should not be controversial.

I decided to make add a new method for checkpointing at the beginning of the epoch, ensure that users are not confused about what randomness is restored by default.

The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version. From that point on, they can create a new `__iter__` and continue from the beginning of the epoch.
- Note that since `_seed` and `_reset_seed` are also saved, if the users were planning to use a different seed or if there was a need to re-seed, those remain valid after restoring the checkpoint. 
- Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used.

Differential Revision: [D44390519](https://our.internmc.facebook.com/intern/diff/D44390519)

[ghstack-poisoned]
@NivekT
Copy link
Contributor Author

NivekT commented Mar 27, 2023

@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@NivekT merged this pull request in 4ba17dd.

@facebook-github-bot facebook-github-bot deleted the gh/NivekT/105/head branch March 31, 2023 14:19
NivekT added a commit that referenced this pull request Apr 6, 2023
…tate for backward compatibility"


Follow up to #998 for backward compatibility.

Differential Revision: [D44747988](https://our.internmc.facebook.com/intern/diff/D44747988)

[ghstack-poisoned]
NivekT added a commit that referenced this pull request Apr 6, 2023
…d compatibility"


Follow up to #998 for backward compatibility.

Differential Revision: [D44747988](https://our.internmc.facebook.com/intern/diff/D44747988)

[ghstack-poisoned]
NivekT added a commit that referenced this pull request Apr 6, 2023
…tate for backward compatibility"


Follow up to #998 for backward compatibility.

Differential Revision: [D44747988](https://our.internmc.facebook.com/intern/diff/D44747988)

[ghstack-poisoned]
NivekT added a commit that referenced this pull request Apr 6, 2023
…d compatibility"


Follow up to #998 for backward compatibility.

Differential Revision: [D44747988](https://our.internmc.facebook.com/intern/diff/D44747988)

[ghstack-poisoned]
facebook-github-bot pushed a commit that referenced this pull request Apr 10, 2023
Summary:
Pull Request resolved: #1124

Reland of #998 with added guard while loading randomness state in `DataLoader2` for backward compatibility

Changes to `DataLoader2`:
- Modifying `state_dict` to store `randomness_state`, which includes:
    - `_seed: int`
    - `_reset_seed: bool` - flag indicating whether `_seed` needs to be set
    - `_seed_generator` - the latest version at the time when `state_dict` is called
    - `_initial_seed_generator` - the versopm that is saved at the beginning of very epoch
- Modifying `from_state` and `load_state_dict` to restore `randomness_state`
- Adding a method `_restore_checkpoint_beginning_of_epoch`
    -  This sets `self._seed_generator = self._initial_seed_generator`, allowing users to re-create an epoch from the beginning.

 ---
### Considerations

Storing the randomness states provide more flexibility for users to restore as they see fit. The decision to do that should not be controversial.

I decided to make add a new method for checkpointing at the beginning of the epoch, ensure that users are not confused about what randomness is restored by default.

The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version. From that point on, they can create a new `__iter__` and continue from the beginning of the epoch.
- Note that since `_seed` and `_reset_seed` are also saved, if the users were planning to use a different seed or if there was a need to re-seed, those remain valid after restoring the checkpoint.
- Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used.

Test Plan:
Imported from OSS

f425956975

Reviewed By: bearzx

Differential Revision: D44748514

Pulled By: NivekT

fbshipit-source-id: 8713592902b1e0680e46e4db4280c84c708dbf55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants