Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Control Consistency of PersistentDataset #999

Closed
lukasfolle opened this issue Sep 7, 2020 · 5 comments · Fixed by #4633
Closed

Control Consistency of PersistentDataset #999

lukasfolle opened this issue Sep 7, 2020 · 5 comments · Fixed by #4633
Labels
enhancement New feature or request WG: Transforms For the transforms working group

Comments

@lukasfolle
Copy link

lukasfolle commented Sep 7, 2020

Is your feature request related to a problem? Please describe.
Already included as a warning in the source code, the PersistentDataset does not check if any of the transformations changed, and thus after modifying the transformations, the cache is outdated.

Describe the solution you'd like
Calculate the hash of the very first element of the dataset (after calling _pre_first_random_transform) and store this in the cache folder. In case the stored hash matches with the one calculated during __init__ of PersistentDataset the cache is still valid and no deterministic transformations have changed.
However, if the hashes do not match, either a new cache folder could be created or the old cache can be overwritten.

Describe alternatives you've considered
Alternatively, the deterministic transformations could be hashed, but I am not sure how that would work. Might be more memory efficient, but potentially harder to implement due to TypeError: Object of type *** is not JSON serializable for most transforms.

@Nic-Ma Nic-Ma added the enhancement New feature or request label Sep 7, 2020
@Nic-Ma
Copy link
Contributor

Nic-Ma commented Sep 7, 2020

Hi @lukasfolle ,

Thanks for your feature request, that's very interesting idea.
@wyli , do you know how to hash some transforms and parameters?

Thanks.

@lukasfolle lukasfolle reopened this Sep 7, 2020
@lukasfolle
Copy link
Author

Thank you! I just had a quick look at serializing class objects and their parameters and it seems to be a non-trivial tedious process to get there. For that reason, I proposed to hash the transformed data.
But I am curious to hear from others and @wyli if I missed something there!

@lukasfolle
Copy link
Author

@wyli, @Nic-Ma any updates?

@wyli
Copy link
Contributor

wyli commented Sep 10, 2020

thanks, I think this is a valid UI issue. but hashing the transforms/data are non-trivial. In the future we may have interfaces to initialise the transform chain from a user-provided config file (e.g. json/yaml), we could track the change at the config level. let's keep this ticket open for discussions...

@lukasfolle
Copy link
Author

@wyli That sounds great! Could you give a hint when its planned to be implemented?

@wyli wyli added the WG: Transforms For the transforms working group label May 13, 2021
@wyli wyli mentioned this issue Jan 10, 2022
17 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request WG: Transforms For the transforms working group
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants