-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Control Consistency of PersistentDataset #999
Comments
Hi @lukasfolle , Thanks for your feature request, that's very interesting idea. Thanks. |
Thank you! I just had a quick look at serializing class objects and their parameters and it seems to be a non-trivial tedious process to get there. For that reason, I proposed to hash the transformed data. |
thanks, I think this is a valid UI issue. but hashing the transforms/data are non-trivial. In the future we may have interfaces to initialise the transform chain from a user-provided config file (e.g. json/yaml), we could track the change at the config level. let's keep this ticket open for discussions... |
@wyli That sounds great! Could you give a hint when its planned to be implemented? |
Is your feature request related to a problem? Please describe.
Already included as a warning in the source code, the
PersistentDataset
does not check if any of the transformations changed, and thus after modifying the transformations, the cache is outdated.Describe the solution you'd like
Calculate the hash of the very first element of the dataset (after calling
_pre_first_random_transform
) and store this in the cache folder. In case the stored hash matches with the one calculated during__init__
ofPersistentDataset
the cache is still valid and no deterministic transformations have changed.However, if the hashes do not match, either a new cache folder could be created or the old cache can be overwritten.
Describe alternatives you've considered
Alternatively, the deterministic transformations could be hashed, but I am not sure how that would work. Might be more memory efficient, but potentially harder to implement due to
TypeError: Object of type *** is not JSON serializable
for most transforms.The text was updated successfully, but these errors were encountered: