
add flushing of val epoch results #256

Merged
merged 3 commits into main from wandb-patch
Oct 25, 2024
Conversation

AUdaltsova
Contributor

Pull Request

Description

The array that stores inference results for wandb CSV saving was not being cleared between validation epochs, causing file sizes to blow up. Added clearing of the results after logging.

Potentially fixes #255, as the files stay at 19-20 MB (for me at least; this will obviously differ between configs, mine predicts 3 quantiles over 32 points) instead of growing to 240+ MB by epoch 11. (Still need to stop keeping multiple versions of one artifact; not sure how to do that yet.)
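
The fix pattern can be sketched as below. This is a minimal stand-in, not the repo's actual code: the class and method names mimic Lightning-style validation hooks, and the wandb CSV write is elided.

```python
# Minimal sketch of the fix pattern (illustrative, not the repo's code):
# accumulate validation results per epoch, write them out once at epoch
# end, then clear the buffer so it does not keep growing across epochs.

class ValResultsLogger:
    def __init__(self):
        self.results = []       # accumulated rows for the current epoch
        self.logged_sizes = []  # number of rows written out per epoch

    def on_validation_batch_end(self, batch_rows):
        self.results.extend(batch_rows)

    def on_validation_epoch_end(self):
        # ... write self.results to a wandb CSV artifact here ...
        self.logged_sizes.append(len(self.results))
        # the fix: clear the buffer so the next epoch starts empty
        self.results.clear()


logger = ValResultsLogger()
for epoch in range(3):
    for _ in range(4):                      # 4 val batches per epoch
        logger.on_validation_batch_end([1, 2])
    logger.on_validation_epoch_end()

print(logger.logged_sizes)  # [8, 8, 8]
```

Without the `clear()` call the buffer carries over, so epoch N writes N epochs' worth of rows, which is exactly the file-size blow-up described above.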

How Has This Been Tested?

Ran training for 3 epochs and checked that the resulting CSVs had the expected sizes and contents.

  • Yes

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@AUdaltsova
Contributor Author

Ok, to be fair, it turns out the versioning happens because different runs in one project save their artifacts under the same name, so one run has one file per epoch instead of multiple, as I originally thought. The naming might become a problem though, bc afaik the main use case for these files is to be downloaded for further analysis? If so, getting the correct one can get kinda tricky; maybe we want to prefix them by run or something. Or maybe people who can use wandb.api better than me can actually do it neatly and this is a non-issue?
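
For the "download for further analysis" concern, a retrieval sketch under assumed names: `artifact_ref` and the `"ocf"` / `"my-project"` / `"val_epoch_results"` identifiers are hypothetical placeholders, while the `wandb.Api` calls in the comments are the standard public API.

```python
# Hypothetical sketch: with every run saving under the same artifact name,
# whoever downloads the CSVs has to know which version belongs to which
# run. An artifact reference for wandb.Api().artifact() has this shape:

def artifact_ref(entity: str, project: str, name: str, version: str = "latest") -> str:
    # placeholder names; substitute your real entity/project/artifact
    return f"{entity}/{project}/{name}:{version}"

ref = artifact_ref("ocf", "my-project", "val_epoch_results")
print(ref)  # ocf/my-project/val_epoch_results:latest

# Downloading (not executed here; requires a wandb login):
#   import wandb
#   path = wandb.Api().artifact(ref).download()
```

Note that `:latest` only resolves to the most recently logged version across the whole project, which is why same-named artifacts from different runs are hard to tell apart.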

@peterdudfield
Contributor

> Ok, to be fair, it turns out the versioning happens because different runs in one project save their artifacts under the same name, so one run has one file per epoch instead of multiple, as I originally thought. The naming might become a problem though, bc afaik the main use case for these files is to be downloaded for further analysis? If so, getting the correct one can get kinda tricky; maybe we want to prefix them by run or something. Or maybe people who can use wandb.api better than me can actually do it neatly and this is a non-issue?

Weird, I would have thought each artifact is separate per experiment, but maybe I'm wrong there. Is there a model id we can use to suffix it?

@AUdaltsova
Contributor Author

> Weird, I would have thought each artifact is separate per experiment, but maybe I'm wrong there. Is there a model id we can use to suffix it?

So, fun fact: horizon loss curve tables do get prefixed by run-{id}, so it should be easy to copy that here as soon as I can find where this happens :)
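
The run-{id} prefixing mentioned above might look like the following sketch. The `artifact_name` helper and the `"val_epoch_results"` base name are illustrative, not names from this repo; the wandb calls in the comments (`wandb.run.id`, `wandb.Artifact`, `wandb.log_artifact`) are the standard public API.

```python
# Hypothetical sketch: prefix the CSV artifact name with the wandb run id,
# mirroring how the horizon loss curve tables are named ("run-{id}-...").

def artifact_name(run_id: str, base: str = "val_epoch_results") -> str:
    # "run-{id}-{base}" keeps same-project runs from colliding on one name
    return f"run-{run_id}-{base}"

print(artifact_name("abc123"))  # run-abc123-val_epoch_results

# Inside a run this might be used as (not executed here):
#   artifact = wandb.Artifact(artifact_name(wandb.run.id), type="results")
#   artifact.add_file("val_results.csv")
#   wandb.log_artifact(artifact)
```

With distinct names per run, each run versions only its own artifact, so downloading "the CSVs from run X" becomes a simple name lookup.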

@AUdaltsova AUdaltsova marked this pull request as ready for review October 24, 2024 15:46
@AUdaltsova AUdaltsova merged commit 5360394 into main Oct 25, 2024
3 checks passed
@AUdaltsova AUdaltsova deleted the wandb-patch branch October 25, 2024 12:50
Successfully merging this pull request may close these issues.

wandb csv artifact saves too much data to local storage