fix: unique file names and squashed neptune stdout #57
Conversation
Thank you for this @JemmaLDaniel! Just a minor style suggestion: I think we can use `enumerate` and manually control the `tqdm` progress bar. This way we do not have to set up and manually increment a counter.
I also think if we just initialise the download file with the counter variable we do not have to rename things, because every file will have a unique name when it is downloaded.
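A minimal sketch of the suggested pattern (the `download_runs` helper and file names are hypothetical, not from the PR; it assumes `tqdm` is available, which the repo already uses):

```python
from tqdm import tqdm


def download_runs(file_names):
    """Hypothetical sketch: enumerate supplies the unique counter and the
    tqdm bar is advanced manually, so no separate counter variable is needed."""
    saved_paths = []
    with tqdm(total=len(file_names), desc="Downloading JSON data") as pbar:
        for i, name in enumerate(file_names):
            # Initialise the download path with the counter up front, so every
            # file has a unique name and no os.rename is needed afterwards
            saved_paths.append(f"{name}_{i}")
            pbar.update(1)  # manually advance the progress bar
    return saved_paths


print(download_runs(["metrics", "metrics", "returns"]))
```

Even duplicate base names end up unique because the `enumerate` index is baked into the path at creation time.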
marl_eval/json_tools/json_utils.py
Outdated

```python
            # and doesn't need to be unzipped.
            # Rename the file by appending run_counter to its existing name
            renamed_file_path = f"{file_path}_{counter}"
            os.rename(file_path, renamed_file_path)
        except Exception as e:
            print(f"An error occurred while unzipping or storing {file_path}: {e}")
```
Another minor detail here: I think it would be nice to include the Neptune run ID here as well. This way someone could quickly know exactly which Neptune run is causing an issue and manually inspect the JSON files if they wanted to.
I was doing this previously, but I opted for the `run_counter` in case `run_id` wasn't unique enough when there is more than one `data_key` per `run_id`. I can easily append it :)
Something that would still be nice to show is that the error happens when trying to pull data for run MAV-30546, for example. Then someone could easily inspect what the issue is if they wanted to, because `file_path` and the error `e` alone will not make it easy to see which Neptune run is causing the issue.
`run_id` is appended to `file_path` instead of using `i` in the enumerator. Do you want it to be more explicit than that in the print message?
You are absolutely right. But since `file_path` will look something like `<some_path>/<some_key>_MAV-30546_<j>`, and the most useful info for finding issues in Neptune runs will be just the `run_id`, something that gives both run id info and file path info could be:

```diff
- print(f"An error occurred while unzipping or storing {file_path}: {e}")
+ print(
+     "The following error occurred while unzipping or storing JSON data for run "
+     f"{run_id} at path {file_path}: {e}"
+ )
```
This is a super nitpicky thing though, so we can also ignore it 😅
I agree, that is more readable! No need to lower the marl-eval standards 😎
Thank you for this @JemmaLDaniel 🔥
What?

- Neptune prints to stdout during downloads, which interferes with the `tqdm` progress bar updates; this output has been suppressed so the progress bar displays as expected.
- `data_key` is not unique. This causes two JSON files to be stored in one folder with the same `data_key` name and messes up plotting downstream. This has been fixed so that each JSON file is stored in a unique folder.

Why?

- Downloaded JSON files need unique names so that data is stored in the structure that `marl-eval`'s concatenating and plotting tools require.

How?