-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: refresh F10.7 and ap data handling with partial data support #74
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thank you @mananapr!
I've got a few comments/suggestions, but happy to get this in!
pymsis/utils.py
Outdated
@@ -207,7 +280,7 @@ def get_f107_ap(dates: npt.ArrayLike) -> tuple[npt.NDArray, npt.NDArray, npt.NDA | |||
| prior to current time | |||
""" | |||
dates = np.asarray(dates, dtype=np.datetime64) | |||
data = _DATA or _load_f107_ap_data() | |||
data = _DATA or _load_f107_ap_data(dates[-1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have a long running process with _DATA
already loaded, then this won't update the data for us. I think the logic should probably be a new if block here instead after the original loading potentially?
data = _DATA or _load_f107_ap_data(dates[-1]) | |
# Load/download data right away. | |
data = _DATA or _load_f107_ap_data() | |
if dates[-1] > data["dates"][-1]: | |
data = _new_function_to_download_recent_data() |
pymsis/utils.py
Outdated
fout.write(line.encode("utf-8")) | ||
fout.seek(0) | ||
arr = np.loadtxt( | ||
fout, delimiter=",", dtype=dtype, usecols=usecols, skiprows=1 | ||
) # type: ignore | ||
|
||
# Check if the file needs to be refreshed after parsing | ||
if last_obs_date is not None: | ||
file_mod_time = datetime.fromtimestamp(os.path.getmtime(_F107_AP_PATH)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need file modification time? I think we should just leave that out and only rely on the requested date being greater than the observations we have.
pymsis/utils.py
Outdated
if ",OBS," in line: | ||
# Capture last observed date | ||
last_obs_date = np.datetime64(line.split(",")[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get this from the _DATA
numpy arrays? Maybe warn_data
contains this information already and we could do something like (have not tried this) _DATA["dates"][~_DATA["warn_data"]][-1]
.
pymsis/utils.py
Outdated
writer.writeheader() | ||
writer.writerows(sorted_data) | ||
|
||
os.remove(_PARTIAL_F107_AP_PATH) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be wary of this because if there is any download error above, this file will be left hanging around.
A few quick thoughts:
- Do we need to save it to disk, or could you store it in a BytesIO object in memory instead?
- Would it make sense to save
_DATA
arrays withnp.savez()
instead of the csv files for easier manipulation? (Then you wouldn't need to mess with the dictreader/writer. - Honestly, now that I'm thinking about it more, is this just more headache than it is worth and we should just use your logic below and re-download the full file for the simplest case. More bandwidth, but at least we are getting the full file built for us rather than building it from bits and pieces.
Hi @greglucas, I've pushed another commit based on your suggestions.
Let me know what you think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mananapr, this looks great to me. Just one minor suggestion.
pymsis/utils.py
Outdated
@@ -208,6 +207,12 @@ def get_f107_ap(dates: npt.ArrayLike) -> tuple[npt.NDArray, npt.NDArray, npt.NDA | |||
""" | |||
dates = np.asarray(dates, dtype=np.datetime64) | |||
data = _DATA or _load_f107_ap_data() | |||
if dates[-1] > data["dates"][~np.repeat(data["warn_data"], 8)][-1]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If someone didn't sort their inputs/request, we should use a max here I think. Could you also add a quick comment about what this if statement is doing.
if dates[-1] > data["dates"][~np.repeat(data["warn_data"], 8)][-1]: | |
# If our requested data time is after the cached values we have, go and download a new file | |
# to refresh the local file cache | |
if dates.max() > data["dates"][~np.repeat(data["warn_data"], 8)][-1]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @greglucas, I've commented the date comparison check
Add tests to make sure the downloads are triggered appropriately. This required updating the location to download to a separate temporary directory for each test. Before we were overwriting the actual test file and that caused issues in subsequent tests when going to load from that. Change the condition to only refresh if the data being requested is before the current time AND after the last date in the file.
@mananapr I pushed a commit up to your branch to try and fix up some of the failing tests. Unfortunately, it was more involved than I hoped, so there is a bit more work there than anticipated. It also apparently didn't fix the issue :-/ But it does work locally for me. Feel free to drop that commit if you don't like it and continue on in a different way to try and get the tests to pass too. |
Ah, I totally missed that. Will look into the changes, thanks! |
Refresh F10.7 and ap data handling with partial data support and automatic updates
Closes #69