-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make a default data directory #303
Make a default data directory #303
Conversation
# Eg. imap_module_directory = /usr/local/lib/python3.11/site-packages/imap_processing | ||
imap_module_directory = Path(__file__).parent | ||
|
||
instruments = [ | ||
INSTRUMENTS = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed these to a typical GLOBAL caps pattern since they seemed like globals. I can change these back if this is just noise.
pyproject.toml
Outdated
[project.scripts] | ||
imap_processing = "imap_processing.run_processing" | ||
[tool.poetry.scripts] | ||
imap_cli = "imap_processing.run_processing:main" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like poetry doesn't respect project.scripts
so need to change this over to tool.poetry.scripts
. @laspsandoval you changed this in one of your previous PRs that you then declined. Did you have a different better name for this? I couldn't find it immediately, so happy to hear your thoughts on the naming here.
imap_processing
is not great because it clashes with our package/directory name so when you are in the current directory it tries to run that folder instead of the installed script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Between, Laura's PR and this PR, you may have merge conflict depending on who merge first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not good with names. But I agree about choosing a good name for the script file. imap_processing
or cli
is not good. one is confusing with package and other is too generic. since we are using imap_cli
, we could change script file name to be that too unless others have good name suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem. It will be an easy rebase, so I'm happy to resolve after hers goes in.
Agreed on moving things around further! Maybe do that as a follow-up PR when someone with good naming skills comes in :)
0e8d45e
to
134c61b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me. To double check, DATA_DIR is where data is downloaded to when we are using download API, right?
Yes, you could set it to wherever you want, and if you wanted to set it to different places that is fine too. @tech3371 Thoughts on whether this should be the place we write files explicitly to? i.e. this comment: #303 (comment) |
217423e
to
417b846
Compare
I apologize for the churn/expansion here, this is significantly updated since you reviewed @tech3371.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple minor questions and comment. It looks good to me still though.
pyproject.toml
Outdated
[project.scripts] | ||
imap_processing = "imap_processing.run_processing" | ||
[tool.poetry.scripts] | ||
imap_cli = "imap_processing.run_processing:main" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Between, Laura's PR and this PR, you may have merge conflict depending on who merge first.
pyproject.toml
Outdated
[project.scripts] | ||
imap_processing = "imap_processing.run_processing" | ||
[tool.poetry.scripts] | ||
imap_cli = "imap_processing.run_processing:main" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not good with names. But I agree about choosing a good name for the script file. imap_processing
or cli
is not good. one is confusing with package and other is too generic. since we are using imap_cli
, we could change script file name to be that too unless others have good name suggestion.
b7dde16
to
bc3116a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, I like the changes. This will keep us more consistent/organized. I just have a few non-blocking suggestions.
data: xr.Dataset, description: str = "", mode: str = "", directory: str = "" | ||
data: xr.Dataset, | ||
description: str = "", | ||
directory: Optional[Path] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also update the Parameters
section of the docstring to reflect that directory
is now a Path
object instead of the str
? (Looks like that whole section is not formatted correctly too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I updated all of this to be the Numpy docstring style. I think we can use the type-hints to automagically generate these if we wanted to go that route as well and avoid having to manually document these...
@pytest.fixture(autouse=True) | ||
def _set_global_data_dir(tmp_path): | ||
"""Set the global data directory to a temporary directory.""" | ||
_original_data_dir = imap_processing.config["DATA_DIR"] | ||
imap_processing.config["DATA_DIR"] = tmp_path | ||
yield | ||
imap_processing.config["DATA_DIR"] = _original_data_dir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a neat way to do this!
def test_write_cdf(tmp_path): | ||
# Set up a fake dataset | ||
# lots of requirements on attributes, so depend on SWE for now | ||
ds = xr.Dataset( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ds = xr.Dataset( | |
dataset = xr.Dataset( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated this and fname
to spell it out. But, I think there is an argument to be made for keeping it ds
for xarray dataset, df
for pandas dataframe, np
for numpy etc... Their documentation uses ds
pretty universally, so is it better to stick to that or the spelled out name?
https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am annoyed that they chose to use ds
instead of dataset
, but I am totally fine with keeping it ds
, fname
, etc. if that is basically a 'standard' for the particular library.
This adds the ability to set the data directory for reading and writing data files to an explicit location of the user's choosing through command line or environment variables.
4e87e04
into
IMAP-Science-Operations-Center:dev
…data-dir Make a default data directory
Change Summary
Overview
This adds a global data directory to maintain all files underneath that is user-settable via an environment variable or command line argument. The purpose is to start moving towards the same logical folder structure we are using in AWS, but locally so all our file management is consistent. I ended up touching more than I wanted to here to try and add some tests and bring the current tests into pathlib parity. I'm happy to try and break some of this out if it is too large for a review now.
New Dependencies
n/a
New Files
n/a
Deleted Files
n/a
Updated Files
Many updates to various path handling functions.
Testing
I added a few subprocess calls to verify that instantiating via environment variable works as expected.