Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ROOT_DIR for GCR #416

Merged
merged 22 commits into from
Apr 2, 2020
Merged

Implement ROOT_DIR for GCR #416

merged 22 commits into from
Apr 2, 2020

Conversation

JoanneBogart
Copy link
Contributor

Meant to fix #346

Copy link
Member

@yymao yymao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoanneBogart a few suggestions for you to consider --- I realize that some suggestions might become irrelevant or needs updates as you develop further.

GCRCatalogs/register.py Outdated Show resolved Hide resolved
GCRCatalogs/register.py Outdated Show resolved Hide resolved
GCRCatalogs/register.py Outdated Show resolved Hide resolved
GCRCatalogs/register.py Show resolved Hide resolved
@heather999
Copy link
Contributor

@JoanneBogart @yymao I'd like to try this out in a docker container - is now a good time? If I were to do that - can you provide a brief set of instructions to do such an install with the new ROOT_DIR?

@JoanneBogart
Copy link
Contributor Author

JoanneBogart commented Feb 13, 2020

@heather999 I don't think there is anything special you need to do. I tried running the bit of code which determines site from within a shifter image and it worked; everything else follows from that.
When GCRCatalogs is imported, that code runs to determine if site is nersc and, if so, sets the default root dir suitably. The existing config files all have absolute paths, which will continue to work with this code. To see it in action you should do a load of one of the test configs with name ending in _rel.
..or maybe you meant instructions to install in the container? You need the development branch u/jrb/root_dir ..more than that I don't know.

@JoanneBogart
Copy link
Contributor Author

test_reader_modules.py is now failing for the "right" reason: site is not found and my additional test configs require it since they have relative paths.
We might have to redesign this test.

@JoanneBogart
Copy link
Contributor Author

@yymao How do you feel about requiring some string (like ${ROOT_DIR}) when root dir is supposed to be prepended? Earlier I believe you said it wasn't necessary and I thought the same, but now I'm not so sure. The advantage is that it would allow people to use a relative path during development without interfering with access to production catalogs. I can imagine this would be useful for someone developing a composite catalog or developing a new catalog of any sort for an application which also needs access to production catalogs in the usual place.

Copy link
Member

@yymao yymao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving some inline suggestions here. I'll add a few high-level comments in a bit.

GCRCatalogs/register.py Outdated Show resolved Hide resolved
GCRCatalogs/register.py Show resolved Hide resolved
GCRCatalogs/register.py Show resolved Hide resolved
GCRCatalogs/register.py Outdated Show resolved Hide resolved
GCRCatalogs/register.py Show resolved Hide resolved
@yymao
Copy link
Member

yymao commented Feb 21, 2020

@JoanneBogart thanks for the comments/revision. Here's some general thoughts:

  1. I still don't think register.py should take on the job of raising exception when _ROOT_DIR does not exist. Each reader already makes sure the files exist and raises exception when one try to load the catalog the path is wrong. I am not convinced that there is a need to add one more layer that checks only parts of the path.

  2. I don't quite like the ${ROOT_DIR} idea because we then mix in a new syntax (bash). It also lets users to wonder if they can put ${ROOT_DIR} in the middle of the path or use other environment variables (which they can't).

Let's review a few use cases:

  • Yaml config using absolute path: Should not exist for the long term, but for temporary config files it should work as expected (i.e. load from the absolute path)

  • Yaml config using relative path: _ROOT_DIR will be prepended to paths.

  • Yaml config using relative path but someone wants to use a different root dir (e.g. current dir) for temporary reasons: Use set_root_dir() to customize _ROOT_DIR

  • Using config_overwrite at run time with absolute path: load from the absolute path.

  • Using config_overwrite at run time with relative path: the relative path will be prepended with _ROOT_DIR.

  • Using config_overwrite at run time with relative path but want to actually use relative path: the user can either use set_root_dir to customize _ROOT_DIR, or to simply use os.path.abspath('relative/path')

@JoanneBogart
Copy link
Contributor Author

Concerning your general comments above:

I still don't think register.py should take on the job of raising exception when _ROOT_DIR does not exist. Each reader already makes sure the files exist and raises exception when one try to load the catalog the path is wrong. I am not convinced that there is a need to add one more layer that checks only parts of the path.

My thinking is that _ROOT_DIR is the responsibility of register.py. If it doesn't have a sensible value and there is an attempt to use it (where here "use it" means directly or indirectly invoking resolve_config), that is an error. I prefer looking for that error in one place rather than depending on all maintainers of readers to do the right thing. Of course if someone tries to actually open a file using a bad path they'll get some sort of error message, but it will not pinpoint the source of the problem if that problem really had to do with _ROOT_DIR.

I agree that using a string that looks like a shell environment variable is not a good idea because it's subject to misinterpretation. I need to come up with a better alternative. But I do think the concept is valuable so that a) the paths in config files mean what they appear to mean (relative path is always relative to current directory) and b) users can refer to catalogs which are stable and in production as well as those under development in the same process without having to resort to absolute paths for the development ones. You suggested a way to handle this with config_overwrite in your last bullet. That would work, but as a developer I wouldn't want to have to change my code to accomplish this (and have to remember to change it back when the catalog becomes production). It's much easier to keep track of it in the config file, especially since any relative path in the config would have to change before it could become part of a tag.

@yymao
Copy link
Member

yymao commented Feb 22, 2020

Again, my fundamental request is that register.py should not raise exception regarding the path being invalid. Unless we are changing the fundamental philosophy here, I really want to maintain the idea that we allow the readers to be in charge of doing whatever they want. The job of register.py is to pass the config files to the reader, and it's not it's job to say if the path or part of the path is not valid.

As for supporting relative paths, I feel like we are creating imaginary use cases here. I am not sure we've ever encountered such a need. And if it we do, I think it is rare enough that it is not unreasonable to ask them to specify absolute path during development stage.

If you think it is important that we must support such use case, that's fine by me, and my request would be not to use a syntax that is suggestive that users can do more than that can (which I feel it's really difficult to avoid).

@JoanneBogart
Copy link
Contributor Author

JoanneBogart commented Feb 25, 2020

I think it will be less confusing if something that looks like a relative path has its usual meaning (as it has up till now): relative to current directory.
How about using ^ to represent root dir? This character has no special meaning for YAML but is very unlikely to appear in a filepath. The three cases would then be

  • /an/absolute/path
  • relative/to/current/directory
  • ^/relative/to/rootdir

@yymao
Copy link
Member

yymao commented Feb 25, 2020

That's fine. I feel "relative to current directory" use case would be much more rare than the "relative to rootdir" use case? How about using "./" for the former? i.e.,

  • /an/absolute/path
  • relative/to/rootdir
  • ./relative/to/current/directory

@JoanneBogart
Copy link
Contributor Author

I agree that relative-to-rootdir is going to be much more common than relative-to-current-dir, but I think clarity is more important than saving people typing an extra character. At the command line relative and ./relative give you the same thing so people's expectation would be that they translate to the same thing in the config file as well. The one thing that can't be represented in any standard fashion at the command line is the relative-to-rootdir case, so I think that's the one that needs something special.
I'd like to stick with ^ for relative-to-rootdir if that's ok.

@yymao
Copy link
Member

yymao commented Feb 25, 2020

It's ok. But I think it'll hard to guess what ^/ means by just looking at the config files.

I anticipate this is going to cause some confusion (in fact, any of these proposals will) but I don't have better idea so go ahead.

@JoanneBogart
Copy link
Contributor Author

Yes, it is by no means obvious! The best we can hope for is to not be misleading. We need to document it clearly and extensively. When I change all the existing config files I can add a comment at the top of each explaining. Perhaps we can also add one or more example files either in the catalog_configs directory or somewhere else in the repo.

@JoanneBogart
Copy link
Contributor Author

@heather999 The code in branch u/jrbogart/root_dir is getting close to final. Before it can be used as intended we need a directory to act as the root dir for catalogs which will probably mostly contain sym links to wherever each catalog actually is. Where should this directory be? Once this is determined I can make the sym links and update config files (still on the branch until we're sure it's working).

@yymao yymao added the configs Issues related to catalog config yaml files label Mar 16, 2020
@yymao yymao added the register Issues related to the register module label Mar 16, 2020
@JoanneBogart JoanneBogart marked this pull request as ready for review March 30, 2020 17:27
Copy link
Member

@yymao yymao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @JoanneBogart -- I have made some minor comments/suggestions.

GCRCatalogs/register.py Outdated Show resolved Hide resolved
GCRCatalogs/register.py Show resolved Hide resolved
GCRCatalogs/register.py Outdated Show resolved Hide resolved
GCRCatalogs/register.py Outdated Show resolved Hide resolved
GCRCatalogs/site_config/site_rootdir.yaml Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
tests/test_reader_modules.py Outdated Show resolved Hide resolved
JoanneBogart and others added 2 commits April 2, 2020 13:26
This work-around is no longer necessary

Co-Authored-By: Yao-Yuan Mao <yymao.astro@gmail.com>
yymao
yymao previously approved these changes Apr 2, 2020
Copy link
Member

@yymao yymao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the comments @JoanneBogart. I'm approving now.

Have you tried running pytest on nersc? This will include a full test suite that tries to load each of the catalogs. It takes a few hours to run though.

@JoanneBogart
Copy link
Contributor Author

Thanks for the tip - I've started up pytest. If the run looks ok I will go ahead and merge.

@JoanneBogart JoanneBogart merged commit 4d6f91a into master Apr 2, 2020
@JoanneBogart JoanneBogart deleted the u/jrb/root_dir branch April 2, 2020 23:38
@yymao yymao mentioned this pull request Apr 2, 2020
@yymao yymao changed the title Implement ROOT_DIR for GCR. Needs more testing Implement ROOT_DIR for GCR Apr 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
configs Issues related to catalog config yaml files register Issues related to the register module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make root directory (common prefix of all base_dirs) customizable during setup
4 participants