THREDDS: add more options to configure catalog.xml #472
LGTM, just ensure the example works out of the box.
Oh wait, can you make the same changes for optional-components/testthredds as well?
Isn't this set up so that we can run tests against a different THREDDS server? If the tests don't require a different configuration, why do we need to change this as well?
To test a different version, yes.
I meant to allow the same customizations for the test. Currently we are testing THREDDS v5 using this testthredds on our production host. By the same token, we could test additional configs at the same time, so the same customizations would be useful. So I meant to add
The THREDDS definitions need to align with Magpie definitions to protect the contents accordingly.
See
birdhouse-deploy/birdhouse/components/thredds/config/magpie/providers.cfg.template
Lines 11 to 35 in 3d7c8d6
configuration:
  skip_prefix: "thredds"  # prefix to ignore, below prefixes will be matched against whatever comes after in path
  file_patterns:
    # note: make sure to employ quotes and double escapes to avoid parsing YAML error
    - ".+\\.ncml"  # match longest extension first to avoid tuncating it by match of sorter '.nc'
    - ".+\\.nc"
  metadata_type:
    prefixes:
      - null  # note: special YAML value evaluated as `no-prefix`, use quotes if literal value is needed
      - "\\w+\\.gif"  # threddsIcon, folder icon, etc.
      - "\\w+\\.ico"  # favicon
      - "\\w+\\.txt"  # licence
      - "\\w+\\.css"  # tds.css
      - "catalog\\.\\w+"  # note: special case for `THREDDS` top-level directory (root) accessed for `BROWSE`
      - catalog
      - ncml
      - uddc
      - iso
  data_type:
    prefixes:
      - fileServer
      - dodsC
      - wcs
      - wms
      - ncss
and
https://pavics-magpie.readthedocs.io/en/latest/services.html#servicethredds
for details.
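To make the alignment requirement concrete, here is a rough sketch of how the prefixes in the configuration quoted above partition THREDDS request paths. It is an illustration of the idea only, not Magpie's implementation: the function name `classify_thredds_path` and the example paths are hypothetical, shell glob patterns stand in for the regular expressions, and the special `null` (no-prefix) entry is left out.

```shell
# Illustrative approximation only: NOT how Magpie is implemented.
# Classify a THREDDS request path as "metadata" or "data" based on the
# first path segment that follows the skipped "thredds" prefix.
classify_thredds_path() {
    path="${1#/}"            # drop the leading slash
    path="${path#thredds/}"  # skip_prefix: "thredds"
    first="${path%%/*}"      # first remaining path segment
    case "$first" in
        # metadata_type prefixes (globs approximate the regexes)
        catalog|ncml|uddc|iso|catalog.*|*.gif|*.ico|*.txt|*.css) echo metadata ;;
        # data_type prefixes
        fileServer|dodsC|wcs|wms|ncss) echo data ;;
        # anything else is not covered by the configuration
        *) echo unmatched ;;
    esac
}

classify_thredds_path /thredds/catalog/birdhouse/catalog.html
classify_thredds_path /thredds/fileServer/birdhouse/file.nc
classify_thredds_path /thredds/customService/birdhouse/file.nc
```

The third call shows the concern in miniature: a service prefix that exists in THREDDS but is not listed on the Magpie side falls outside both groups.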
variables:
- `THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS`: this allows users to specify additional [filter
  elements](https://docs.unidata.ucar.edu/tds/current/userguide/tds_dataset_scan_ref.html#including-only-desired-files) to the Service Data dataset. This is especially useful if a WPS
  outputs files with an extension other than the default (eg: .h5) to the `wps_outputs/` directory.
I'm curious about the use case - not against it, just making sure the intended use is appropriate.
Is there any advantage to exposing those HDF5 files via THREDDS rather than accessing them directly through the WPS-outputs dir? If anything, I would expect Nginx to provide much better/faster responses, as well as potential additional support for Content-Range requests if enabled.
Good question. I have no intuition about what is better/faster in this case
What was the intention for how the Magpie permissions were supposed to interact with additional catalogs introduced by setting the Is there a solution in place for that already or do I have to come up with a solution that accounts for arbitrary catalog definitions as well? |
For Magpie permissions, it does not really care about how the catalogs (the The |
Thanks for the detailed explanation. I understand the Magpie configuration and how its "prefix" definitions relate to the URLs in THREDDS. My concern is more about whether we need to be able to customize the "file_patterns" definitions in the Magpie configuration files to handle duplicate file extensions other than .nc and .ncml. The other concern is that users can define custom service definitions, if they'd like, other than the ones listed here:
Or they could potentially modify the I propose we either do:
I'm working on a solution but if you have any insight into this issue let me know |
If custom service types are added, they must be provided in the browse/read section accordingly for Magpie to grant/deny access to them as expected. Similarly, additional file patterns (or extensions) must also be provided. If another location than Rather than having |
@@ -74,5 +74,4 @@ export DELAYED_EVAL="
THREDDS_SERVICE_DATA_LOCATION_ON_HOST
THREDDS_IMAGE
THREDDS_IMAGE_URI
THREDDS_DATASET_DATASETSCAN_BODY
Note to reviewers: This cannot be a delayed eval variable because of the way that the delayed evaluation process removes quotes. If we change the delayed eval process to allow this, it will break other functionality (especially in jupyterhub).
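The quote-stripping problem can be reproduced in isolation. The snippet below is a simplified stand-in for a delayed-evaluation step, not the project's actual implementation: the `delayed_eval` function and the `HOST` variable are hypothetical. It re-parses a value with `eval` so that variable references resolve late, and shows that double quotes inside the value do not survive that step.

```shell
# Simplified stand-in for a delayed-eval step: NOT the actual birdhouse code.
delayed_eval() {
    # Re-parse the value as shell words so that ${...} references expand now.
    eval "echo $1"
}

HOST='example.org'  # hypothetical variable that is referenced lazily

# Intended use: a single-quoted value whose ${HOST} is resolved later.
url='https://${HOST}/thredds'
delayed_eval "$url"

# XML attributes need their double quotes, but the re-parse consumes them.
attr='wildcard="*.h5"'
delayed_eval "$attr"
```

The first call yields the expanded URL, while the second loses the quotes around `*.h5`, which would leave an XML attribute malformed.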
After several iterations, I don't think there is an easy way to get the flexibility we want by defining these variables while also enforcing the Magpie settings. So the compromise I went with is to add some defaults and make the Magpie settings configurable as well, so that they can be updated as needed. I also added some instructions/warnings about how to configure Magpie to match changes to THREDDS.
E2E Test Results
DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2845/
Result : ✅ SUCCESS
BIRDHOUSE_DEPLOY_BRANCH : thredds-more-configuration
DACCS_IAC_BRANCH : master
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-216.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1716/
NOTEBOOK TEST RESULTS
I think the configuration is overcomplicated, and that is reflected in the example comments trying to explain what everything does in this process that is already complicated in itself.
For example, if I wanted to configure THREDDS with only fileServer for .nc and .txt, and no other extension, there are way too many variables to override.
export THREDDS_MAGPIE_EXTRA_METADATA_PREFIXES='".+\\.txt"'
export THREDDS_MAGPIE_EXTRA_DATA_PREFIXES='".+\\.nc"'
export THREDDS_DEFAULT_FILE_FILTERS=''
export THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS='
<include wildcard="*.txt" />
<include wildcard="*.nc" />
'
And I would still "somehow" have to override the Magpie providers file, since all the "default" metadata_type and data_type entries I do not want would otherwise remain enabled.
Is there really any advantage to having duplicate sets of THREDDS_DEFAULT_... and THREDDS_..._EXTRA_... variables?
Can't everything be simplified by having a single one, which uses the THREDDS_DEFAULT_... values by default?
# If you need this dataset to serve other files you should update the THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS to add
# additional file filters.
Mention the corresponding THREDDS_MAGPIE_... variables as well?
I think I agree that this has gotten out of hand. The main issue is that I don't want to make it possible to break the service catalog which would break other things internally for the rest of the components in the stack. But I still don't fully understand how that is used...
I think that your set up here is overly complicated actually (which highlights your point). I don't think you'd need to set
I think I agree with this. But the defaults vs. the extras were requested in the discussion here #472 (comment) Do you no longer think that's a concern? |
I think it might be worthwhile to remove the defaults vs extras duplication to make things easier for users in general. If one wants to preserve the defaults, it is easy to copy-paste its value and add the "extra" that is desired within a single variable. |
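As a sketch of what the single-variable approach could look like: the name `THREDDS_FILE_FILTERS` below is hypothetical and not a variable from this PR, and `THREDDS_DEFAULT_FILE_FILTERS` is assumed to hold `<include>` elements for the default extensions listed in the PR overview. A user who wants only some extensions restates them in one place. A user who wants the defaults plus extras copies the default value and appends to it.

```shell
# Sketch only: THREDDS_FILE_FILTERS is a hypothetical name, not part of this PR.

# Component defaults (assumed content, based on the default extensions
# listed in the PR overview).
THREDDS_DEFAULT_FILE_FILTERS='<include wildcard="*.nc" />
<include wildcard="*.ncml" />
<include wildcard="*.txt" />
<include wildcard="*.md" />
<include wildcard="*.rst" />
<include wildcard="*.csv" />'

# User override: restate only what should be served (here .nc and .txt).
# Leaving this assignment out keeps the defaults.
THREDDS_FILE_FILTERS='<include wildcard="*.nc" />
<include wildcard="*.txt" />'

# Single effective value: the override if set, the defaults otherwise.
export THREDDS_FILE_FILTERS="${THREDDS_FILE_FILTERS:-$THREDDS_DEFAULT_FILE_FILTERS}"
```

The same single-value idea would have to apply to the Magpie side as well, otherwise the two configurations could still drift apart.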
E2E Test Results
DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2847/
Result : ❌ FAILURE
BIRDHOUSE_DEPLOY_BRANCH : thredds-more-configuration
DACCS_IAC_BRANCH : master
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-216.rdext.crim.ca

E2E Test Results
DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2848/
Result : ✅ SUCCESS
BIRDHOUSE_DEPLOY_BRANCH : thredds-more-configuration
DACCS_IAC_BRANCH : master
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-216.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1719/
NOTEBOOK TEST RESULTS
Overview
Currently the default THREDDS configuration creates two default datasets, the Service Data dataset and the
Main dataset. The Service Data dataset is used internally and hosts WPS outputs. The Main dataset is the
place where users can access data served by THREDDS. Both of these are configured to serve files with the following
extensions: .nc .ncml .txt .md .rst .csv
In order to allow the THREDDS server to serve files with additional extensions, this introduces two new
variables:
- `THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS`: this allows users to specify additional filter elements to the Service Data dataset. This is especially useful if a WPS outputs files with an extension other than the default (eg: .h5) to the `wps_outputs/` directory.
- `THREDDS_DATASET_DATASETSCAN_BODY`: this allows users to specify the whole body of the main dataset's `<datasetScan>` element. This allows users to fully customize how this dataset serves files.
We limit the configuration options for the Service Data dataset more than the main dataset because the Service
Data dataset requires a basic configuration in order to properly serve WPS outputs. Making significant changes
to this configuration could have unexpected negative impacts on WPS usage.
The defaults for these new variables are fully backwards compatible. Without changing these variables, the THREDDS
server should behave exactly the same as before.
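As an illustration of the two variables described above, a deployment's local environment file might contain something like the following. The values are examples only and are not the project's defaults. The `<datasetScan>` body is assumed to accept a standard TDS `<filter>` element with `<include wildcard="..." />` children.

```shell
# Hypothetical env fragment: values are illustrative, not the project's defaults.

# Also serve HDF5 files that a WPS writes to wps_outputs/.
export THREDDS_SERVICE_DATA_EXTRA_FILE_FILTERS='
    <include wildcard="*.h5" />
'

# Replace the whole body of the Main dataset <datasetScan> element.
# Assumption: the body is standard TDS datasetScan content such as <filter>.
export THREDDS_DATASET_DATASETSCAN_BODY='
    <filter>
        <include wildcard="*.nc" />
        <include wildcard="*.h5" />
    </filter>
'
```

Single quotes keep the inner double quotes of the XML attributes intact. Any extension exposed this way also needs a matching entry on the Magpie side for permissions to apply as expected.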
Changes
Non-breaking changes
Breaking changes
Related Issue / Discussion
Additional Information
CI Operations
birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false