Addressing a Problem?
CWL (Common Workflow Language) is a language for standardizing function inputs and outputs, and is used to create data workflows, particularly in geospatial applications. It is a planned addition to pygeoapi and, more generally, OGC-Processes. Adding support for using xclim through this language would be very helpful for people who don't want to dig through Python code and just want a plug-and-play solution to compute indices, bias-correct, etc.
Potential Solution
I have a working prototype for individual indicators in CWL at the moment, see the additional context below for the code snippet. It creates a docker container for the command line tool, which means there is a lag in running any command, but this may be acceptable for some users?
I have the beginnings of a prototype for CWL for all commands together, but it is still non-functional. (I don't fully understand the language yet!)
In order to avoid the start-up latency, I see a few options:
We could propose that CWL add support for attaching to a running container. However, this runs against its philosophy of reproducibility (a running container could, in general, have a non-constant state).
Perhaps a two-step approach: create a long-running container in the first step of the workflow, then create short-lived containers for the individual commands that pipe those commands to the first container, and finally destroy the initial container in the last step of the workflow?
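The container lifecycle behind the second option can be sketched as follows. This is only an illustration under assumptions, not CWL and not a tested design: it simplifies the idea by using `podman exec` directly instead of short-lived relay containers, the container name `xclim-worker` and the `/data` mount point are hypothetical, the image is assumed to provide `sleep infinity`, and the xclim arguments (including the `tx_max` command name) are placeholders. The script only prints the commands; it does not run them.

```python
import os
import shlex

IMAGE = "localhost/xclim:latest"  # image tag from the build command in this issue
WORKER = "xclim-worker"           # hypothetical container name


def start_worker(data_dir: str) -> list[str]:
    """First workflow step: start a detached container that stays alive."""
    return ["podman", "run", "-d", "--name", WORKER,
            "-v", f"{data_dir}:/data", IMAGE, "sleep", "infinity"]


def run_in_worker(xclim_args: list[str]) -> list[str]:
    """Middle steps: run one xclim command inside the running container."""
    return ["podman", "exec", WORKER, "xclim", *xclim_args]


def stop_worker() -> list[str]:
    """Final step: force-remove the long-running container."""
    return ["podman", "rm", "-f", WORKER]


if __name__ == "__main__":
    steps = [
        start_worker(os.path.abspath("data")),
        # Placeholder xclim arguments; adjust to the real CLI syntax.
        run_in_worker(["--input", "/data/daily_surface_cancities_1990-1993.nc",
                       "--output", "/data/out.nc", "tx_max", "--freq", "MS"]),
        stop_worker(),
    ]
    for step in steps:
        print(shlex.join(step))
```

The reproducibility concern still applies here: state left in the worker by one command is visible to the next one.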
Add support for other sections of xclim than just indicator calculation: bias correction, spatial analogues, unit standardization, etc... This could be done by augmenting the CLI for xclim.
Defining the inputs this way makes them look more natural and closer to what xclim expects (i.e., using cwltool ... --freq MS rather than cwltool ... --TX_MAX.freq MS).
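A minimal sketch of what such a definition could look like, written as a small Python generator in the spirit of the cwl.py script mentioned in this issue. It is not the original snippet: the function name `make_indicator_tool`, the `tx_max` command name, the default output name, and the assumption that xclim accepts `--input`, `--output` and `--freq` in this order are all illustrative. The description is emitted as JSON, which is also valid YAML.

```python
import json

IMAGE = "localhost/xclim:latest"  # image tag from the build command in this issue


def make_indicator_tool(indicator: str, options: dict[str, str]) -> dict:
    """Build a CWL CommandLineTool whose inputs use plain names such as `freq`."""
    inputs = {
        "input": {"type": "File",
                  "inputBinding": {"prefix": "--input", "position": 1}},
        "output": {"type": "string", "default": "out.nc",
                   "inputBinding": {"prefix": "--output", "position": 2}},
    }
    for name, cwl_type in options.items():
        # No indicator prefix in the input id, so the option is `--freq`,
        # not `--TX_MAX.freq`. The trailing "?" marks the input as optional.
        inputs[name] = {"type": f"{cwl_type}?",
                        "inputBinding": {"prefix": f"--{name}", "position": 4}}
    return {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        "baseCommand": "xclim",
        "hints": {"DockerRequirement": {"dockerPull": IMAGE}},
        # The indicator name goes between the global and indicator options.
        "arguments": [{"valueFrom": indicator, "position": 3}],
        "inputs": inputs,
        "outputs": {
            "result": {"type": "File",
                       "outputBinding": {"glob": "$(inputs.output)"}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(make_indicator_tool("tx_max", {"freq": "string"}), indent=2))
```

Because the input ids carry no indicator prefix, each generated tool exposes the same option names as the underlying xclim command.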
Similarly, using a job file would use names and values that are easier to define:
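As an illustration only (the original snippet is not reproduced here), a job file for such a tool could be generated like this. The helper name `make_job` and the file name `job.json` are hypothetical; the input keys assume the tool declares inputs called `input`, `output` and `freq`.

```python
import json


def make_job(input_path: str, output_name: str, **options: str) -> dict:
    """Build a CWL job object keyed by the tool's plain input names."""
    job = {
        # CWL represents file inputs as objects with class "File".
        "input": {"class": "File", "path": input_path},
        "output": output_name,
    }
    job.update(options)
    return job


if __name__ == "__main__":
    job = make_job("data/daily_surface_cancities_1990-1993.nc", "out.nc", freq="MS")
    with open("job.json", "w", encoding="utf-8") as handle:
        json.dump(job, handle, indent=2)
    print(json.dumps(job, indent=2))
```

The resulting file would then be passed as the last argument, for example `cwltool --podman --outdir runs cwl/TX_MAX.yml job.json`.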
It seems odd to me that the container would add any start-up latency if it is prebuilt and nothing triggers a rebuild on each run (a modified file in the build context, for example).
I have noticed that calling xclim by itself has a noticeable start-up latency, so I am not sure the container is the cause at all.
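One way to check where the latency comes from is to time the bare CLI against the containerized one. The sketch below is an assumption-laden helper, not a benchmark result: `time_command` is a hypothetical name, `xclim --help` is used as a cheap command that still pays the import cost, and the image tag is the one built in this issue. Commands whose executable is missing or that fail are skipped.

```python
import shutil
import subprocess
import time


def time_command(cmd: list[str], repeats: int = 3) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    candidates = {
        "xclim alone": ["xclim", "--help"],
        "xclim in container": ["podman", "run", "--rm",
                               "localhost/xclim:latest", "xclim", "--help"],
    }
    for label, cmd in candidates.items():
        if shutil.which(cmd[0]) is None:
            print(f"{label}: skipped ({cmd[0]} not found)")
            continue
        try:
            print(f"{label}: {time_command(cmd):.2f} s")
        except subprocess.CalledProcessError as err:
            print(f"{label}: command failed with exit code {err.returncode}")
```

If both timings are close, the start-up cost is mostly xclim's own import time rather than the container.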
Another thing to consider when defining the CWL: xclim takes an --output file path as input.
However, from the point of view of CWL and the container, the actual path will be inside mounted volumes backed by temporary directories used for the processing.
Therefore, the path doesn't really matter. Only the file name does. The CWL output should use a glob that takes this into account. Something along the lines of:
Should create the file /tmp/result.nc.
But what CWL will have done is actually mount the created temp dirs, retrieve the output from the runtime dir, and stage out the output to the requested --outdir.
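The file-name-only idea described above could be expressed roughly as follows. This is a sketch under assumptions rather than the original snippet: it assumes a string input named `output`, uses an inline JavaScript expression to strip any directory part, and the helper names are hypothetical. `staged_name` mirrors in Python what the expression computes.

```python
import json
from pathlib import PurePosixPath

# JavaScript expression keeping only the file name of the requested output.
BASENAME_EXPR = "$(inputs.output.split('/').pop())"


def output_sections() -> dict:
    """CWL fragments that pass only the file name to xclim and glob for it."""
    return {
        # Needed because the expression calls JavaScript methods.
        "requirements": {"InlineJavascriptRequirement": {}},
        "inputs": {
            "output": {"type": "string",
                       "inputBinding": {"prefix": "--output",
                                        "valueFrom": BASENAME_EXPR}},
        },
        "outputs": {
            "result": {"type": "File",
                       "outputBinding": {"glob": BASENAME_EXPR}},
        },
    }


def staged_name(requested: str) -> str:
    """Python equivalent of BASENAME_EXPR, for checking expectations."""
    return PurePosixPath(requested).name


if __name__ == "__main__":
    print(json.dumps(output_sections(), indent=2))
    print(staged_name("/tmp/result.nc"))
```

With this arrangement the tool writes into its temporary working directory, and the runner is what moves the matched file to the directory given by --outdir.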
Additional context
Example code for the CWL indicator calculations
Code for generating indicators CWL, and beginnings of a master CWL
Creating a docker image for xclim:
Templates for the CWL generator:
cwl_template.yml:
cwl_step.yml:
cwl_master.yml:
Commands for docker/podman, running the CWL:
Build the image:
podman build -t localhost/xclim:latest .
Create the CWL files:
podman run -v $(pwd)/cwl/:/app/cwl -v $(pwd)/cwl.py:/app/cwl.py localhost/xclim:latest python /app/cwl.py
Run indicator calculations:
cwltool --podman --outdir runs cwl/TX_MAX.yml --input data/daily_surface_cancities_1990-1993.nc --output out.nc --TX_MAX.freq ME
(Not working) Run indicator calculations through the master CWL:
cwltool --podman --outdir runs cwl/master.yml --input data/daily_surface_cancities_1990-1993.nc --output out.nc --indicator TX_MAX --TX_MAX.freq MS
Related issues: #1949
This idea came up during the CLINT/OGC code sprint in Bonn, this October.