Add CWL support for xclim #1955

Open
2 tasks done
SarahG-579462 opened this issue Oct 15, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@SarahG-579462
Contributor

SarahG-579462 commented Oct 15, 2024

Addressing a Problem?

CWL (Common Workflow Language) is a language for describing the inputs and outputs of command-line tools and workflows in a standard way. It is used to build data workflows, including in other geospatial applications, and it is a planned addition to pygeoapi and, more generally, OGC-Processes. Supporting xclim through CWL would be very helpful for people who don't want to dig through Python code and just want a plug-and-play solution to compute indices, apply bias correction, etc.

Potential Solution

  • I have a working prototype for individual indicators in CWL at the moment; see the additional context below for the code snippet. It creates a Docker container for the command-line tool, which means there is a start-up lag on every command, but this may be acceptable for some users?

  • I have the beginnings of a prototype for CWL for all commands together, but it is still non-functional. (I don't fully understand the language yet!)

  • In order to avoid the start-up latency, I see a few options:

    • We could propose that CWL add support for attaching to a running container; however, this runs against their philosophy of reproducibility (a running container can, in general, have a non-constant state).
    • Perhaps a two-step approach: create a constantly running container in the first step of the workflow, then create fast-starting containers for the individual commands, which pipe their commands to the first container, and finally destroy the initial container in the last step of the workflow?
  • Add support for parts of xclim other than indicator calculation: bias correction, spatial analogues, unit standardization, etc. This could be done by extending the xclim CLI.

Additional context

Example code for the CWL indicator calculations

cwlVersion: v1.2
class: CommandLineTool
id: xclim_tx_max
label: Maximum temperature
doc: |
  Maximum of daily maximum temperature.
requirements:
  EnvVarRequirement:
    envDef:
      PYTHONPATH: /app
  ResourceRequirement:
    coresMax: 1
    ramMax: 512
hints:
  DockerRequirement:
    dockerPull: localhost/xclim:latest

baseCommand: ["xclim"]
arguments: []
inputs:
  input:
    type: File
    inputBinding:
      position: 0
      prefix: --input
  output:
    type: string
    inputBinding:
      position: 1
      prefix: --output

  TX_MAX:
    type: 
      type: record
      fields:
        
        - name: tasmax
          doc: |
            Maximum daily temperature.
            Default : tasmax.
          type: string?
          inputBinding:
            prefix: --tasmax 
        

        - name: freq
          doc: |
            Resampling frequency.
            Default : YS.
          type: string?
          inputBinding:
            prefix: --freq 
        
    name: tx_max
    inputBinding:
      position: 2
      prefix: tx_max



outputs:
  outdir:
    outputBinding:
      glob: "*.nc"
    type: File[]
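To make the binding rules in the tool description above concrete, here is a minimal Python sketch of how the `position` and `prefix` entries are expected to turn into an xclim command line. It is a simplified model written for illustration, not cwltool's actual binding algorithm; `build_argv` is a hypothetical helper, and the file path is shown as given rather than as the staged path a runner would use inside the container.

```python
def build_argv(job):
    """Roughly assemble the argument list for the TX_MAX tool description."""
    argv = ["xclim"]  # baseCommand
    # Positions 0 and 1: top-level inputs, each emitted with its prefix.
    argv += ["--input", job["input"]]
    argv += ["--output", job["output"]]
    # Position 2: the record input. Its prefix is the sub-command name,
    # followed by one "--name value" pair per field that was supplied.
    record = job.get("TX_MAX")
    if record is not None:
        argv.append("tx_max")
        for field in ("tasmax", "freq"):
            value = record.get(field)
            if value is not None:  # optional fields (string?) may be absent
                argv += [f"--{field}", value]
    return argv


example = build_argv(
    {
        "input": "data/daily_surface_cancities_1990-1993.nc",
        "output": "out.nc",
        "TX_MAX": {"freq": "ME"},
    }
)
print(" ".join(example))
```

Under these assumptions the optional `tasmax` field is simply omitted when not provided, so xclim falls back to its own default variable name.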

Code for generating the indicator CWL files, and the beginnings of a master CWL

# Generate CWL files from xclim Indicators
import yaml
from pathlib import Path
from xclim.core.utils import InputKind
from loguru import logger
template = Path("cwl_template.yaml")
template_str = template.read_text()

master_template = Path("cwl_master.yaml")
master_str = master_template.read_text()

step_template = Path("cwl_step.yaml")
step_str = step_template.read_text()

fields_template_str = """
- name: {param}
  doc: |
    {doc}
  type: string{optional_flag}
  inputBinding:
    prefix: --{param} 
"""
fields_template_enum = """
- name: {param}
  doc: |
    {doc}
  type:
    {optional_flag}
    - type: enum
      symbols:
        {symbols}
  inputBinding:
    prefix: "--{param}"
"""
input_template = """
  {indicator_id}:
    type: 
      type: record
      fields:
        {fields}
    name: {indicator}
    inputBinding:
      position: 2
      prefix: {indicator}
"""
docker_path = "/app"
docker_image = "localhost/xclim:latest"

import xclim as xc
param_str = "{indicator_id}.{param}: {indicator_id}.{param}"
# indicators = xc.core.indicator.registry
indicators = {'TX_MAX':xc.core.indicator.registry['TX_MAX']}

steps = []
param_fields = []
for name, ind in indicators.items():
    ind_instance = ind.get_instance()
    logger.info("Processing Indicator: " + ind_instance.identifier)
    field_arr = []
    param_list = []
    for param_name, param in ind_instance.parameters.items():
        if param_name in ["ds"] or param.kind == InputKind.KWARGS:
            continue
        param_list.append(param_str.format(param=param_name, indicator_id=name))

        optional_flag = ""
        doc = [param.description.replace("\n", "\n    ")]
        if param.default:
            doc.append(f"Default : {param.default}.")
        

        if "choices" in param:
            choices = f"\n    Choices: {param.choices}"
            doc.append(choices)

            doc = "\n    ".join(doc)
            if param.default:
                # CWL expresses an optional type as a union that includes the string "null"
                optional_flag = '- "null"'
            field = fields_template_enum.format(
                param=param_name,
                symbols="\n        ".join([f'- "{c}"' for c in param.choices]),
                optional_flag = optional_flag,
                doc = doc,
            )
        else:
            if param.default:
                optional_flag = '?' 
            
            doc = "\n    ".join(doc)
            field = fields_template_str.format(
                param=param_name,
                optional_flag=optional_flag,
                doc=doc,
            )
        field_arr.append(field)
    fields = "\n".join([field.replace("\n", "\n        ") for field in field_arr])
    #param_fields.append("\n".join([field.replace("\n", "\n        ") for field in field_arr]))
    inputs = input_template.format(
        indicator_id=name,
        indicator=ind_instance.identifier,
        fields=fields
    )
    param_fields.append(inputs)
    cwl = template_str.format(
        indicator_id=name,
        indicator=ind_instance.identifier,
        indicator_label=ind_instance.title,
        indicator_doc=ind_instance.abstract.replace("\n", "\n  "),
        docker_path=docker_path,
        docker_image=docker_image,
        indicator_inputs=inputs,
    )
    filename = Path(f"cwl/{name}.yml")
    with open(filename, "w") as f:
        f.write(cwl)

    # For each indicator, also generate a step and add it to the master CWL.
    param_list = '\n    '.join(param_list)
    
    step = step_str.format(
        indicator_id=name,
        indicator=name,
        file = filename.name,
        params=param_list
    )

    steps.append(step.replace("\n", "\n    "))
    break
master_cwl = master_str.format(
    steps="\n    ".join(steps),
    params="\n    ".join([p.replace("\n", "\n  ") for p in param_fields]),
)
logger.info("Writing master CWL")

with open("cwl/master.yml", "w") as f:
    f.write(master_cwl)
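The field-rendering logic in the script can be tried without xclim installed. The sketch below isolates it in a hypothetical `render_field` helper that takes plain values instead of an xclim `Parameter`; the template constants and the helper name are assumptions made for illustration. Two deliberate differences from the script: it treats a parameter as optional when `default is not None` (so falsy defaults such as `0` still count), and it writes an optional enum as a union containing `"null"`.

```python
STRING_FIELD = """\
- name: {param}
  doc: |
    {doc}
  type: string{optional_flag}
  inputBinding:
    prefix: --{param}
"""

ENUM_FIELD = """\
- name: {param}
  doc: |
    {doc}
  type:
{null_entry}    - type: enum
      symbols:
{symbols}
  inputBinding:
    prefix: "--{param}"
"""


def render_field(param, description, default=None, choices=None):
    """Render one CWL record field as YAML text (illustrative only)."""
    doc_lines = [description]
    if default is not None:
        doc_lines.append(f"Default : {default}.")
    if choices:
        doc_lines.append(f"Choices: {sorted(choices)}")
    doc = "\n    ".join(doc_lines)  # keep continuation lines inside the doc block

    if choices:
        # Optional enum: a union of "null" and the enum schema.
        null_entry = '    - "null"\n' if default is not None else ""
        symbols = "\n".join(f'        - "{c}"' for c in sorted(choices))
        return ENUM_FIELD.format(
            param=param, doc=doc, null_entry=null_entry, symbols=symbols
        )

    optional_flag = "?" if default is not None else ""
    return STRING_FIELD.format(param=param, doc=doc, optional_flag=optional_flag)


freq_field = render_field("freq", "Resampling frequency.", default="YS")
op_field = render_field("op", "Comparison operator.", default="gt", choices={"gt", "lt"})
print(freq_field)
print(op_field)
```

The `op` parameter here is invented to exercise the enum branch; it is not taken from the `TX_MAX` indicator.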

Creating a docker image for xclim:


FROM python:3.10-slim

WORKDIR /app

RUN pip install xclim loguru h5netcdf --no-cache-dir

USER root

COPY cwl.py .
COPY *.yaml .


RUN mkdir /app/cwl

#RUN python -m compileall `python -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())"`

USER $USER

Templates for the CWL generator:

cwl_template.yaml:

cwlVersion: v1.2
class: CommandLineTool
id: xclim_{indicator}
label: {indicator_label}
doc: |
  {indicator_doc}
requirements:
  EnvVarRequirement:
    envDef:
      PYTHONPATH: {docker_path}
  ResourceRequirement:
    coresMax: 1
    ramMax: 512
hints:
  DockerRequirement:
    dockerPull: {docker_image}

baseCommand: ["xclim"]
arguments: []
inputs:
  input:
    type: File
    inputBinding:
      position: 0
      prefix: --input
  output:
    type: string
    inputBinding:
      position: 1
      prefix: --output
{indicator_inputs}


outputs:
  outdir:
    outputBinding:
      glob: "*.nc"
    type: File[]

cwl_step.yaml:

{indicator}:
  run: {file}
  when: $(inputs.indicator == "{indicator}")
  in:
    indicator: indicator
    input: input
    output: output
    {params}
  out:
    outdir: outdir

cwl_master.yaml:

cwlVersion: v1.2
$graph:

- class: Workflow
  requirements:
    - MultipleInputFeatureRequirement
    - SubworkflowFeatureRequirement
    - InlineJavascriptRequirement
    - DockerRequirement
  inputs:
    input: 
      type: string
    output: 
      type: string
    indicator: 
      type: string
    {params}
  steps:
    {steps}

  outputs:
    outdir:
      type: File
      outputSource: 
        valueFrom: ${{ inputs.indicator + '/outdir' }}
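A note on the doubled braces in this template: because the generator fills the templates with Python's `str.format`, any literal `{` or `}` that should survive into the CWL file (for example around a JavaScript expression body) has to be written as `{{` or `}}`, while `$( ... )` parameter references need no escaping. The following small sketch shows the effect; the template line and the `field` placeholder are hypothetical and only illustrate the escaping.

```python
# Hypothetical template line: a CWL JavaScript expression body plus one
# str.format placeholder ({field}).
template_line = "valueFrom: ${{ return inputs.{field}; }}"

rendered = template_line.format(field="indicator")
print(rendered)  # valueFrom: ${ return inputs.indicator; }

# Without doubling, str.format tries to interpret the braces itself and fails.
try:
    "valueFrom: ${ return inputs.indicator; }".format(field="indicator")
    unescaped_ok = True
except (KeyError, ValueError, IndexError):
    unescaped_ok = False
print(unescaped_ok)
```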

Commands for docker/podman, running the CWL:

Build the image:
podman build -t localhost/xclim:latest .

Create the CWL files:
podman run -v $(pwd)/cwl/:/app/cwl -v $(pwd)/cwl.py:/app/cwl.py localhost/xclim:latest python /app/cwl.py

Run Indicator calculations:
cwltool --podman --outdir runs cwl/TX_MAX.yml --input data/daily_surface_cancities_1990-1993.nc --output out.nc --TX_MAX.freq ME

(not working) Run indicator calculations through the master CWL:
cwltool --podman --outdir runs cwl/master.yml --input data/daily_surface_cancities_1990-1993.nc --output out.nc --indicator TX_MAX --TX_MAX.freq MS
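As an alternative to passing every parameter as a command-line flag, the single-indicator run above could probably also be driven by a CWL job file, which cwltool accepts as a second positional argument. The sketch below builds such a job object in Python; `make_job` and the file name `tx_max_job.json` are hypothetical, and the layout assumes the `input`, `output` and `TX_MAX` inputs of the tool description shown earlier.

```python
import json


def make_job(input_path, output_name, indicator_id, **params):
    """Build a CWL job object for one generated indicator tool."""
    return {
        # File inputs are objects with a class and a path.
        "input": {"class": "File", "path": input_path},
        "output": output_name,
        # The record input holds the indicator's own parameters.
        indicator_id: params,
    }


job = make_job(
    "data/daily_surface_cancities_1990-1993.nc",
    "out.nc",
    "TX_MAX",
    freq="ME",
)
print(json.dumps(job, indent=2))
```

Saving the printed JSON as `tx_max_job.json` in the working directory would then allow `cwltool --podman --outdir runs cwl/TX_MAX.yml tx_max_job.json`; note that relative paths in a job file are resolved relative to the job file itself.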

Related issues: #1949

This idea came up during the CLINT/OGC code sprint in Bonn, this October.

Contribution

  • I would be willing/able to open a Pull Request to contribute this feature.

Code of Conduct

  • I agree to follow this project's Code of Conduct
@SarahG-579462 SarahG-579462 added the enhancement New feature or request label Oct 15, 2024
@SarahG-579462 SarahG-579462 mentioned this issue Oct 15, 2024
2 tasks