
ValueError: Failed to format pattern '${xxx}': no config value found, no default provided #129

Open
stephanecollot opened this issue Feb 17, 2022 · 6 comments

stephanecollot commented Feb 17, 2022

Hello

With:
kedro 0.17.4
kedro-airflow-k8s 0.7.3
python 3.8.12

I have a templated catalog:

training_data:
  type: spark.SparkDataSet
  filepath: data/${folders.intermediate}/training_data
  file_format: parquet
  save_args:
    mode: 'overwrite'
  layer: intermediate

with the corresponding value set in my globals.yml:

folders:
    intermediate: 02_intermediate
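
For completeness, the ${...} placeholders are resolved by Kedro's TemplatedConfigLoader, which the project registers with a globals_pattern via the register_config_loader hook in Kedro 0.17.x. A minimal sketch of that registration, assuming a standard hooks.py (file and class names are illustrative):

# hooks.py -- illustrative sketch for Kedro 0.17.x
from typing import Iterable

from kedro.config import TemplatedConfigLoader
from kedro.framework.hooks import hook_impl


class ProjectHooks:
    @hook_impl
    def register_config_loader(self, conf_paths: Iterable[str]) -> TemplatedConfigLoader:
        # Values from globals.yml (e.g. folders.intermediate) fill in the
        # ${...} placeholders in catalog.yml and other config files.
        return TemplatedConfigLoader(conf_paths, globals_pattern="*globals.yml")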

And when I run:
kedro airflow-k8s compile

I get the following error:

Traceback (most recent call last):
  File "/Users/user/miniconda3/envs/kedro/bin/kedro", line 8, in <module>
    sys.exit(main())
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/framework/cli/cli.py", line 265, in main
    cli_collection()
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/framework/cli/cli.py", line 210, in main
    super().main(
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro_airflow_k8s/cli.py", line 64, in compile
    ) = get_dag_filename_and_template_stream(
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro_airflow_k8s/template.py", line 170, in get_dag_filename_and_template_stream
    template_stream = _create_template_stream(
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro_airflow_k8s/template.py", line 92, in _create_template_stream
    pipeline_grouped=context_helper.pipeline_grouped,
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro_airflow_k8s/context_helper.py", line 46, in pipeline_grouped
    return TaskGroupFactory().create(self.pipeline, self.context.catalog)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/framework/context/context.py", line 329, in catalog
    return self._get_catalog()
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/framework/context/context.py", line 365, in _get_catalog
    conf_catalog = self.config_loader.get("catalog*", "catalog*/**", "**/catalog*")
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/config/templated_config.py", line 191, in get
    return _format_object(config_raw, self._arg_dict)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/config/templated_config.py", line 264, in _format_object
    new_dict[key] = _format_object(value, format_dict)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/config/templated_config.py", line 264, in _format_object
    new_dict[key] = _format_object(value, format_dict)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/config/templated_config.py", line 279, in _format_object
    return IDENTIFIER_PATTERN.sub(lambda m: str(_format_string(m)), val)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/config/templated_config.py", line 279, in <lambda>
    return IDENTIFIER_PATTERN.sub(lambda m: str(_format_string(m)), val)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/config/templated_config.py", line 242, in _format_string
    raise ValueError(
ValueError: Failed to format pattern '${folders.intermediate}': no config value found, no default provided
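
For what it's worth, the error is raised while the catalog is being built from config, so the same code path can be triggered without the plugin's CLI. A minimal sketch, assuming a Kedro 0.17.x project (the package name is illustrative and the snippet is run from the project root):

from kedro.framework.session import KedroSession

# "my_project" is a placeholder for the actual package name; run this from
# the project root so the Kedro project configuration can be found.
with KedroSession.create("my_project") as session:
    context = session.load_context()
    # Accessing the catalog loads catalog*.yml through the config loader and
    # raises the same ValueError if a ${...} placeholder cannot be resolved.
    catalog = context.catalog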

With this conf/base/airflow-k8s.yaml:

host: https://airflow.url

output: dags

run_config:
  image: spark_image
  image_pull_policy: Always
  startup_timeout: 600
  namespace: namespace
  experiment_name: experiment
  run_name: experiment
  cron_expression: "@daily"
  description: "experiment Pipeline"
  service_account_name: namespace-vault
  volume:
    disabled: True
  macro_params: [ds, prev_ds]
  variables_params: []

I'll add that kedro run works.

Do you have any hints?

@stephanecollot (Author)

Sorry, actually kedro run doesn't work, so it is not coming from kedro-airflow-k8s.

@stephanecollot (Author)

Actually, when I uninstall kedro-airflow-k8s, kedro run works again.


stephanecollot commented Feb 17, 2022

It seems that now, with kedro-airflow-k8s 0.6.7 and with this conf/base/airflow-k8s.yaml:


# Base url of the Apache Airflow, should include the schema (http/https)
host: https://airflow.url

# Directory from where Apache Airflow is reading DAGs definitions
output: dags

# Configuration used to run the pipeline
run_config:

    # Name of the image to run as the pipeline steps
    image: experiment

    # Pull policy to be used for the steps. Use Always if you push the images
    # on the same tag, or Never if you use only local images
    image_pull_policy: IfNotPresent

    # Pod startup timeout in seconds
    startup_timeout: 600

    # Namespace for Airflow pods to be created
    namespace: airflow

    # Name of the Airflow experiment to be created
    experiment_name: experiment

    # Name of the dag as it's presented in Airflow
    run_name: experiment

    # Apache Airflow cron expression for scheduled runs
    cron_expression: "@daily"

    # Optional start date in format YYYYMMDD
    #start_date: "20210721"

    # Optional pipeline description
    #description: "Very Important Pipeline"

    # Comma separated list of image pull secret names
    #image_pull_secrets: my-registry-credentials

    # Service account name to execute nodes with
    #service_account_name: default

    # Optional volume specification
    volume:
        # Storage class - use null (or no value) to use the default storage
        # class deployed on the Kubernetes cluster
        storageclass: # default
        # The size of the volume that is created. Applicable for some storage
        # classes
        size: 1Gi
        # Access mode of the volume used to exchange data. ReadWriteMany is
        # preferred, but it is not supported on some environments (like GKE)
        # Default value: ReadWriteOnce
        #access_modes: [ReadWriteMany]
        # Flag indicating if the data-volume-init step (copying raw data to the
        # fresh volume) should be skipped
        skip_init: False
        # Allows to specify fsGroup executing pipelines within containers
        # Default: root user group (to avoid issues with volumes in GKE)
        owner: 0
        # Tells if volume should not be used at all, false by default
        disabled: False

    # List of optional secrets specification
    secrets:
            # deploy_type: The type of secret deployment in Kubernetes, either `env` or
            # `volume`
        -   deploy_type: "env"
            # deploy_target: (Optional) The environment variable (when `deploy_type` is `env`)
            # or the file path (when `deploy_type` is `volume`) where the secret is exposed.
            # If `key` is not provided, deploy_target should be None.
            deploy_target: "SQL_CONN"
            # secret: Name of the secrets object in Kubernetes
            secret: "airflow-secrets"
            # key: (Optional) Key of the secret within the Kubernetes Secret; if not
            # provided with `deploy_type` `env`, all secrets in the object will be mounted
            key: "sql_alchemy_conn"

    # Apache Airflow macros to be exposed for the parameters
    # List of macros can be found here:
    # https://airflow.apache.org/docs/apache-airflow/stable/macros-ref.html
    macro_params: [ds, prev_ds]

    # Apache Airflow variables to be exposed for the parameters
    variables_params: [env]

    # Optional resources specification
    #resources:
        # Default configuration used by all nodes that do not declare the
        # resource configuration. It's optional. If a node does not declare the resource
        # configuration, __default__ is assigned by default, otherwise cluster defaults
        # will be used.
        #__default__:
            # Optional labels to be put into pod node selector
            #node_selectors:
              #Labels are user provided key value pairs
              #node_pool_label/k8s.io: example_value
            # Optional labels to apply on pods
            #labels:
              #running: airflow
            # Optional annotations to apply on pods
            #annotations:
              #iam.amazonaws.com/role: airflow
            # Optional list of kubernetes tolerations
            #tolerations:
                #- key: "group"
                  #value: "data-processing"
                  #effect: "NoExecute"
                #- key: "group"
                  #operator: "Equal"
                  #value: "data-processing"
                  #effect: "NoSchedule"
            #requests:
                #Optional amount of cpu resources requested from k8s
                #cpu: "1"
                #Optional amount of memory resource requested from k8s
                #memory: "1Gi"
            #limits:
                #Optional amount of cpu resources limit on k8s
                #cpu: "1"
                #Optional amount of memory resource limit on k8s
                #memory: "1Gi"
        # Other arbitrary configurations to use
        #custom_resource_config_name:
            # Optional labels to be put into pod node selector
            #labels:
                #Labels are user provided key value pairs
                #label_key: label_value
            #requests:
                #Optional amount of cpu resources requested from k8s
                #cpu: "1"
                #Optional amount of memory resource requested from k8s
                #memory: "1Gi"
            #limits:
                #Optional amount of cpu resources limit on k8s
                #cpu: "1"
                #Optional amount of memory resource limit on k8s
                #memory: "1Gi"

    # Optional external dependencies configuration
    #external_dependencies:
        # Can just select dag as a whole
        #- dag_id: upstream-dag
        # or detailed
        #- dag_id: another-upstream-dag
        # with specific task to wait on
        #  task_id: with-precise-task
        # Maximum time (in minutes) to wait for the external dag to finish before this
        # pipeline fails; the default is 1440 == 1 day
        #  timeout: 2
        # Checks if the external dag exists before waiting for it to finish. If it
        # does not exist, fail this pipeline. By default it is set to true.
        #  check_existence: False
        # Time difference with the previous execution to look at (minutes),
        # the default is 0 meaning no difference
        #  execution_delta: 10
    # Optional authentication to MLflow API
    #authentication:
      # Strategy that generates the credentials, supported values are:
      # - Null
      # - GoogleOAuth2 (generating OAuth2 tokens for service account provided by
      # GOOGLE_APPLICATION_CREDENTIALS)
      # - Vars (credentials fetched from airflow Variable.get - specify variable keys,
      # matching MLflow authentication env variable names, in `params`,
      # e.g. ["MLFLOW_TRACKING_USERNAME", "MLFLOW_TRACKING_PASSWORD"])
      #type: GoogleOAuth2
      #params: []

I can run kedro airflow-k8s compile and it works, but kedro run still gives the same error.

em-pe (Member) commented Feb 17, 2022

@stephanecollot Thanks for reporting the issue. If I'm not wrong, it's related to getindata/kedro-kubeflow#72 - @szczeles, can you confirm?

@szczeles (Contributor)

@em-pe Yep, it seems so. If we apply the same trick here, the issue should be gone.

@stephanecollot As a temporary workaround, you can try adding these lines into your project's settings.py:

import sys

# Skip registering the kedro-airflow-k8s hooks unless the airflow-k8s
# CLI group is the command being invoked.
if 'airflow-k8s' not in sys.argv:
    DISABLE_HOOKS_FOR_PLUGINS = ("kedro-airflow-k8s",)

If your code works with this hack, it's definitely the same issue as getindata/kedro-kubeflow#72.


stephanecollot commented Feb 18, 2022

Thanks for your reply.
Yes, I tried DISABLE_HOOKS_FOR_PLUGINS and kedro run works again.
