Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Literal String Instead of List of Lists in Template_Params Causes Unexpected Behavior in CustomDataset Constructor #6

Open
hugocool opened this issue Dec 12, 2023 · 1 comment

Comments

@hugocool
Copy link

We are experiencing an issue where the literal string '[[ data_ids ]]' is being passed to the constructor of our dataset, instead of the expected list of lists. This problem occurs in the context of a FastAPI route with a Kedro session and a custom dataset configuration using a catalog entry.

Here are the details of the issue and our current workaround:

  1. FastAPI Route with Kedro Session:

    In our FastAPI application, we have a route that uses a Kedro session to run a task. The data_ids parameter, which is intended to be a list of integers, is being incorrectly passed as a string. Here's the relevant part of the route:

    # ... [previous code for route setup] ...
    
    @app.post("/api/v1/data", response_model=OutputData)
    async def get_output_data(input_data: InputData):
        data_ids = input_data.data_ids  # data_ids is expected to be a list of ints
    
        # ... [additional code and logging] ...
    
        output = session.run(
            name="web_api", 
            inputs={"input_data": input_data},
            template_params={"data_ids": data_ids},
        )
  2. Catalog Entry Issue and Workaround:

    The issue also manifests in our catalog entry configuration for a custom dataset:

    result:
      type: performance_optimisation_engine.extras.datasets.web_api_dataset.ResultDataset
      credentials: mysql
      data_ids: [[ data_ids ]]

    With this setup, the string '[[ data_ids ]]' is passed to the constructor instead of the list of lists. To address this, we modified the catalog entry to pass data_ids as a string explicitly:

    result:
      type: performance_optimisation_engine.extras.datasets.web_api_dataset.ResultDataset
      credentials: mysql
      data_ids: '[[ data_ids ]]'

    Then, in the constructor of our dataset, we convert this string back into a list of lists:

    s = str(self.data_ids).replace("'", '"')
    self.data_ids = json.loads(s)

While this workaround is functional, it's not ideal. We suspect this might be a bug or at least a feature that requires clarification in the documentation. Any assistance in resolving this issue more elegantly would be greatly appreciated.

Thank you for your time and consideration.

@takikadiri
Copy link
Owner

takikadiri commented Dec 13, 2023

Thank you for raising this issue. This highlights an important point in the catalog's template rendering

Basically the template_params is used at each iteration/request for rendering any Jinja template that is present in your dataset's attributes and having [[ ]] as expression. We don't use the default Jinja block start/ending {{ }} because they are interpreted/rendered by kedro (ConfigLoader and TemplatedConfigLoader) at catalog creation. The new OmegaConfigLoader doesn't use Jinja

At iteration/request time the datasets are already initialized and materialized as python object, kedro-boot isn't really rendering a yaml catalog, it is instead alterating/rendering the attributes of an already initialized datasets. Concretely at request time you already have a result dataset/object with data_ids = [[ data_ids ]] that will be rendered before running the pipeline.

Since we're rendering the python object directly (we don't render a yaml), kedro-boot recursively render every string attributes (handle also strings that are imbriqued in a list or dict) and every Path attributes.

We have two problems here. Each one is handled with a line of your workaround

1. [[ expression ]] sometimes ignored by kedro-boot renderer
In your use case something weird happened, your self.data_ids attribute was not a string nor a Path. It was initialized as a list in your dataset object, so kedro-boot ignore it. This is because in the loading of catalog.yaml the data_ids attribute was casted as list [ ... ]. So when you added a quote to the data_ids value, it was casted as string, and correctly interpreted by kedro-boot, but you was forced to fix the format with s = str(self.data_ids).replace("'", '"').

This is clearly an edge case of having [[ ]] as block start/ending of the Jinja expression. This lead to inconsistent behaviors when the template (dataset value) contain only the jinja expression dataset_attribute: [[ expression ]]. We should consider switching to another characters. I propose {[ expression ]}. What do you think ?

2. Casting a non-string templatized dataset attribute
Jinja rendering always return a string. So in the case of a non-string attribute, you will always need to cast your expression. In your example you can cast it with self.data_ids = json.loads(s) or ast.literal_eval(s).
I don't see this as a real problem as it's feel natural to cast somewhere a value that was being received as a text from the web server.

Let me know what you think

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants