You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We are experiencing an issue where the literal string '[[ data_ids ]]' is being passed to the constructor of our dataset, instead of the expected list of lists. This problem occurs in the context of a FastAPI route with a Kedro session and a custom dataset configuration using a catalog entry.
Here are the details of the issue and our current workaround:
FastAPI Route with Kedro Session:
In our FastAPI application, we have a route that uses a Kedro session to run a task. The data_ids parameter, which is intended to be a list of integers, is being incorrectly passed as a string. Here's the relevant part of the route:
# ... [previous code for route setup] ...@app.post("/api/v1/data", response_model=OutputData)asyncdefget_output_data(input_data: InputData):
data_ids=input_data.data_ids# data_ids is expected to be a list of ints# ... [additional code and logging] ...output=session.run(
name="web_api",
inputs={"input_data": input_data},
template_params={"data_ids": data_ids},
)
Catalog Entry Issue and Workaround:
The issue also manifests in our catalog entry configuration for a custom dataset:
With this setup, the string '[[ data_ids ]]' is passed to the constructor instead of the list of lists. To address this, we modified the catalog entry to pass data_ids as a string explicitly:
While this workaround is functional, it's not ideal. We suspect this might be a bug or at least a feature that requires clarification in the documentation. Any assistance in resolving this issue more elegantly would be greatly appreciated.
Thank you for your time and consideration.
The text was updated successfully, but these errors were encountered:
Thank you for raising this issue. This highlights an important point in the catalog's template rendering
Basically the template_params is used at each iteration/request for rendering any Jinja template that is present in your dataset's attributes and having [[ ]] as expression. We don't use the default Jinja block start/ending {{ }} because they are interpreted/rendered by kedro (ConfigLoader and TemplatedConfigLoader) at catalog creation. The new OmegaConfigLoader doesn't use Jinja
At iteration/request time the datasets are already initialized and materialized as python object, kedro-boot isn't really rendering a yaml catalog, it is instead alterating/rendering the attributes of an already initialized datasets. Concretely at request time you already have a result dataset/object with data_ids = [[ data_ids ]] that will be rendered before running the pipeline.
Since we're rendering the python object directly (we don't render a yaml), kedro-boot recursively render every string attributes (handle also strings that are imbriqued in a list or dict) and every Path attributes.
We have two problems here. Each one is handled with a line of your workaround
1. [[ expression ]] sometimes ignored by kedro-boot renderer
In your use case something weird happened, your self.data_ids attribute was not a string nor a Path. It was initialized as a list in your dataset object, so kedro-boot ignore it. This is because in the loading of catalog.yaml the data_ids attribute was casted as list [ ... ]. So when you added a quote to the data_ids value, it was casted as string, and correctly interpreted by kedro-boot, but you was forced to fix the format with s = str(self.data_ids).replace("'", '"').
This is clearly an edge case of having [[ ]] as block start/ending of the Jinja expression. This lead to inconsistent behaviors when the template (dataset value) contain only the jinja expression dataset_attribute: [[ expression ]]. We should consider switching to another characters. I propose {[ expression ]}. What do you think ?
2. Casting a non-string templatized dataset attribute
Jinja rendering always return a string. So in the case of a non-string attribute, you will always need to cast your expression. In your example you can cast it with self.data_ids = json.loads(s) or ast.literal_eval(s).
I don't see this as a real problem as it's feel natural to cast somewhere a value that was being received as a text from the web server.
We are experiencing an issue where the literal string
'[[ data_ids ]]'
is being passed to the constructor of our dataset, instead of the expected list of lists. This problem occurs in the context of a FastAPI route with a Kedro session and a custom dataset configuration using a catalog entry.Here are the details of the issue and our current workaround:
FastAPI Route with Kedro Session:
In our FastAPI application, we have a route that uses a Kedro session to run a task. The
data_ids
parameter, which is intended to be a list of integers, is being incorrectly passed as a string. Here's the relevant part of the route:Catalog Entry Issue and Workaround:
The issue also manifests in our catalog entry configuration for a custom dataset:
With this setup, the string
'[[ data_ids ]]'
is passed to the constructor instead of the list of lists. To address this, we modified the catalog entry to passdata_ids
as a string explicitly:Then, in the constructor of our dataset, we convert this string back into a list of lists:
While this workaround is functional, it's not ideal. We suspect this might be a bug or at least a feature that requires clarification in the documentation. Any assistance in resolving this issue more elegantly would be greatly appreciated.
Thank you for your time and consideration.
The text was updated successfully, but these errors were encountered: