Make APIDataset.save store the response received #748

Open
npfp opened this issue Jul 3, 2024 · 13 comments

@npfp

npfp commented Jul 3, 2024

Description

When running a POST or PUT request with the APIDataset, the response is currently lost, even though it would be useful to store it.

Context

We rely heavily on the APIDataset not only to fetch data but also to save data to external APIs. Keeping track of the response is therefore really important to us.

Possible Implementation

We built a custom APIDataset that takes a filepath argument. If this argument is not None, a TextDataset(filepath=filepath) is created and its save method is called in _execute_save_request:

self.local_dataset.save(response.text)
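
A minimal sketch of what such a custom dataset could look like, assuming the kedro_datasets APIDataset and TextDataset APIs; the class name and the *args/**kwargs forwarding are placeholders, since the exact _execute_save_request signature may differ between kedro-datasets versions:

```python
from __future__ import annotations

from kedro_datasets.api import APIDataset
from kedro_datasets.text import TextDataset


class ResponseSavingAPIDataset(APIDataset):
    """Hypothetical subclass: persist the body of POST/PUT responses."""

    def __init__(self, *, filepath: str | None = None, **api_kwargs):
        super().__init__(**api_kwargs)
        # Only create the nested TextDataset when a filepath is given.
        self.local_dataset = TextDataset(filepath=filepath) if filepath else None

    def _execute_save_request(self, *args, **kwargs):
        # Forward the arguments untouched; their exact shape is not assumed here.
        response = super()._execute_save_request(*args, **kwargs)
        if self.local_dataset is not None:
            self.local_dataset.save(response.text)
        return response
```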

Possible Alternatives

None that I have found.

I would then like to make a PR with this proposed change, but before making the actual PR I wanted to double-check with you that this feature would be of interest to the community.

@datajoely
Contributor

This is a great point. I think the other, slightly more robust, way to do this would be to add a logging.info(response.text) call so that this sort of thing can be picked up within an observability stack.
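
A rough sketch of that logging alternative; the subclass name and the *args/**kwargs forwarding are assumptions for illustration, the suggestion itself is just the single logging call inside the save path:

```python
import logging

from kedro_datasets.api import APIDataset

logger = logging.getLogger(__name__)


class LoggingAPIDataset(APIDataset):
    """Hypothetical subclass: surface the save response through standard logging."""

    def _execute_save_request(self, *args, **kwargs):
        response = super()._execute_save_request(*args, **kwargs)
        # Emitted via logging so an observability stack can pick it up.
        logger.info("APIDataset save response (%s): %s", response.status_code, response.text)
        return response
```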

@npfp
Author

npfp commented Jul 3, 2024

Ah right, I hadn't thought of it that way.

In our case, some of the external endpoints send back an id and some pieces of information that we use as the starting point of another pipeline in a subsequent run, hence the idea of storing the response.

@datajoely
Contributor

That makes sense, I think the ambition is right: we should store this. I guess this was built under the assumption that we only cared about 200 responses, but POST functionality was added by the community later and this is a key point.

@npfp
Author

npfp commented Jul 3, 2024

Great, I will make a PR then.

@datajoely
Contributor

Before you do any work, I'd like to get some other contributors' opinions! @noklam @merelcht any thoughts here?

@astrojuanlu transferred this issue from kedro-org/kedro on Jul 3, 2024
@merelcht
Member

merelcht commented Jul 3, 2024

It makes total sense to me to save the response. I wouldn't save it as another type of dataset though (e.g. the TextDataset mentioned in the description), but rather save it directly to a file format that makes sense. The main reason is that it feels odd to me to have one type of dataset be the return type of another dataset.

@npfp
Author

npfp commented Jul 4, 2024

@merelcht I see. Regarding the use of TextDataset, it was really to reduce the maintenance/test burden by relying on a maintained dataset while keeping the interface simple.

When you say

directly save it to a file format that makes sense

do you mean using the open context manager/write operation directly?

@MinuraPunchihewa
Contributor

@merelcht I am happy to work on this unless @npfp is already in the process.

@merelcht
Member

Sure, go for it @MinuraPunchihewa. I see I never got back to the question:

do you mean using the open context manager/write operation directly?

Yes! My point was mainly that I wouldn't import another Kedro dataset like TextDataset and return that, but instead use the open context manager and write operation directly.
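
For illustration, a minimal sketch of that approach, where the dataset writes the response body itself instead of delegating to another dataset (the helper name and arguments are hypothetical):

```python
from pathlib import Path

import requests


def write_save_response(response: requests.Response, filepath: str) -> None:
    """Hypothetical helper: persist the save response with a plain open/write."""
    with Path(filepath).open("w", encoding="utf-8") as f:
        f.write(response.text)
```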

@npfp
Author

npfp commented Oct 23, 2024

For your information, I implemented a custom APIDataset with a nested dataset to store the response. I was inspired by the partitioned dataset.

I didn't go for the open context manager because I wanted flexibility in where I would store the responses: locally or in S3, as JSON or raw text, versioned or not, etc.

I also decided to implement the load method when using POST/PUT mode: it simply loads the response that has been saved, so it can be processed downstream in the pipeline.
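
A rough sketch of how such a nested-dataset design could look; the class name, the response_dataset argument, and the use of AbstractDataset.from_config are assumptions for illustration, not the actual implementation, and hook names may differ across Kedro versions:

```python
from __future__ import annotations

from typing import Any

from kedro.io import AbstractDataset
from kedro_datasets.api import APIDataset


class APIDatasetWithResponse(APIDataset):
    """Hypothetical dataset that nests a configurable dataset for the response."""

    def __init__(self, *, response_dataset: dict[str, Any] | None = None, **api_kwargs):
        super().__init__(**api_kwargs)
        # Build the nested dataset from its config, PartitionedDataset-style, so the
        # response can go to local text, JSON on S3, a versioned dataset, etc.
        self._response_dataset = (
            AbstractDataset.from_config("response", response_dataset)
            if response_dataset
            else None
        )

    def _execute_save_request(self, *args, **kwargs):
        response = super()._execute_save_request(*args, **kwargs)
        if self._response_dataset is not None:
            self._response_dataset.save(response.text)
        return response

    def _load(self):
        # In POST/PUT mode, return the saved response so a downstream node can use it.
        if self._response_dataset is not None:
            return self._response_dataset.load()
        return super()._load()
```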

I'm aware this design is arguable.

What I could do is open a PR, as the code is pretty much ready, and we can discuss it there. I could open the PR by the beginning of next week. If it doesn't fit the Kedro spirit, we can just close it.

@merelcht @MinuraPunchihewa What do you think?

@MinuraPunchihewa
Contributor

@npfp Please go ahead and open your PR.

@datajoely
Contributor

Thanks - @npfp please do! It will be a great place to co-design the best solution possible 💪

@npfp
Author

npfp commented Oct 23, 2024

This is pretty much the design we used for our custom dataset:

#905
