Move grid_data endpoint to REST API #23417

Closed
1 of 2 tasks
bbovenzi opened this issue May 2, 2022 · 11 comments
Labels
area:API Airflow's REST/HTTP API kind:feature Feature Requests

Comments

@bbovenzi
Contributor

bbovenzi commented May 2, 2022

Description

Migrate the object/grid_data endpoint here to the REST API.

With #23415, we can simplify this endpoint too and only return the groups object.

Use case/motivation

No response

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@bbovenzi bbovenzi added kind:feature Feature Requests area:API Airflow's REST/HTTP API labels May 2, 2022
@tirkarthi
Contributor

@bbovenzi I can take this. Please add a permalink to the GitHub code that you want to migrate to the REST API. I couldn't find an objects/tree_data endpoint in the current views.

@bbovenzi bbovenzi changed the title Move tree_data endpoint to REST API Move grid_data endpoint to REST API May 9, 2022
@bbovenzi
Contributor Author

bbovenzi commented May 9, 2022

Oh yes, we renamed it to grid_data. I updated the link too.

We probably don't need to return the dag_runs part of the response and can instead use the list dag runs endpoint. Message me before you start work on this; I may want to change a few other things.

@bbovenzi
Contributor Author

Semi-related: #23772

Either before or after this issue, we should try to optimize the function that generates this grid data.

@tirkarthi
Contributor

@bbovenzi I can start working on it now. I see that you have made some optimizations as part of #23813 too. It would be helpful to have a sample response so that I can attempt a first draft, since you already mentioned that dag_runs are not needed.

@bbovenzi
Contributor Author

bbovenzi commented Sep 1, 2022

This should be an internal API endpoint, not a public one. Being in the webserver and views.py is fine for now.

@bbovenzi bbovenzi closed this as completed Sep 1, 2022
@karakanb
Contributor

karakanb commented Sep 8, 2022

I would appreciate it if this were available through the REST API as well. It is a very concise way of getting the historical context around task instances, and it avoids sending many separate requests to get the same information.

@bbovenzi would you oppose this being part of the public API?

@bbovenzi
Contributor Author

@karakanb what is your use case? I would still want to keep it separate so that we can quickly iterate on the grid view. But maybe the existing task instances REST API endpoints can be improved.

@karakanb
Contributor

karakanb commented Sep 18, 2022

My use case is to integrate our Airflow pipelines with some of our internal tooling, and the grid view has been the perfect way to do so.

Currently, since there is no grid endpoint, what I have to do is:

  • Fetch all task instances after a specific date
  • Fetch tasks, because not all tasks have had a run
  • Fetch dagruns, because not all dagruns are returned as part of the task instances

This means many requests, and especially when there are many task instances it takes many seconds to fetch the data because of the additional pagination in the requests, whereas the grid endpoint can do this very quickly. Increasing the page size doesn't help much either for some reason, which means we are stuck with very slow loading times compared to the grid view data.

I might also be doing something wrong, so if you have recommendations please let me know.
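
For readers following along, the paginated fetching described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not code from Airflow: `paginate` and `fetch_page` are hypothetical names, and it assumes each list response is a JSON object holding its items under a named key together with a `total_entries` count, and that the endpoint accepts `limit` and `offset`.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical callable: takes (limit, offset) and returns one decoded
# JSON page, e.g. {"task_instances": [...], "total_entries": 1234}.
FetchPage = Callable[[int, int], Dict[str, Any]]


def paginate(fetch_page: FetchPage, key: str, limit: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every item of a paginated collection, one request per page.

    Assumes the response stores its items under `key` and reports the
    collection size in `total_entries`.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        items = page.get(key, [])
        yield from items
        offset += len(items)
        # Stop on an empty page or once every reported entry has been seen.
        if not items or offset >= page.get("total_entries", 0):
            break
```

The same helper would be called three times (task instances, tasks, dag runs), which is where the request count grows: each collection costs roughly its size divided by `limit` round trips.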

@potiuk
Member

potiuk commented Sep 19, 2022

I think "Generic" bulk retrieval of Airflow tasks (not optimized for UI but for bulk retrieval of data) could be added. Maybe you would like to design and contribute such an endpoint, @karakanb? You seem to know what you are doing with Python, and adding yet another API endpoint should be easy following the existing ones. If you get it "generic" enough, it would be useful for others too.

@karakanb
Contributor

I could try to do that, for sure; however, it'd be pretty much the same as the grid data endpoint, no?

Alternatively, we could expand the /dags/~/dagRuns/list endpoint with additional parameters to include the tasks and the task instances as well. Would that be along the lines of what you mentioned? Given that the endpoint is already a batch one, maybe we could use it for such purposes.

@potiuk
Member

potiuk commented Sep 19, 2022

I could try to do that, for sure; however, it'd be pretty much the same as the grid data endpoint, no?

The difference is that we might want to change the grid data endpoint in a backwards-incompatible way in the future, to serve the Airflow UI better. It might have some optimisations and more (or less) data retrieved to speed up UI responsiveness. And eventually we might make it an async endpoint as well.

By contrast, I am talking about "bulk" retrieval of the data, which should be stable. I think extending "dagRuns" is not great because it follows REST semantics, so in principle it should only return dagRuns. I think having a separate endpoint to retrieve "joined data" might be better. But I also know @bbovenzi and others have discussed the APIs we need, so maybe they have other ideas.
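
To make the "joined data" idea concrete, here is one possible shape for the join, done client-side. It is only an illustrative sketch: `join_grid` is a hypothetical helper, no such endpoint or response format is defined in this thread, and the field names `task_id`, `dag_run_id`, and `state` are assumptions about what the tasks, dag runs, and task instances collections return.

```python
from typing import Any, Dict, Iterable, List, Optional

Grid = Dict[str, Dict[str, Optional[str]]]


def join_grid(
    tasks: Iterable[Dict[str, Any]],
    dag_runs: List[Dict[str, Any]],
    task_instances: Iterable[Dict[str, Any]],
) -> Grid:
    """Join three flat collections into {task_id: {dag_run_id: state}}.

    Tasks that never ran and dag runs without task instances still get
    a cell, with None as the state.
    """
    run_ids = [run["dag_run_id"] for run in dag_runs]
    # Start from the full task x run matrix so nothing is silently dropped.
    grid: Grid = {
        task["task_id"]: {run_id: None for run_id in run_ids} for task in tasks
    }
    for ti in task_instances:
        # A task instance may refer to a task that is absent from `tasks`.
        row = grid.setdefault(ti["task_id"], {run_id: None for run_id in run_ids})
        row[ti["dag_run_id"]] = ti["state"]
    return grid
```

A stable bulk endpoint could return something equivalent in a single response, which is the distinction from a UI-specific endpoint whose payload may change between releases.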

4 participants