Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[backend] severe perf problem in archive experiment API #6845

Closed
Bobgy opened this issue Nov 1, 2021 · 5 comments
Closed

[backend] severe perf problem in archive experiment API #6845

Bobgy opened this issue Nov 1, 2021 · 5 comments

Comments

@Bobgy
Copy link
Contributor

Bobgy commented Nov 1, 2021

Environment

  • KFP version: master

Steps to reproduce

  1. On a KFP cluster with many runs/experiments.
  2. Use archive experiment API on any experiment, observe that the API takes >1 minute to process and may timeout + cause other queries to fail. Observed failure look like [testing] kfp-ci cluster related tests flaky #6815 (comment)

The issue caused severe flakiness in test infra: #6815, and was mitigated by skipping the archive experiment step in test script: #6843 (We should revert the #6843 after fixing the perf problem)

Expected result

There's no perf problem with experiment archive API.

Materials and Reference

This is caused by a known TODO item to improve SQL perf:

// TODO(jingzhang36): use inner join to replace nested query for better performance.

I experimented with a SQL query improvement in #6815 (comment), we still need to investigate how to turn that into our golang backend.


Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

@Bobgy
Copy link
Contributor Author

Bobgy commented Nov 1, 2021

/cc @capri-xiyue

@stale
Copy link

stale bot commented Mar 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Mar 2, 2022
@lfloeer
Copy link

lfloeer commented Dec 14, 2022

We just today got hit by this issue (Kubeflow 1.3 deployment). Would really appreciate if this could get addressed in a future version. Right now, we have to tell our users to not archive experiments.

Edit: Sorry, just saw that your original issue description already included my comment below

An unfortunate side effect of the performance issue is, that since this results in an UPDATE statement, the run_details and resource_references table are locked during this operation, leading to other failed API requests, e.g. updating a single run, etc.

@stale stale bot removed the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Dec 14, 2022
@chensun chensun self-assigned this Mar 21, 2023
difince added a commit to difince/pipelines that referenced this issue Jun 23, 2023
Reprace the existing mysql request that use nested select, with inner join for better performance.
The fix levarage 'SQLDialect' interface, because the new request is not supported by sqllite (used for testing)
This interface bridges the difference between mysql (production) and sqlite
// (test)
Issue: kubeflow#6845

Signed-off-by: diana <difince@gmail.com>
difince added a commit to difince/pipelines that referenced this issue Jul 3, 2023
Reprace the existing mysql request that use nested select, with inner join for better performance.
The fix levarage 'SQLDialect' interface, because the new request is not supported by sqllite (used for testing)
This interface bridges the difference between mysql (production) and sqlite
// (test)
Issue: kubeflow#6845

Signed-off-by: diana <difince@gmail.com>
google-oss-prow bot pushed a commit that referenced this issue Jul 12, 2023
* fix(backend): Fix performance issue within a mysql request

Reprace the existing mysql request that use nested select, with inner join for better performance.
The fix levarage 'SQLDialect' interface, because the new request is not supported by sqllite (used for testing)
This interface bridges the difference between mysql (production) and sqlite
// (test)
Issue: #6845

Signed-off-by: diana <difince@gmail.com>

* For sqlite use UPDATE FROM to join the target table against another table in the database in order to help compute

Try to generalize the method in SQLDialect interface

Signed-off-by: diana <difince@gmail.com>

* Add unit tests

Signed-off-by: diana <difince@gmail.com>

* Replace nested query for Jobs and start using pre-comit

Signed-off-by: diana <difince@gmail.com>

* Fix: Use LEFT JOIN instead of INNER JOIN

Signed-off-by: diana <difince@gmail.com>

---------

Signed-off-by: diana <difince@gmail.com>
@gkcalat
Copy link
Member

gkcalat commented Jul 18, 2023

Fixed by @difince.

/close

@google-oss-prow
Copy link

@gkcalat: Closing this issue.

In response to this:

Fixed by @difince.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

chensun pushed a commit that referenced this issue Aug 17, 2023
* fix(backend): Fix performance issue within a mysql request

Reprace the existing mysql request that use nested select, with inner join for better performance.
The fix levarage 'SQLDialect' interface, because the new request is not supported by sqllite (used for testing)
This interface bridges the difference between mysql (production) and sqlite
// (test)
Issue: #6845

Signed-off-by: diana <difince@gmail.com>

* For sqlite use UPDATE FROM to join the target table against another table in the database in order to help compute

Try to generalize the method in SQLDialect interface

Signed-off-by: diana <difince@gmail.com>

* Add unit tests

Signed-off-by: diana <difince@gmail.com>

* Replace nested query for Jobs and start using pre-comit

Signed-off-by: diana <difince@gmail.com>

* Fix: Use LEFT JOIN instead of INNER JOIN

Signed-off-by: diana <difince@gmail.com>

---------

Signed-off-by: diana <difince@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants