System performance degrades with large numbers of runs #2071
Comments
We faced the same issue.
More details after running EXPLAIN on the query statement: same as @strangemonad-faire reported, the query from …
But even if the …
@IronPan @gaoning777 Do you guys have time to address this issue?
FWIW, for the time being, I'm working around the issue by manually making sure old workflows are deleted from k8s via …
@strangemonad-faire We have also considered deleting the MySQL records manually, but sadly that isn't a long-term solution; it's just a workaround.
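Purely for illustration, a minimal sketch of what scripting that manual cleanup with gorm could look like; the run_details table, the CreatedAtInSec column, the 30-day cutoff, and the DSN are assumptions rather than the actual Kubeflow Pipelines schema, and as said above this is only a stopgap.

```go
// Hypothetical sketch of the manual-cleanup workaround discussed above.
// Table and column names (run_details, CreatedAtInSec) are assumptions.
package main

import (
	"log"
	"time"

	"github.com/jinzhu/gorm"
	_ "github.com/jinzhu/gorm/dialects/mysql"
)

func main() {
	db, err := gorm.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/mlpipeline?parseTime=true")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Drop run rows older than 30 days; orphaned resource_references rows
	// would need the same treatment for the join to actually shrink.
	cutoff := time.Now().AddDate(0, 0, -30).Unix()
	if err := db.Exec("DELETE FROM run_details WHERE CreatedAtInSec < ?", cutoff).Error; err != nil {
		log.Fatal(err)
	}
	log.Println("cleanup finished")
}
```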
Just FYI, we have made some small changes to try to optimize the query. The basic idea is to remove unnecessary queries and columns, e.g. we removed: …
You can see the details here (ignore …). @strangemonad-faire, if you have time, could you try this optimization and see if it helps you?
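As a rough sketch of the column-trimming idea (not the actual patch), a list query that projects only the fields the UI needs might look like this; the run_details table and its snake_case column names here are assumed for illustration.

```go
// Illustrative only: "select just the columns the list needs" with gorm v1.
// run_details and its snake_case columns are assumed, not the real schema.
package main

import (
	"log"

	"github.com/jinzhu/gorm"
	_ "github.com/jinzhu/gorm/dialects/mysql"
)

// RunSummary carries only what a run list row displays.
type RunSummary struct {
	UUID        string
	DisplayName string
}

func listRunSummaries(db *gorm.DB, limit int) ([]RunSummary, error) {
	var rows []RunSummary
	// Narrow projection, no join against run_metrics or resource_references;
	// those can be fetched separately for the detail view.
	err := db.Table("run_details").
		Select("uuid, display_name").
		Order("created_at DESC").
		Limit(limit).
		Scan(&rows).Error
	return rows, err
}

func main() {
	db, err := gorm.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/mlpipeline?parseTime=true")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	runs, err := listRunSummaries(db, 20)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("fetched %d runs", len(runs))
}
```

The point is simply that the list endpoint avoids dragging run_metrics and resource_references into every page load.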
Could any Kubeflow maintainer address this issue? @Ark-kun @gaoning777 @IronPan @jingzhang36 @Bobgy
@xiaogaozi @strangemonad-faire Thank you a lot for diving into this and finding the issue. I guess the part that joins the metrics and resource references is the root cause. That information is still needed, I believe. We'll take a look and get back soon.
It looks like PR 2559 fixed this issue. Feel free to reopen if not. PRs are also welcome.
Thank you for the elegant fix! We'll test it later to verify the change. I still have one question about whether we can delete …
(before/after comparison not reproduced here)
What happened:
On systems with lots of runs, listing experiments becomes unbearably slow.
What did you expect to happen:
Listing performance should remain roughly constant regardless of the number of runs.
What steps did you take:
I enabled slow query logging and identified the culprit query. Change #1836 attempted to address the issue, but it does not resolve it.
Here's an EXPLAIN of the statement from our cluster, which contains ~70k resource_references rows: …
Anything else you would like to add:
I'm not familiar enough with gorm to know how to force it to generate a more efficient query.
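On the gorm question, a hedged sketch of two knobs that can help: LogMode surfaces the SQL gorm actually generates (so the slow statement can be copied out and EXPLAINed by hand), and Preload splits the association into a second keyed query instead of one large join. The Run and ResourceReference models and the DSN below are made up for the example, not the project's real types.

```go
// Hypothetical models; the real Kubeflow Pipelines types and schema differ.
package main

import (
	"log"

	"github.com/jinzhu/gorm"
	_ "github.com/jinzhu/gorm/dialects/mysql"
)

type ResourceReference struct {
	ResourceUUID  string
	ReferenceUUID string
	ReferenceType string
}

type Run struct {
	UUID               string              `gorm:"primary_key"`
	DisplayName        string
	ResourceReferences []ResourceReference `gorm:"foreignkey:ResourceUUID;association_foreignkey:UUID"`
}

func main() {
	db, err := gorm.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/mlpipeline?parseTime=true")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// 1) Print every statement gorm generates, so the slow one can be
	//    copied out and EXPLAINed by hand.
	db.LogMode(true)

	// 2) Preload issues a second, keyed query for the references instead of
	//    joining the run table against every resource_references row.
	var runs []Run
	if err := db.Preload("ResourceReferences").
		Limit(20).
		Find(&runs).Error; err != nil {
		log.Fatal(err)
	}
	log.Printf("fetched %d runs", len(runs))
}
```

Whether Preload actually beats the join here depends on the indexes and data volume involved; the change that actually landed is in PR 2559.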