Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(backend): Fix performance issue within a mysql request #9680

Merged
merged 5 commits into from
Jul 12, 2023

Conversation

difince
Copy link
Member

@difince difince commented Jun 23, 2023

One of the requests used for Archiving Experiments is rewritten to use inner join for better performance, instead of using nested select. The fix leverage 'SQLDialect' interface, because the new Mysql request is not supported by SQLite Dialect (used for testing).
There are other MySQL requests that use nested requests in Archive Experiment, but only the most problematic one reported by this comment (bobgy) is fixed in the PR.
The other requests are not optimized, as the resource_reference table will be deprecated soon - once v2beta1 is available.

Issue: #6845

Note: Archive experiments need to be enabled back in test infra. The change done in this PR needs to be reverted once this PR got merged.

Description of your changes:

Checklist:

Reprace the existing mysql request that use nested select, with inner join for better performance.
The fix levarage 'SQLDialect' interface, because the new request is not supported by sqllite (used for testing)
This interface bridges the difference between mysql (production) and sqlite
// (test)
Issue: kubeflow#6845

Signed-off-by: diana <difince@gmail.com>
…able in the database in order to help compute

Try to generalize the method in SQLDialect interface

Signed-off-by: diana <difince@gmail.com>
Signed-off-by: diana <difince@gmail.com>
Copy link
Member Author

@difince difince left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did another commit - to add unit tests

Copy link
Member

@gkcalat gkcalat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @difince for the contribution!
I left a few comments.

@gkcalat gkcalat self-assigned this Jul 6, 2023
Signed-off-by: diana <difince@gmail.com>
@google-oss-prow google-oss-prow bot added size/L and removed size/M labels Jul 7, 2023
@@ -88,6 +91,11 @@ func (d MySQLDialect) IsDuplicateError(err error) bool {
return ok && sqlError.Number == mysqlerr.ER_DUP_ENTRY
}

// UpdateFromOrJoin TODO(gkcalat): deprecate resource_references table once we migration to v2beta1 is available.
func (d MySQLDialect) UpdateWithJointOrFrom(targetTable, joinTable, setClause, joinClause, whereClause string) string {
return fmt.Sprintf("UPDATE %s JOIN %s ON %s SET %s WHERE %s", targetTable, joinTable, joinClause, setClause, whereClause)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that this actually should be LEFT JOIN because we are replacing the OR op in the WHERE clause with this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! Thanks for catching this!

Copy link
Member Author

@difince difince Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-opening this conversation...
I just realized that the below statement is not correct.

I just realized that this actually should be LEFT JOIN because we are replacing the OR op in the WHERE clause with this.

This is the original request:
UPDATE run_details SET StorageState = 'ARCHIVED' WHERE StorageState <> 'ARCHIVED' AND UUID in (SELECT ResourceUUID FROM resource_references as rf WHERE (rf.ReferenceUUID = "d86e5315-09d7-4c8a-982e-4de004b2e4b6" AND rf.ResourceType = 'Run' AND rf.ReferenceType = 'Experiment')) OR ExperimentUUID = 'd86e5315-09d7-4c8a-982e-4de004b2e4b6';
created by the following code.

	// TODO(gkcalat): deprecate resource_references table once we migration to v2beta1 is available.
	// TODO(jingzhang36): use inner join to replace nested query for better performance.
	filteredRunsSql, filteredRunsArgs, err := sq.Select("ResourceUUID").
		From("resource_references as rf").
		Where(sq.And{
			sq.Eq{"rf.ResourceType": model.RunResourceType},
			sq.Eq{"rf.ReferenceUUID": expId},
			sq.Eq{"rf.ReferenceType": model.ExperimentResourceType},
		}).ToSql()
	if err != nil {
		return util.NewInternalServerError(err,
			"Failed to create query to filter the runs in an experiment %s. error: '%v'", expId, err.Error())
	}
	updateRunsSql, updateRunsArgs, err := sq.
		Update("run_details").
		SetMap(sq.Eq{
			"StorageState": model.StorageStateArchived.ToString(),
		}).
		Where(sq.NotEq{"StorageState": model.StorageStateArchived.ToString()}).
		Where(fmt.Sprintf("UUID in (%s) OR ExperimentUUID = '%s'", filteredRunsSql, expId), filteredRunsArgs...).
		ToSql()
	if err != nil {
		return util.NewInternalServerError(err,
			"Failed to create query to archive the runs in an experiment %s. error: '%v'", expId, err.Error())
	}

This is the new optimized MySQL query:

UPDATE run_details JOIN resource_references ON run_details.UUID = resource_references.ResourceUUID SET StorageState = "STORAGESTATE_ARCHIVED" WHERE resource_references.ResourceType = "RUN" AND resource_references.ReferenceUUID = "<experiment_id>" AND resource_references.ReferenceType = "Experiment"

By adding LEFT JOIN we won't directly achieve OR ExperimentUUID = 'experiment_ID' from the original query, because:

  1. In general, LEFT JOIN intends to return the entire LEFT table (we just do not want this) and
  2. The WHERE clause of the new query is too specific: WHERE resource_references.ResourceType = "RUN" AND resource_references.ReferenceUUID = "<experiment_id>" AND resource_references.ReferenceType = "Experiment".
    Even if we add an OR ExperimentUUID = 'experiment_ID' statement it won't work because of how we JOIN both Tables. Plus there is already a dedicated query that updates all run_details based just on experimentUUID see (ref A) :
    updateRunsWithExperimentUUIDSql, updateRunsWithExperimentUUIDArgs, err := sq.
    Update("run_details").
    SetMap(sq.Eq{
    "StorageState": model.StorageStateArchived.ToString(),
    }).
    Where(sq.Eq{"ExperimentUUID": expId}).
    Where(sq.NotEq{"StorageState": model.StorageStateArchived.ToString()}).
    ToSql()
    if err != nil {

In short :), After I took a look at how Run and Jobs are created IMO, it would be a bug if we have in the database run_details record without a corresponding resource_reference record with ResourceType= RUN and ReferenceType=Experiment. In other words, Am I right that there shouldn't be existing Runs that don't belong to some Experiment?

  • So If this is the case, I suggest keeping the INNER JOIN version of the optimized request and even removing the (ref A) - IMO it has been added for "just in case"

  • If I am missing something and that is not the case - I would keep the optimized INNER JOIN request + ref A ( For Runs) and write appropriate ref B for Jobs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @difince for a thorough analysis. I agree.

Signed-off-by: diana <difince@gmail.com>
@Linchin
Copy link
Contributor

Linchin commented Jul 12, 2023

Thank you @difince!
/lgtm
/approve
closes #6845

@google-oss-prow google-oss-prow bot added the lgtm label Jul 12, 2023
@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Linchin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 213dd5a into kubeflow:master Jul 12, 2023
1 check passed
@difince
Copy link
Member Author

difince commented Jul 13, 2023

Hi @Linchin I haven't removed the "LEFT" from the JOIN as I expect some comments :)
Even with the LEFT JOIN the query is still doing the right thing - the performance is much worse. So, I will open a new PR to remove LEFT JOIN and instead iser INNER!

@gkcalat
Copy link
Member

gkcalat commented Jul 13, 2023

Hi @Linchin I haven't removed the "LEFT" from the JOIN as I expect some comments :) Even with the LEFT JOIN the query is still doing the right thing - the performance is much worse. So, I will open a new PR to remove LEFT JOIN and instead iser INNER!

The PR was merged by accident. In the next PR, could you please replace the join with INNER JOIN and add a TODO with a note to switch to filtering by experiment_id once we deprecate resource_references?

Thank you!

chensun pushed a commit that referenced this pull request Aug 17, 2023
* fix(backend): Fix performance issue within a mysql request

Reprace the existing mysql request that use nested select, with inner join for better performance.
The fix levarage 'SQLDialect' interface, because the new request is not supported by sqllite (used for testing)
This interface bridges the difference between mysql (production) and sqlite
// (test)
Issue: #6845

Signed-off-by: diana <difince@gmail.com>

* For sqlite use UPDATE FROM to join the target table against another table in the database in order to help compute

Try to generalize the method in SQLDialect interface

Signed-off-by: diana <difince@gmail.com>

* Add unit tests

Signed-off-by: diana <difince@gmail.com>

* Replace nested query for Jobs and start using pre-comit

Signed-off-by: diana <difince@gmail.com>

* Fix: Use LEFT JOIN instead of INNER JOIN

Signed-off-by: diana <difince@gmail.com>

---------

Signed-off-by: diana <difince@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants