Check error in OpenSQLConnection #588

@andreyvelich andreyvelich commented May 25, 2019

I ran into the same error as in #340.
The cause was CoreDNS in my Kubernetes cluster. After I changed the pod network from Flannel to Calico, the error went away.
I didn't see any logs from katib-manager, so the problem was hard to find.
I think we should print these error messages for the user, so that the problem is easier to find.

/cc @gaocegege @johnugeorge @hougangliu



@@ -111,6 +111,9 @@ func openSQLConn(driverName string, dataSourceName string, interval time.Duratio
if err = db.Ping(); err == nil {
return db, nil
}
klog.Infof("Ping to Katib db failed: %v", err)
Member

We should use klog.Errorf here.

Member Author

Thanks, fixed.

@@ -95,6 +95,9 @@ func openSQLConn(driverName string, dataSourceName string, interval time.Duratio
if err = db.Ping(); err == nil {
return db, nil
}
klog.Infof("Ping to Katib db failed: %v", err)
Member

Should we return nil, err here?

Member

@gaocegege This is in a for-select loop. Shouldn't it wait till timeout?

Member

Oh gotcha, yeah we should wait instead of returning an error.

Member Author

@johnugeorge I didn't check it, because after the timeout my manager container restarted automatically.

@andreyvelich
Member Author

/retest

@johnugeorge
Member

/lgtm

@johnugeorge
Member

/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnugeorge

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 4f678e2 into kubeflow:master May 28, 2019
@andreyvelich andreyvelich deleted the v1alpha2-check-error-in-db-connection branch September 27, 2021 20:39
5 participants