Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make db connect more robust #5738

Merged
merged 22 commits into from
Jan 19, 2019

Conversation

pbrackin
Copy link
Contributor

Refer to #2907

(It was auto-closed by your bot, but still an issue.)

Note that your contributing instructions for running drone CLI did not work for me. It seems like the instructions may be outdated or incomplete. Here is what I got:

/home/grokman/go_projects/src/code.gitea.io/gitea % drone exec --local --build-event "pull_request" Incorrect Usage: flag provided but not defined: -local

However, I did build & run this successfully. Here is a pertinent log snippet:

2019/01/14 21:41:32 [I] Log Mode: File(Info)
2019/01/14 21:41:32 [I] XORM Log Mode: File(Info)
2019/01/14 21:41:32 [I] Cache Service Enabled
2019/01/14 21:41:32 [I] Session Service Enabled
2019/01/14 21:41:32 [I] Beginning ORM engine init...
2019/01/14 21:41:35 [I] ORM engine initialization failed. Retry #1/10...
2019/01/14 21:41:38 [I] ORM engine initialization failed. Retry #2/10...
2019/01/14 21:41:41 [I] ORM engine initialization failed. Retry #3/10...
2019/01/14 21:41:43 [I] ORM engine init success!
2019/01/14 21:41:44 [I] Git Version: 2.8.0
2019/01/14 21:41:45 [I] Run Mode: Production

I hope this is helpful to your project which - other than this issue - has worked well for me on my Synology Diskstation. If you like my change and what to pull it, but want the tests working on my side first, just guide me in there. & I'd be happy to have another go at that.

custom/conf/app.ini.sample Outdated Show resolved Hide resolved
routers/init.go Outdated Show resolved Hide resolved
@bkcsoft bkcsoft added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Jan 15, 2019
@zeripath
Copy link
Contributor

Please also rebase on to Gitea/master. For some reason these patches aren't applying properly on to master and I suspect your master might be out of date

Copy link
Contributor

@zeripath zeripath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few more little things, mostly just for consistency and nicer logging.

There are likely errors returned from models.NewEngine that would mean we should stop earlier but we can deal with those later - so long as we get told them early (hence the logging).

routers/init.go Outdated Show resolved Hide resolved
routers/init.go Outdated
// In case of delays connecting to DB, retry connection. Eg, PGSQL in Docker Container on Synolgy
log.Info("Beginning ORM engine init...")
for i := 0; i < setting.DBConnectRetries; i++ {
if err := models.NewEngine(migrations.Migrate); err == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log at the start of the loop:

log.Info("ORM engine initialization (Attempt: #%d/%d).", i, setting.DBConnectRetries)

in the function turn the := into =

routers/init.go Outdated
if err := models.NewEngine(migrations.Migrate); err != nil {
log.Fatal(4, "Failed to initialize ORM engine: %v", err)
// In case of delays connecting to DB, retry connection. Eg, PGSQL in Docker Container on Synolgy
log.Info("Beginning ORM engine init...")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I think we should strip this out to a separate function:

func initDBEngine() (err error) {
	// With the contents of lines 61 - 72 (as amended)
	// return the last error which you then log.Fatal(4, "Failed to initialize ORM engine: %v", err)
}

Be consistent with the log message. So make this line:

log.Info("Beginning ORM engine initialization...")

routers/init.go Outdated Show resolved Hide resolved
modules/setting/setting.go Outdated Show resolved Hide resolved
routers/init.go Outdated Show resolved Hide resolved
routers/init.go Outdated Show resolved Hide resolved
@lafriks lafriks added the type/enhancement An improvement of existing functionality label Jan 15, 2019
@lafriks lafriks added this to the 1.8.0 milestone Jan 15, 2019
@pbrackin
Copy link
Contributor Author

After these changes, the new pertinent log sections looks like:

2019/01/15 16:32:32 [I] Beginning ORM engine initialization.
2019/01/15 16:32:32 [I] ORM engine initialization attempt #1/10...
2019/01/15 16:32:35 [I] ORM engine initialization attempt #2/10...
2019/01/15 16:32:38 [I] ORM engine initialization attempt #3/10...
2019/01/15 16:32:41 [I] ORM engine initialization attempt #4/10...
2019/01/15 16:32:43 [I] ORM engine initialization successful!

@codecov-io
Copy link

codecov-io commented Jan 16, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@f54e0d2). Click here to learn what that means.
The diff coverage is 60%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #5738   +/-   ##
=========================================
  Coverage          ?   37.74%           
=========================================
  Files             ?      327           
  Lines             ?    47840           
  Branches          ?        0           
=========================================
  Hits              ?    18059           
  Misses            ?    27187           
  Partials          ?     2594
Impacted Files Coverage Δ
modules/setting/setting.go 49.31% <100%> (ø)
routers/init.go 71.83% <55.55%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f54e0d2...b05babf. Read the comment docs.

routers/init.go Show resolved Hide resolved
@pbrackin
Copy link
Contributor Author

pbrackin commented Jan 17, 2019

Anything happening here? I saw there was changes requested, but they appear to be outdated.

NVM I see the one "log error" comment now. My browser must have gone stale. Just to clarify there.. the error is already being logged at the end of the connect attempt. You are saying now that you want to log the same error over and over for each attempt? As the error is very likely the same error each time, this seems redundant. If you insist on this, it may be advisable to increase the log level for the intermediate error logging. But, your project, so jsut let me know.

@zeripath
Copy link
Contributor

You should log the error before sleeping in the backoff as suggested above and I'll then lgtm it.

@pbrackin
Copy link
Contributor Author

pbrackin commented Jan 17, 2019

OK sure. So, did you want to keep it at the info log level as well? And then also did you want to remove the logging of the error at the final of all the attempts (which also would then become redundant).

@zeripath
Copy link
Contributor

Yeah keep it at Info. The point is that if you're watching your logs and you see it fail you can terminate early if it's an unexpected failure.

@pbrackin
Copy link
Contributor Author

Well, as it is, you know the last error from the last try, if the ORM fails to init at all, since the last error is printed after the new fcn returns.

But so if you want to print the error for each failed attempt & keep it all at info, it seems like you would want to remove the final info error log print after the fcn call, as it will be completely redundant. Let me know on that & I will proceed.

@zeripath
Copy link
Contributor

Hmm I'm not sure I see it log twice from the current suggested change. Your current code returns early when there's an error at the end.

If you really don't think it's helpful to log the backed off errors at Info, drop em to Debug.

I'm just cautious that we're going to make Gitea take ten times longer to fail is there's a database problem and not actually tell people what the problem is until the end of that of 10 attempts.

@pbrackin
Copy link
Contributor Author

Doesn't this make it log the final error after the new fcn call:

if err := initDBEngine(); err == nil {
                        log.Info("ORM engine initialization successful!")
                } else {
                        log.Fatal(4, "ORM engine initialization failed: %v", err)
                }

I assume fatal gets logged no matter the log setting? Hopefully I didn't screw this code up - I think it was mainly recycled from the pre-this-PR code.

Now I understand a little better where you were coming from. So for failure modes OTHER than the startup order issue, you may frustrate some real-time debuggers. I think that's a fair point. So now I am glad I was not dogmatically standing my ground there :) But yeah I think making the retry failures debug level makes the most sense. Peeps who are actively debugging will surely have their log levels turned up, and the rest will have less noisy logs. I will go ahead and do that.

@zeripath
Copy link
Contributor

Yeah log.Fatal logs no matter the setting and kills the program too.

I was meaning replace line 56 (the sleep command) with the 2 log lines plus sleep command. If you look carefully when you've backed off n -1 times and still don't get a dB on the nth, that is the last time, line 53's test evaluates true and line 54 returns the error, so on the nth time you only get the log.Fatal.

One other thing, because this code is using a go log, I'm not sure that there is a log.Debug in that logger, but if there is feel free to drop it the logging down to debug.

The other benefit of logging this stuff, even for those not debugging Gitea, is that we can find out what xorm returns when it can't connect because the db isn't ready and we can adjust this code in the future to report better information and/or not back off if the error isn't the right type. At present I at least have no idea what xorm would return, and in some ways noisy logs that annoy users means we get that information so we can make better code in future.

@pbrackin
Copy link
Contributor Author

pbrackin commented Jan 18, 2019

This is what the new output looks like, with log level set to Debug. You can see the log level of each print with the letter in brackets between the timestamp and the message.

2019/01/17 19:49:12 [I] Beginning ORM engine initialization.
2019/01/17 19:49:12 [I] ORM engine initialization attempt #1/10...
2019/01/17 19:49:12 [D] ORM engine initialization attempt #1/10 failed. Error: <nil>
2019/01/17 19:49:12 [I] Backing off for 3 seconds
2019/01/17 19:49:15 [I] ORM engine initialization attempt #2/10...
2019/01/17 19:49:15 [D] ORM engine initialization attempt #2/10 failed. Error: <nil>
2019/01/17 19:49:15 [I] Backing off for 3 seconds
2019/01/17 19:49:18 [I] ORM engine initialization attempt #3/10...
2019/01/17 19:49:18 [D] ORM engine initialization attempt #3/10 failed. Error: <nil>
2019/01/17 19:49:18 [I] Backing off for 3 seconds
2019/01/17 19:49:21 [I] ORM engine initialization attempt #4/10...
2019/01/17 19:49:23 [I] ORM engine initialization successful!

@bkcsoft bkcsoft added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Jan 18, 2019
@pbrackin
Copy link
Contributor Author

pbrackin commented Jan 18, 2019

OK incoming. Looks like I only need to set the ini value as we already were reading it as a Duration & not a constant. I tested that out locally & was able to set the Duration in the file to 2s & it actually worked, lol. Stand by.

@zeripath
Copy link
Contributor

Thanks for your work Paul sorry there's been so many little changes.

@pbrackin
Copy link
Contributor Author

Nah its nothing. Thank y'all. I do like giving back a little bit on projects that have helped me & yours is a good one. I was SO glad to get rid of GitLab, haha. Maybe c yall again u never know.... Keep up the good work!

@zeripath
Copy link
Contributor

Ok Paul, you've already got my lgtm. Need one more maintainer to look at this and say it's a good idea.

Btw Is it really the case that the xorm dB just returns a error when there's a failure to connect? That seems weird!

@bkcsoft bkcsoft added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Jan 19, 2019
@techknowlogick techknowlogick merged commit 1bb22b2 into go-gitea:master Jan 19, 2019
@pbrackin
Copy link
Contributor Author

pbrackin commented Jan 19, 2019

NOOOooooo!!! Heh. Well I appreciate the merge, but guess what? We need one more change. After your last comment I went back to look at the output before this PR & saw that it was not <nil> for the one fatal attempt. So I figured something was wrong. It turns out that we have a scoping issue. In Go, conditional blocks have their own scope. So, that is why the err wasn't showing up in the intermediate failure prints. The err var was out of scope. And it would also not show up for the fatal print case now. I just was testing a patch for this as you were merging, lol. Since this PR was merged, do you want me to make another one?

@pbrackin
Copy link
Contributor Author

BTW, with the patch, the output looks like:

2019/01/19 15:29:48 [I] Beginning ORM engine initialization.
2019/01/19 15:29:48 [I] ORM engine initialization attempt #1/10...
2019/01/19 15:29:48 [D] ORM engine initialization attempt #1/10 failed. Error: read tcp 127.0.0.1:57118->127.0.0.1:5433: read: connection reset by peer
2019/01/19 15:29:48 [I] Backing off for 2 seconds
2019/01/19 15:29:50 [I] ORM engine initialization attempt #2/10...
2019/01/19 15:29:50 [D] ORM engine initialization attempt #2/10 failed. Error: read tcp 127.0.0.1:57125->127.0.0.1:5433: read: connection reset by peer
2019/01/19 15:29:50 [I] Backing off for 2 seconds
2019/01/19 15:29:52 [I] ORM engine initialization attempt #3/10...
2019/01/19 15:29:52 [D] ORM engine initialization attempt #3/10 failed. Error: read tcp 127.0.0.1:57129->127.0.0.1:5433: read: connection reset by peer
2019/01/19 15:29:52 [I] Backing off for 2 seconds
2019/01/19 15:29:54 [I] ORM engine initialization attempt #4/10...
2019/01/19 15:29:54 [D] ORM engine initialization attempt #4/10 failed. Error: read tcp 127.0.0.1:57134->127.0.0.1:5433: read: connection reset by peer
2019/01/19 15:29:54 [I] Backing off for 2 seconds
2019/01/19 15:29:56 [I] ORM engine initialization attempt #5/10...
2019/01/19 15:29:58 [I] ORM engine initialization successful!
2019/01/19 15:29:58 [I] ORM engine had intermediate failure(s): read tcp 127.0.0.1:57134->127.0.0.1:5433: read: connection reset by peer

@zeripath
Copy link
Contributor

Just push the changes to this branch we should be able to remerge off it.

Neither of us reviewers caught it.

log.Info("Beginning ORM engine initialization.")
for i := 0; i < setting.DBConnectRetries; i++ {
log.Info("ORM engine initialization attempt #%d/%d...", i+1, setting.DBConnectRetries)
if err := models.NewEngine(migrations.Migrate); err == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here should have been:

if err = ...

The reason the compiler didn't alarm is because there was a named err always available. I knew I never liked that feature of the language!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ur right. I did have it that way the first go around, then I changed it due to a compile error (which was for another unrelated problem). I currently have em on separate lines tho. u want me to condense it back down?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Push your current changes up, and let's see.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed it already. It didn't show up here I guess becz its merged?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I hit the "compare & new PR" button. u reckon?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well i went ahead and condensed it back down to a one-liner, sans the re-declaration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK the new PR is is #5780

@go-gitea go-gitea locked and limited conversation to collaborators Nov 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. type/enhancement An improvement of existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants