`NewServerConfiguration()` called twice after config reload with `-watch` causing e.g. double health probes/jobs #692

johakoch · 2023-01-31T16:08:03Z

With -watch, after a config reload,

the probe is shut down later than the server, and
I get two health probes.

Example:

server {
  hosts = ["*:8080"]
  endpoint "/" {
    proxy {
      backend = "be"
    }
  }
}

definitions {
  backend "be" {
    origin = "https://blackhole.webpagetest.org"
    beta_health {
      interval = "10s"
      timeout = "3s"  
    }
  }
}

Start Couper

INFO[0000] watching configuration file(s)                build=dev type=couper_daemon version=0 watch="map[max-retries:5 retry-delay:500ms]"
INFO[0000] ulimit: max open files: 1048576 (hard limit: 1048576)  build=dev type=couper_daemon version=0
INFO[0000] couper is serving: 0.0.0.0:8080               build=dev type=couper_daemon version=0

Normal health check:

ERRO[0000] backend error: context deadline exceeded      backend=be build=dev custom="map[]" error_type=backend method=GET request="map[headers:map[user-agent:Couper / 0 health-check] host:blackhole.webpagetest.org method:GET name:health-check origin:blackhole.webpagetest.org path: proto:https]" status=0 timings="map[dns:20.997 total:3000.255]" type=couper_backend uid=cfcjk5beg9scfeq3hfig url="https://blackhole.webpagetest.org" version=0
WARN[0003] new health state: failing                     backend=be build=dev type=couper_daemon uid=cfcjk5beg9scfeq3hfig url="https://blackhole.webpagetest.org" version=0

Reload config:

INFO[0008] reloading couper configuration                build=dev type=couper_daemon version=0
INFO[0008] ulimit: max open files: 1048576 (hard limit: 1048576)  build=dev type=couper_daemon version=0
ERRO[0008] retry 1/5 due to listen error: listen tcp4 :8080: bind: address already in use  build=dev type=couper_daemon version=0
WARN[0008] shutting down                                 build=dev deadline=0s delay=0s type=couper_daemon version=0
INFO[0008] http: Server closed: 0.0.0.0:8080             build=dev type=couper_daemon version=0
INFO[0008] ulimit: max open files: 1048576 (hard limit: 1048576)  build=dev type=couper_daemon version=0
INFO[0008] couper is serving: 0.0.0.0:8080               build=dev type=couper_daemon version=0

Double health check log:

ERRO[0008] backend error: context deadline exceeded      backend=be build=dev custom="map[]" error_type=backend method=GET request="map[headers:map[user-agent:Couper / 0 health-check] host:blackhole.webpagetest.org method:GET name:health-check origin:blackhole.webpagetest.org path: proto:https]" status=0 timings="map[dns:24.815 total:3000.822]" type=couper_backend uid=cfcjk7beg9scfeq3hfj0 url="https://blackhole.webpagetest.org" version=0
WARN[0011] new health state: failing                     backend=be build=dev type=couper_daemon uid=cfcjk7beg9scfeq3hfj0 url="https://blackhole.webpagetest.org" version=0
ERRO[0008] backend error: context deadline exceeded      backend=be build=dev custom="map[]" error_type=backend method=GET request="map[headers:map[user-agent:Couper / 0 health-check] host:blackhole.webpagetest.org method:GET name:health-check origin:blackhole.webpagetest.org path: proto:https]" status=0 timings="map[dns:20.787 total:3000.621]" type=couper_backend uid=cfcjk7beg9scfeq3hfjg url="https://blackhole.webpagetest.org" version=0
WARN[0011] new health state: failing                     backend=be build=dev type=couper_daemon uid=cfcjk7beg9scfeq3hfjg url="https://blackhole.webpagetest.org" version=0

Health probe shutdown log:

WARN[0013] shutdown health probe                         backend=be build=dev type=couper_daemon url="https://blackhole.webpagetest.org" version=0

Still double health check logs:

ERRO[0021] backend error: context deadline exceeded      backend=be build=dev custom="map[]" error_type=backend method=GET request="map[headers:map[user-agent:Couper / 0 health-check] host:blackhole.webpagetest.org method:GET name:health-check origin:blackhole.webpagetest.org path: proto:https]" status=0 timings="map[dns:20.353 total:3000.917]" type=couper_backend uid=cfcjkajeg9scfeq3hfk0 url="https://blackhole.webpagetest.org" version=0
ERRO[0024] backend error: backend error: connecting to be 'blackhole.webpagetest.org' failed: i/o timeout: new health state: unhealthy  backend=be build=dev error_type=backend_unhealthy type=couper_daemon uid=cfcjkajeg9scfeq3hfk0 url="https://blackhole.webpagetest.org" version=0
ERRO[0021] backend error: context deadline exceeded      backend=be build=dev custom="map[]" error_type=backend method=GET request="map[headers:map[user-agent:Couper / 0 health-check] host:blackhole.webpagetest.org method:GET name:health-check origin:blackhole.webpagetest.org path: proto:https]" status=0 timings="map[dns:23.634 total:3000.681]" type=couper_backend uid=cfcjkajeg9scfeq3hfkg url="https://blackhole.webpagetest.org" version=0
ERRO[0024] backend error: backend error: connecting to be 'blackhole.webpagetest.org' failed: i/o timeout: new health state: unhealthy  backend=be build=dev error_type=backend_unhealthy type=couper_daemon uid=cfcjkajeg9scfeq3hfkg url="https://blackhole.webpagetest.org" version=0

ERRO[0034] backend error: context deadline exceeded      backend=be build=dev custom="map[]" error_type=backend method=GET request="map[headers:map[user-agent:Couper / 0 health-check] host:blackhole.webpagetest.org method:GET name:health-check origin:blackhole.webpagetest.org path: proto:https]" status=0 timings="map[dns:27.049 total:3000.152]" type=couper_backend uid=cfcjkdreg9scfeq3hfl0 url="https://blackhole.webpagetest.org" version=0
ERRO[0034] backend error: context deadline exceeded      backend=be build=dev custom="map[]" error_type=backend method=GET request="map[headers:map[user-agent:Couper / 0 health-check] host:blackhole.webpagetest.org method:GET name:health-check origin:blackhole.webpagetest.org path: proto:https]" status=0 timings="map[dns:21.148 total:3000.795]" type=couper_backend uid=cfcjkdreg9scfeq3hflg url="https://blackhole.webpagetest.org" version=0

ERRO[0047] backend error: context deadline exceeded      backend=be build=dev custom="map[]" error_type=backend method=GET request="map[headers:map[user-agent:Couper / 0 health-check] host:blackhole.webpagetest.org method:GET name:health-check origin:blackhole.webpagetest.org path: proto:https]" status=0 timings="map[dns:20.394 total:3000.424]" type=couper_backend uid=cfcjkh3eg9scfeq3hfm0 url="https://blackhole.webpagetest.org" version=0
ERRO[0047] backend error: context deadline exceeded      backend=be build=dev custom="map[]" error_type=backend method=GET request="map[headers:map[user-agent:Couper / 0 health-check] host:blackhole.webpagetest.org method:GET name:health-check origin:blackhole.webpagetest.org path: proto:https]" status=0 timings="map[dns:21.6 total:3000.629]" type=couper_backend uid=cfcjkh3eg9scfeq3hfmg url="https://blackhole.webpagetest.org" version=0

The text was updated successfully, but these errors were encountered:

johakoch · 2023-02-06T15:24:38Z

go p.probe(opts.Context) is called in transport.NewProbe()
transport.NewProbe() is called in transport.NewBackend()
transport.NewBackend() is called in runtime.newBackend()
runtime.newBackend() is called in runtime.NewBackend()
runtime.NewBackend() is called for conf.Definitions.Backend in runtime.NewServerConfiguration()
runtime.NewServerConfiguration() is called in both Run.Execute() and for a reload in realmain()
Run.Execute() is also called for a reload in realmain()

With

func realmain(ctx context.Context, arguments []string) int {
...
		case _, more := <-reloadCh:
...
			_, reloadErr = runtime.NewServerConfiguration(cf, logger.WithFields(fields), tmpMemStore)
...
			execCmd, restartSignal = newRestartableCommand(ctx, cmd) // replace previous pair
			go func() {
				// logger settings update gets ignored at this point
				// have to be locked for an update, skip this feature for now
				errCh <- execCmd.Execute(args, confFile, logger)
			}()

and

func (r *Run) Execute(args Args, config *config.Couper, logEntry *logrus.Entry) error {
...
	srvConf, err := runtime.NewServerConfiguration(config, logEntry, memStore)

johakoch · 2023-02-06T15:39:18Z

Jobs are started twice after reload, too.

malud · 2023-02-16T12:29:13Z

We have a context key called request.ConfigDryRun which should prevent go-routines gets fired on object creation.

We should add this condition to all related objects like jobs or maybe jwks_url configuration_url (oidc) too.
Same pattern as in probe.go:100

johakoch · 2023-02-24T15:17:16Z

The ConfigDryRun seems to be working already. The NewServerConfiguration() called directly from realmain() has ConfigDryRun set in its context preventing the probe from running.

But there are 2 calls to execCmd.Execute() (calling NewServerConfiguration() without setting ConfigDryRun) following a reload.
The first is the already mentioned:

	reloadCh := watchConfigFiles(confFile.Files, logger, flags.FileWatchRetries, flags.FileWatchRetryDelay)
	for {
		select {
...
		case _, more := <-reloadCh:
...
			restartSignal <- struct{}{}                              // shutdown running couper
			<-errCh                                                  // drain current error due to cancel and ensure closed ports
			execCmd, restartSignal = newRestartableCommand(ctx, cmd) // replace previous pair
			go func() {
				// logger settings update gets ignored at this point
				// have to be locked for an update, skip this feature for now
				errCh <- execCmd.Execute(args, confFile, logger)
				         ^^^^^^^^^^^^^^^
			}()

The other is caused in response to an error.
(see the error in the log)

ERRO[0008] retry 1/5 due to listen error: listen tcp4 :8080: bind: address already in use  build=dev type=couper_daemon version=0

		case err = <-errCh:
			if err != nil {
				if netErr, ok := err.(*net.OpError); ok {
					if netErr.Op == "listen" && errRetries < flags.FileWatchRetries {
						errRetries++
						logger.Errorf("retry %d/%d due to listen error: %v", errRetries, flags.FileWatchRetries, netErr)

						// configuration load succeeded at this point, just restart the command
						execCmd, restartSignal = newRestartableCommand(ctx, cmd) // replace previous pair
						time.Sleep(flags.FileWatchRetryDelay)

						go func() {
							errCh <- execCmd.Execute(args, confFile, logger)
							         ^^^^^^^^^^^^^^^
						}()
						continue
...

johakoch · 2023-02-28T07:57:41Z

I guess, the first of the two mentioned calls to execCmd.Execute() comes too early. If I add

				time.Sleep(flags.FileWatchRetryDelay)

(or similar) ahead of the call, there's no retry error and so only one probe running after a reload.

johakoch · 2023-02-28T08:01:48Z

However, even if the retry is triggered differently, there should only be one probe.

johakoch added the bug Something isn't working label Jan 31, 2023

johakoch changed the title ~~Double health probes after config reload with -watch~~ NewServerConfiguration() called twice after config reload with -watch causing e.g. double health probes Feb 6, 2023

johakoch changed the title ~~NewServerConfiguration() called twice after config reload with -watch causing e.g. double health probes~~ NewServerConfiguration() called twice after config reload with -watch causing e.g. double health probes/jobs Feb 6, 2023

malud assigned alex-schneider Feb 16, 2023

malud added this to the 1.12.1 milestone Feb 16, 2023

johakoch linked a pull request Feb 24, 2023 that will close this issue

Do not start go routines on config check #728

Closed

alex-schneider removed a link to a pull request Feb 24, 2023

Do not start go routines on config check #728

Closed

alex-schneider linked a pull request Feb 24, 2023 that will close this issue

watch with double go routines #730

Merged

This was referenced Mar 3, 2023

Fix watch resource leak #736

Merged

watch with double go routines #730

Merged

johakoch closed this as completed Mar 3, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`NewServerConfiguration()` called twice after config reload with `-watch` causing e.g. double health probes/jobs #692

`NewServerConfiguration()` called twice after config reload with `-watch` causing e.g. double health probes/jobs #692

johakoch commented Jan 31, 2023

johakoch commented Feb 6, 2023 •

edited

Loading

johakoch commented Feb 6, 2023

malud commented Feb 16, 2023

johakoch commented Feb 24, 2023 •

edited

Loading

johakoch commented Feb 28, 2023

johakoch commented Feb 28, 2023

NewServerConfiguration() called twice after config reload with -watch causing e.g. double health probes/jobs #692

NewServerConfiguration() called twice after config reload with -watch causing e.g. double health probes/jobs #692

Comments

johakoch commented Jan 31, 2023

johakoch commented Feb 6, 2023 • edited Loading

johakoch commented Feb 6, 2023

malud commented Feb 16, 2023

johakoch commented Feb 24, 2023 • edited Loading

johakoch commented Feb 28, 2023

johakoch commented Feb 28, 2023

`NewServerConfiguration()` called twice after config reload with `-watch` causing e.g. double health probes/jobs #692

`NewServerConfiguration()` called twice after config reload with `-watch` causing e.g. double health probes/jobs #692

johakoch commented Feb 6, 2023 •

edited

Loading

johakoch commented Feb 24, 2023 •

edited

Loading