request help: visit route faster than 1 time/s would return "503 Service Unavailable" #3447

Closed
Yiyiyimu opened this issue Jan 28, 2021 · 8 comments

Comments

@Yiyiyimu
Member

Issue description

In #3404, to test whether chaos affects the QPS of APISIX, I visit a route at a fixed frequency, calculate the ingress bandwidth per second from the Prometheus metrics, and then compare the values before and after the chaos takes effect.

The current frequency is 1 request/s, which I think is pretty slow. The reason is that when the frequency is higher than 1 request/s, some requests return "503 Service Unavailable" and the test fails. The only plugin enabled is prometheus, and no request-limiting plugins are enabled, so I'm not sure what causes this.
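To make the comparison described above concrete, here is a minimal sketch of turning two readings of a byte counter into a per-second rate and comparing the rate before and after a disturbance. The `sample` type and both function names are hypothetical and are not taken from the chaos test branch; the counter values are made up for illustration.

```go
package main

import "fmt"

// sample is one reading of a monotonically increasing byte counter
// (hypothetical type, for illustration only).
type sample struct {
	unixSeconds float64 // time at which the counter was scraped
	bytes       float64 // counter value at that time
}

// bytesPerSecond returns the average rate between two samples.
// It returns 0 when the interval is not positive or the counter went
// backwards (for example after a restart).
func bytesPerSecond(prev, cur sample) float64 {
	dt := cur.unixSeconds - prev.unixSeconds
	if dt <= 0 || cur.bytes < prev.bytes {
		return 0
	}
	return (cur.bytes - prev.bytes) / dt
}

// relativeChange reports how far `after` deviates from `before`, as a fraction.
func relativeChange(before, after float64) float64 {
	if before == 0 {
		return 0
	}
	return (after - before) / before
}

func main() {
	// Made-up counter readings taken 10 s apart.
	before := bytesPerSecond(sample{0, 1000}, sample{10, 6000})
	after := bytesPerSecond(sample{20, 11000}, sample{30, 15000})
	fmt.Printf("before=%.0f B/s after=%.0f B/s change=%.0f%%\n",
		before, after, relativeChange(before, after)*100)
}
```

A steady request frequency matters here because the rate is only comparable across the two windows if the offered load is the same in both.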

Environment

  • apisix version (cmd: apisix version): v2.2
  • OS: (cmd: uname -a) Ubuntu 18.04
  • OpenResty / Nginx version: the one in v2.2-alpine docker
  • etcd version, if have: v3.4.13
@spacewander
Member

Is there something interesting in the error.log? You can decrease the log level to info.

@Yiyiyimu
Member Author

Yiyiyimu commented Jan 28, 2021

Thanks for the help @spacewander !

I visited a route at an interval of 0.1s for 5s, and the whole test (including setting the route first and visiting it once to make sure it is set) ran for ~9s. Out of 50 requests, I got 38 responses: 33 of them were 503 and the remaining 5 were 200.

The info-level log is a bit messy and I failed to find any useful information in it. I uploaded the logs to a gist for reference. positive.log is the log while the test was running. I also uploaded the log from after the test as negative.log; it might help to filter out unrelated entries.
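A count like the one above (how many 200s, how many 503s, how many requests with no response at all) can be gathered with a small probe. The following is only a sketch: `probe` and `tally` are hypothetical helpers, the local `httptest` server stands in for the APISIX route, and the requests are sent one after another, so the real spacing is the interval plus the response time.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// tally groups status codes by value; code 0 stands for "no response".
func tally(codes []int) map[int]int {
	counts := make(map[int]int)
	for _, c := range codes {
		counts[c]++
	}
	return counts
}

// probe sends n GET requests to url, sleeping interval after each one, and
// returns the status code of each request (0 when no response arrived).
func probe(client *http.Client, url string, n int, interval time.Duration) []int {
	codes := make([]int, 0, n)
	for i := 0; i < n; i++ {
		resp, err := client.Get(url)
		if err != nil {
			codes = append(codes, 0)
		} else {
			codes = append(codes, resp.StatusCode)
			resp.Body.Close()
		}
		time.Sleep(interval)
	}
	return codes
}

func main() {
	// Stand-in for the APISIX route: a local server that always answers 200.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 2 * time.Second}
	fmt.Println(tally(probe(client, srv.URL, 5, 100*time.Millisecond)))
}
```

Recording failed requests under code 0 keeps the total equal to the number of requests sent, which makes gaps such as "50 sent, 38 answered" visible directly in the tally.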

@spacewander
Member

There is no valuable info in the log...
Did you start with a clean etcd?

@Yiyiyimu
Member Author

Yes, etcd was freshly started. Do I need to provide etcd logs?

@Yiyiyimu
Member Author

If you have time, maybe you could reproduce the error with the chaos test branch.

Just start any APISIX instance, comment out the original test (TestGetSuccessWhenEtcdKilled) and run the following one:

func Test(t *testing.T) {
	e := httpexpect.New(t, host)

	// check if everything works
	setRoute(e, http.StatusCreated)
	getRoute(e, http.StatusOK)

	// run in background
	go func() {
		for i := 1; ; i++ {
			go func() {
				getRoute(e, http.StatusOK)
				fmt.Println("###############################################")
			}()
			time.Sleep(100 * time.Millisecond) // so the time step would be 0.1s
		}
	}()
	time.Sleep(5 * time.Second)
}
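Note that the loop above returns after 5s without waiting for requests that are still in flight. That may be one reason fewer responses than requests were counted, but this is a guess and nothing in the thread confirms it. As a sketch of an alternative, the hypothetical helper `runAtInterval` below starts a call on every tick and waits for all started calls before returning. The function passed in is a stand-in for `getRoute(e, http.StatusOK)`.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runAtInterval starts fn in a new goroutine on every tick of step until
// total has elapsed, then waits for every started call to finish.
// It returns the number of calls that were started.
func runAtInterval(total, step time.Duration, fn func()) int {
	var wg sync.WaitGroup
	ticker := time.NewTicker(step)
	defer ticker.Stop()
	deadline := time.After(total)

	started := 0
	for {
		select {
		case <-deadline:
			wg.Wait() // do not return while requests are still in flight
			return started
		case <-ticker.C:
			started++
			wg.Add(1)
			go func() {
				defer wg.Done()
				fn()
			}()
		}
	}
}

func main() {
	finished := make(chan struct{}, 1000) // buffered so calls never block
	started := runAtInterval(500*time.Millisecond, 100*time.Millisecond, func() {
		time.Sleep(20 * time.Millisecond) // stand-in for getRoute(e, http.StatusOK)
		finished <- struct{}{}
	})
	fmt.Printf("started=%d finished=%d\n", started, len(finished))
}
```

Because the helper waits on the `sync.WaitGroup` before returning, the started and finished counts always match, and no goroutine touches the test state after the test function has returned.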

@membphis
Member

membphis commented Feb 1, 2021

any news? @spacewander

@Yiyiyimu
Member Author

Yiyiyimu commented Feb 1, 2021

@membphis spacewander gave me some suggestions on Slack, but I haven't tested them yet. It's stuck on my side 😬

@Yiyiyimu Yiyiyimu mentioned this issue Feb 2, 2021
4 tasks
@Yiyiyimu
Member Author

Yiyiyimu commented Mar 3, 2021

Thanks to @nic-chen, the cause was found: I used a dummy but existing host, foo.com, as the upstream. Using a local server as the upstream instead fixed the problem.
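For anyone hitting the same thing, a local upstream can be started from the test itself. The sketch below starts a throwaway HTTP server and builds a route body that points at it. The `route` and `upstream` structs and the `routeBody` helper are hypothetical; the JSON field names (`uri`, `upstream`, `type`, `nodes`) follow the shape of the APISIX Admin API route object as I understand it, so check them against the docs for your version. Sending the body to the Admin API is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// upstream and route mirror only the fields needed for this sketch.
type upstream struct {
	Type  string         `json:"type"`
	Nodes map[string]int `json:"nodes"` // "host:port" -> weight
}

type route struct {
	URI      string   `json:"uri"`
	Upstream upstream `json:"upstream"`
}

// routeBody builds the JSON body for a route whose only upstream node is
// upstreamAddr ("host:port").
func routeBody(uri, upstreamAddr string) (string, error) {
	b, err := json.Marshal(route{
		URI: uri,
		Upstream: upstream{
			Type:  "roundrobin",
			Nodes: map[string]int{upstreamAddr: 1},
		},
	})
	return string(b), err
}

func main() {
	// Local upstream under the test's control, instead of an external host.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello from local upstream")
	}))
	defer srv.Close()

	body, err := routeBody("/get", srv.Listener.Addr().String())
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

If APISIX runs in a container, as in the environment described in this issue, the upstream address has to be reachable from inside that container, so a loopback address on the host may not work as-is.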

@Yiyiyimu Yiyiyimu closed this as completed Mar 3, 2021