Query core-command device taking long time on first execution #2526
Comments
I think this is a known complex issue with how we are bootstrapping the system. We touched on this issue in PR #2450. The fix for this is something we would possibly want to tackle in the future since a true fix would require some rework in a couple of modules. @brandonforster can you verify this is the same underlying issue we ran into in the past? |
Yes, that's indeed the problem. Accordingly, @jinlinGuan, the correct fix for blackbox tests is to adjust the test timeout relative to what we expect initialization time to be. In this case, EdgeX has its client timeout configured to be 45000 ms, or 45 seconds. 800 ms is entirely too short and I expect this test will keep failing. |
@jinlinGuan, is this solution acceptable? If so, would you like it introduced in the blackbox tests, or is this something you or the QA team will handle? |
Adjusting the timeout is meaningless for performance testing, so let the test case keep failing until the fix is ready. |
This is the fix. We expect the first request to a service to take significantly longer than all the rest, and this is what we are going to ship in Geneva. |
The true fix for this is to either:
As a short-term fix, we can simply perform an eager initialization within the applications. This solution was explored originally and provided a quick fix to the problem. However, @michaelestrin strongly advises against it, as it does not address the root issue. See the comments in PR #2450. |
Some thoughts. The DI implementation already provides for both eager and lazy initialization. In this particular use case, eager isn't possible unless you know the array of clients you'll be communicating with. #2450 Test failure appears to be on ARM. ARM is s--l---o----w. Expecting to interject a Consul lookup in the first call on ARM will likely ALWAYS exceed the 800ms threshold. Performance testing ARM? Pffft. (I guess you probably do want to quantify how slowly it runs.) Symptom fix -- which I don't recommend -- would be to prime the pump in tests by forcing communication with each client (which would incur the Consul round-trip overhead) prior to executing the actual tests. Root cause fix would be to do the Consul round-trip during service bootstrap. "Good night and good luck." |
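To make the lazy versus eager distinction concrete, here is a minimal sketch of a client that resolves its URL at most once. It is not the EdgeX DI container or client code: `urlClient`, `newLazyClient`, `newEagerClient`, and `countingLookup` are hypothetical names, the lookup function only stands in for a Consul round trip, and the service key and URL are illustrative. The lazy client pays for the lookup on its first request; the eager client pays for it at construction time, which is roughly what doing the round trip during service bootstrap would amount to.

```go
package main

import (
	"fmt"
	"sync"
)

// lookupFunc stands in for a registry (for example Consul) round trip.
// It is a hypothetical signature, not the EdgeX registry client API.
type lookupFunc func(serviceKey string) (string, error)

// urlClient resolves its base URL at most once, however often it is used.
type urlClient struct {
	serviceKey string
	lookup     lookupFunc
	once       sync.Once
	url        string
	err        error
}

// newLazyClient defers the lookup until the first call to resolve.
func newLazyClient(key string, lookup lookupFunc) *urlClient {
	return &urlClient{serviceKey: key, lookup: lookup}
}

// newEagerClient performs the lookup immediately, i.e. at bootstrap.
func newEagerClient(key string, lookup lookupFunc) *urlClient {
	c := newLazyClient(key, lookup)
	c.resolve()
	return c
}

// resolve returns the cached URL, running the lookup only the first time.
func (c *urlClient) resolve() (string, error) {
	c.once.Do(func() { c.url, c.err = c.lookup(c.serviceKey) })
	return c.url, c.err
}

// countingLookup returns a fake lookup that counts how often it runs.
func countingLookup(calls *int) lookupFunc {
	return func(key string) (string, error) {
		*calls++
		return "http://" + key + ":48082", nil
	}
}

func main() {
	lazyCalls, eagerCalls := 0, 0
	lazy := newLazyClient("edgex-core-command", countingLookup(&lazyCalls))
	eager := newEagerClient("edgex-core-command", countingLookup(&eagerCalls))
	fmt.Println("before first request:", lazyCalls, eagerCalls)

	// Two requests each: only the lazy client's first request triggers a lookup.
	lazy.resolve()
	lazy.resolve()
	eager.resolve()
	eager.resolve()
	fmt.Println("after two requests:", lazyCalls, eagerCalls)
}
```

As noted above, the eager variant only works when the set of clients is known up front, which is the limitation of the DI approach in this use case.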
Thanks for all the investigation and explanation. I suggest doing the following to resolve this issue:
|
@cloudxxx8 Yes, with one caveat: this is not just command, this is all services. The first call to any service should be expected to take a significantly longer amount of time than subsequent calls due to the Consul round trip. |
I see. In that case, can we add this statement to the general service page? |
@cloudxxx8, I dug into the client handling and now see the issue with lazy initialization of the URL. In Fuji, if I recall correctly, it used the URL created here initially: The issue is that we are throwing away that initial URL (which we have configured correctly). @tsconn23, @jpwhitemn, we should discuss this more in this week's Core WG. |
Ok, here is what Fuji is doing. In both Fuji and Geneva the initial URL is only used when not monitoring, i.e. not using the Registry. |
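The behavior described here can be sketched as a single decision: the configured initial URL is used only when the Registry is not in play, otherwise the URL comes from a registry lookup. This is an illustration of that statement only, not the Fuji or Geneva client code and not the change made in #2595; `resolveURL` and its parameters are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveURL mirrors the behavior described above: the configured
// initial URL is returned only when the Registry is not used.
// All names here are hypothetical and chosen for illustration.
func resolveURL(initialURL string, useRegistry bool, registryLookup func() (string, error)) (string, error) {
	if !useRegistry {
		// No monitoring: the initial URL from configuration is final.
		return initialURL, nil
	}
	if registryLookup == nil {
		return "", errors.New("registry enabled but no lookup provided")
	}
	// Monitoring: the initial URL is ignored and the registry answer wins,
	// which is where the extra first-call round trip comes from.
	return registryLookup()
}

func main() {
	configured := "http://localhost:48082"
	fromRegistry := func() (string, error) { return "http://edgex-core-command:48082", nil }

	url, _ := resolveURL(configured, false, nil)
	fmt.Println("without registry:", url)

	url, _ = resolveURL(configured, true, fromRegistry)
	fmt.Println("with registry:", url)
}
```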
fixed by #2595 |
🐞 Bug Report
Affected Services
The issue is located in: core-command
Description and Minimal Reproduction
curl http://localhost:48082/api/v1/device
was stuck (#2399), and that issue is fixed. The query now succeeds, but the first run takes a long time (10+ s), which always causes a blackbox-testing failure because the response time is expected to be less than 800 ms.
🔥 Exception or Error
🌍 Your Environment
Deployment Environment:
ARM64, AMD64
EdgeX Version:
Master
Anything else relevant?