[Discover] Unskip functional tests for field visualize buttons #62614
@elasticmachine merge upstream
…-visualize-functional
Thanks for unskipping these!
await PageObjects.discover.expectMissingFieldListItemVisualize('bytes');
await retry.try(async () => {
await setDiscoverTimeRange();
const hasNoResults = await PageObjects.discover.hasNoResults();
I would generally avoid checking that something does not exist, since that waits out a timeout of 10 seconds or so, whereas checking for something that should exist, like the hit count, returns as soon as it is found. Don't change anything yet. I'm going to run these tests locally and see if I have a suggestion for a change.
The current code retries a few times to make sure it's not on the "no results" page (with a 2500 ms timeout rather than the default one), which takes about 3 seconds:
[11:39:34.333150000] │ debg TestSubjects.exists(discoverNoResults)
[11:39:34.393075600] │ debg Find.existsByDisplayedByCssSelector('[data-test-subj="discoverNoResults"]') with timeout=2500
[11:39:34.704008100] │ debg --- retry.tryForTime error: [data-test-subj="discoverNoResults"] is not displayed
[11:39:35.236323000] │ debg --- retry.tryForTime failed again with the same message...
[11:39:35.761950200] │ debg --- retry.tryForTime failed again with the same message...
[11:39:36.293043400] │ debg --- retry.tryForTime failed again with the same message...
[11:39:36.829574800] │ debg --- retry.tryForTime failed again with the same message...
[11:39:37.354057600] │ debg TestSubjects.click(field-bytes)
vs getting the hitCount and verifying it's > 0 takes about .2 seconds
[11:54:10.954650700] │ debg TestSubjects.getVisibleText(discoverQueryHits)
[11:54:11.002496700] │ debg TestSubjects.find(discoverQueryHits)
[11:54:11.052031400] │ debg Find.findByCssSelector('[data-test-subj="discoverQueryHits"]') with timeout=10000
[11:54:11.113423700] │ debg TestSubjects.click(field-bytes)
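The difference in the two logs above can be illustrated with a minimal sketch. It assumes a polling existence check similar in spirit to the one in the log; `pollUntilSettled` and its parameters are hypothetical names, and time is simulated so nothing actually waits.

```typescript
// Hypothetical stand-in for a polling existence check. It is NOT the real
// test-service implementation; elapsed time is simulated, not measured.
function pollUntilSettled(
  isPresent: () => boolean,
  timeoutMs: number,
  intervalMs: number
): { found: boolean; elapsedMs: number } {
  let elapsedMs = 0;
  while (elapsedMs < timeoutMs) {
    if (isPresent()) {
      // A positive check can return on the first poll.
      return { found: true, elapsedMs };
    }
    elapsedMs += intervalMs;
  }
  // A negative check can only conclude "absent" once the timeout is used up.
  return { found: false, elapsedMs };
}

// Asserting that the "no results" element is absent: pays the full timeout.
const absentCheck = pollUntilSettled(() => false, 2500, 500);

// Asserting that the hit count element is present: returns immediately,
// even though its timeout budget is larger.
const presentCheck = pollUntilSettled(() => true, 10000, 500);

console.log(absentCheck, presentCheck);
```

The point of the sketch is only that proving absence costs the whole timeout, while proving presence costs as long as the element takes to appear.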
@@ -187,8 +187,9 @@ export default function({ getPageObjects, getService }: FtrProviderContext) {
await PageObjects.common.navigateToApp('discover');
await retry.try(async () => {
await setDiscoverTimeRange();
- const hasNoResults = await PageObjects.discover.hasNoResults();
- expect(hasNoResults).to.be(false);
+ const hitCount = await PageObjects.discover.getHitCount();
+ // eslint-disable-next-line radix
+ expect(parseInt(hitCount)).to.be.greaterThan(0);
await PageObjects.discover.clickFieldListItem('bytes');
await PageObjects.discover.expectMissingFieldListItemVisualize('bytes');
@@ -281,8 +282,10 @@ export default function({ getPageObjects, getService }: FtrProviderContext) {
await PageObjects.common.navigateToApp('discover');
await retry.try(async () => {
await setDiscoverTimeRange();
- const hasNoResults = await PageObjects.discover.hasNoResults();
- expect(hasNoResults).to.be(false);
+ const hitCount = await PageObjects.discover.getHitCount();
+ // eslint-disable-next-line radix
+ expect(parseInt(hitCount)).to.be.greaterThan(0);
+
await PageObjects.discover.clickFieldListItem('bytes');
await PageObjects.discover.expectMissingFieldListItemVisualize('bytes');
});
@@ -362,8 +365,9 @@ export default function({ getPageObjects, getService }: FtrProviderContext) {
await PageObjects.common.navigateToApp('discover');
await retry.try(async () => {
await setDiscoverTimeRange();
- const hasNoResults = await PageObjects.discover.hasNoResults();
- expect(hasNoResults).to.be(false);
+ const hitCount = await PageObjects.discover.getHitCount();
+ // eslint-disable-next-line radix
+ expect(parseInt(hitCount)).to.be.greaterThan(0);
Sorry but I just realized another potential issue with this change.
In other Discover tests we've used a retry only around getting the hit count and comparing it to the expected value. We didn't include setting the time range in the retry because each time you set the timepicker it's going to reload the page, and it's the page loading we're waiting for with the retry.
From the failing test issue you said
"the screenshot of the failed test is telling me, no data available, expand your time range. that's odd"
Did the screenshot show the expected start and end dates?
Thx @LeeDr, back today, I'll provide feedback soon
I've run a similar test suite in OSS to debug the issue; it wasn't flaky there:
https://kibana-ci.elastic.co/job/kibana+flaky-test-suite-runner/339/
Dear @LeeDr, I wonder how to proceed here. Maybe switch to `const hitCount = await PageObjects.discover.getHitCount();`, since this fixes the test, and open another issue to investigate the flaky data fetching?
Thx!
I find it pretty concerning that the screenshot shows the correct dates in the timepicker and no results?!?! It could still just be a timing issue that the results just haven't come back in the response yet, but the pink loading bar isn't there either so that doesn't feel right.
I'm looking at the flaky-test-suite-runner output now....
FYI, here's an example of a test where we only put the getHitCount() in the retry because it's waiting for the response from Elasticsearch and for the page to load that data; https://github.com/elastic/kibana/blob/master/test/functional/apps/discover/_discover.js#L74
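The pattern described here (set the time range once, then retry only the hit count check) might look roughly like the sketch below. `retryTry` and `FakeDiscover` are hypothetical stand-ins written for this illustration, not the real `retry` service or Discover page object, and the hit count text `'14,004'` is made-up sample data.

```typescript
// Hypothetical retry helper, loosely modelled on the idea of `retry.try`.
async function retryTry<T>(fn: () => Promise<T>, attempts = 5, delayMs = 10): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}

// Fake page: every time-range change reloads the page, and the hit count
// only renders a few polls after the most recent reload.
class FakeDiscover {
  reloads = 0;
  private pollsSinceReload = 0;

  async setTimeRange(): Promise<void> {
    this.reloads++;
    this.pollsSinceReload = 0;
  }

  async getHitCount(): Promise<string> {
    this.pollsSinceReload++;
    if (this.pollsSinceReload < 3) {
      throw new Error('hit count not rendered yet');
    }
    return '14,004'; // made-up sample value
  }
}

async function waitForHits(discover: FakeDiscover): Promise<number> {
  // Set the time range once, outside the retry, so the page is not reloaded
  // on every attempt.
  await discover.setTimeRange();
  return retryTry(async () => {
    const hitCount = await discover.getHitCount();
    // Assumption: the displayed text may contain thousands separators.
    const hits = parseInt(hitCount.replace(/,/g, ''), 10);
    if (!(hits > 0)) {
      throw new Error(`expected more than 0 hits, got "${hitCount}"`);
    }
    return hits;
  });
}
```

If `setTimeRange()` were called inside the retried function instead, each attempt in this fake would reset the page and the hit count would never get a chance to render.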
@LeeDr I've adapted the code, moving `setDiscoverTimeRange()` out of `retry.try`. Now the flaky test suite run is flaky (1 of 44):
https://kibana-ci.elastic.co/job/kibana+flaky-test-suite-runner/367/
Can I search the logs on the server? In Jenkins it's hard to search the logs; it says no test failed.
/cc @lukasolson (any other thoughts on this?)
I had a couple of thoughts on debugging this while running the test locally.
We could turn on Elasticsearch slowlogs on both the `logstash-*` and `.async-search` indices. I don't see that we've done that in any existing tests yet. It's a per-index setting. Seems like it would have to be done after `esArchiver.loadIfNeeded('logstash_functional');`. But the slowlog only shows the query, not the response. So this might not help in debugging the issue.
Another thing you could try, if we fail to find the hit count or if we do find the "no results" page, is to open the inspector and capture the request and response. It could show that either the query sent was wrong, or the query was right and Elasticsearch didn't return the correct response, or the correct response was returned and Discover didn't display it.
Or temporarily add debug logging to output the query and response to the Kibana log.
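For the slowlog idea above, a minimal sketch of the per-index settings request could look like this. The `index.search.slowlog.threshold.*` keys are standard Elasticsearch index settings; the helper names, the choice of the `debug` level, and the `0s` threshold (log everything) are assumptions for illustration. The sketch only describes the request and does not send it, and whether the `.async-search` index accepts a settings update in a given setup is not verified here.

```typescript
// Builds index-level search slowlog thresholds. A threshold of "0s" is
// assumed here so that every query and fetch phase gets logged.
function buildSlowlogSettings(threshold: string = '0s'): Record<string, string> {
  return {
    'index.search.slowlog.threshold.query.debug': threshold,
    'index.search.slowlog.threshold.fetch.debug': threshold,
  };
}

// Describes (but does not send) the update: PUT /<index>/_settings
function slowlogRequest(index: string): { method: string; path: string; body: Record<string, string> } {
  return {
    method: 'PUT',
    path: `/${encodeURIComponent(index)}/_settings`,
    body: buildSlowlogSettings(),
  };
}

const requests = ['logstash-*', '.async-search'].map(slowlogRequest);
console.log(JSON.stringify(requests, null, 2));
```

In a test this would have to run after the archive is loaded, as noted above, because the setting is applied per index and the indices must exist first.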
…-02-discover-unskip-field-visualize-functional
…nctional' of github.com:kertal/kibana into kertal-pr-2020-04-02-discover-unskip-field-visualize-functional
LGTM - I didn't pull the latest commits in this PR to run locally but the changes are in line with what we've done in other tests (after setting the timepicker, use a retry loop to wait for results in Discover). And Jenkins passed.
…-02-discover-unskip-field-visualize-functional
@@ -69,7 +69,7 @@ async function asyncSearch(
const path = encodeURI(request.id ? `/_async_search/${request.id}` : `/${index}/_async_search`);

// Wait up to 1s for the response to return
const query = toSnakeCase({ waitForCompletionTimeout: '1s', ...queryParams });
I think that the issue with the tests should be resolved by retrying in the tests, not increasing the initial waitForCompletionTimeout. Isn't that so?
My last commit was to test if an increase of the waitForCompletionTimeout solves the flakiness of the tests, it does:
https://kibana-ci.elastic.co/job/kibana+flaky-test-suite-runner/378/
So there are 2 approaches here to solve this: increase the timeout or retry the test.
@lukasolson @lizozom The question is why the user, or in this case the test, gets the message that no results match the criteria. In this case there are results, but fetching them took longer than the `waitForCompletionTimeout`. Shouldn't it continue searching in this case with GET async search? If the system is slower for some reason, which is what's happening here, it shouldn't report that there are no results.
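The behavior being asked for (keep polling instead of treating a still-running search as empty) can be sketched as follows. The `id` and `is_running` fields mirror the Elasticsearch async search response, but `AsyncSearchClient`, `searchUntilComplete`, the flattened `hits` field, and the fake client are hypothetical simplifications and do not reflect Kibana's actual implementation.

```typescript
// Simplified response shape. `id` and `is_running` follow the Elasticsearch
// async search API; `hits` is a flattened stand-in for the nested hit total.
interface AsyncSearchResponse {
  id?: string;
  is_running: boolean;
  hits: number;
}

// Hypothetical client: submit() models the initial request that returns once
// the wait-for-completion timeout elapses, get() models the follow-up GET.
interface AsyncSearchClient {
  submit(): Promise<AsyncSearchResponse>;
  get(id: string): Promise<AsyncSearchResponse>;
}

async function searchUntilComplete(
  client: AsyncSearchClient,
  maxPolls = 10,
  delayMs = 10
): Promise<AsyncSearchResponse> {
  let res = await client.submit();
  // A response that is still running is not a final "no results" answer,
  // so keep polling by id until it completes.
  for (let poll = 0; res.is_running && poll < maxPolls; poll++) {
    if (!res.id) {
      throw new Error('running search returned no id to poll');
    }
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    res = await client.get(res.id);
  }
  if (res.is_running) {
    // Surface the timeout instead of failing silently.
    throw new Error('async search did not complete in time');
  }
  return res;
}

// Fake client for the sketch: the search completes on the second GET.
class FakeClient implements AsyncSearchClient {
  getCalls = 0;
  async submit(): Promise<AsyncSearchResponse> {
    return { id: 'abc', is_running: true, hits: 0 };
  }
  async get(_id: string): Promise<AsyncSearchResponse> {
    this.getCalls++;
    const done = this.getCalls >= 2;
    return { id: 'abc', is_running: !done, hits: done ? 14004 : 0 };
  }
}
```

With this shape, a slow cluster produces either the real hit count or an explicit error, never an empty result that looks like "no results match your search criteria".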
1 second seems like too short of a timeout if I understand the impact correctly. Everyone loves a fast search result. But I don't think a typical user would care too much if a query took 2 or 3 seconds. I don't think they would want to be bothered with a dialog they have to click every time a query takes more than a second. I thought this mechanism was going to be around the default 30 second timeout mark or somewhere just short of that?
@LeeDr This popup wasn't displayed after a second; it was behaving correctly. However, when the popup disappeared, the "No results match your search criteria" screen was displayed, which is the same behavior I saw in the tests. In the sync search, when you run into a timeout, there's an error message. Async search timeouts seem to fail silently and are therefore much harder to debug.
@lukasolson @lizozom I could reproduce that behavior in a cluster with a large data set and an expensive query, I think we should increase
…-02-discover-unskip-field-visualize-functional
💚 Build Succeeded
Summary
This PR unskips the `discover_spaces` and `discover_security` functional tests. While the implementation of these tests was fine, they were flaky because the initial request for the given time range in Discover sometimes returned no data. Therefore no fields were displayed in the sidebar, and no `Visualize` button was available.
This was solved with #64155, which fixed an issue in async search.
Here's the flaky test suite runner to prove it's no longer flaky
https://kibana-ci.elastic.co/job/kibana+flaky-test-suite-runner/393/
fixes #60539
fixes #60535