-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
openshift-installer-presubmits: Use TEST_SKIP for e2e-aws #1341
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: wking The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Prow does not like my folded string :p. How much do we care about readability? Enough to patch ordered-prow-config? CC @petr-muller, @stevekuznetsov. |
Let me think about this and we can sync Tuesday. You could try this on all
first, then move to e2e-AWS and reset all.
Also, be aware we’re growing the suite by 140% in a few hours, so you may
want to regen after that lands.
On Aug 31, 2018, at 4:59 PM, W. Trevor King <notifications@github.com> wrote:
Improve our coverage by blacklisting known failures. I generated this
string from an e2e-aws-all run today
<openshift/installer#170 (comment)>
against the master codebase with this helper script in my
~/bin/junit-blacklist.py:
import osimport reimport sysfrom xml.etree.ElementTree import ElementTree
def get_failures(stream=sys.stdin):
tree = ElementTree()
tree.parse(stream)
for test in tree.findall('testcase'):
if test.find('skipped') is not None:
continue
elif test.find('failure') is not None:
print('failure: {}'.format(test.attrib['name']))
yield test.attrib['name']
else:
print('success: {}'.format(test.attrib['name']))
def get_all_failures(root=os.curdir):
for path, dirnames, filenames in os.walk(root):
for filename in filenames:
if filename.endswith('.xml'):
yield from get_failures(stream=open(os.path.join(path,
filename)))
def process_failure(failure):
chars = []
depth = 0
for char in failure:
if char == '[':
depth += 1
continue
if char == ']':
depth -= 1
continue
if depth == 0:
chars.append(char)
processed = re.escape(''.join(chars).strip())
processed = processed.replace(r'\ ', ' ').replace(r'\-',
'-').replace(r'\/', '/').replace(r'\=', '=') # Work around
https://bugs.python.org/issue29995, fixed in Python 3.7
return processed
if __name__ == '__main__':
failures = sorted(set(process_failure(failure=failure) for failure
in get_all_failures()))
if len(failures) == 1:
print(failures[0])
elif failures:
print('>')
print('.*(({}'.format(failures[0]))
for failure in failures[1:]:
print('*)|({}'.format(failure))
print('*)).*')
and this invocation:
$ wget -r -e robots=off -np -H
https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_installer/170/pull-ci-origin-installer-e2e-aws-all/3/artifacts/e2e-aws-all/junit/
$ cd storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/170/pull-ci-origin-installer-e2e-aws-all/3/artifacts/e2e-aws-all/junit
$ junit-blacklist.py
The regexp is because Ginko's -skip takes a regexp
<https://github.com/onsi/ginkgo/blame/v1.6.0/README.md#L30>. Presumably
that's a Go regexp <https://golang.org/pkg/regexp/#pkg-overview>, which,
like Python regexps
<https://docs.python.org/3/library/re.html#regular-expression-syntax>, are
similar to POSIX's Extended Regular Expressions (EREs)
<http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04>.
From the JUnit XML, the script collects names for all failed tests and
assembles a regexp to ignore all of them. My target regexp pattern is:
.*((test-1-name)|(test-2-name)|(test-3-name)).*
but that would get a bit hard to read. So instead, I inject optional
whitespace after each name:
.*((test-1-name *)|(test-2-name *)|(test-3-name *)).*
and then use YAML's folded style
<http://yaml.org/spec/1.2/spec.html#style/block/folded> to get:
.*((test-1-name*)|(test-2-name*)|(test-3-name*)).*
As we improve the installer, we can gradually remove entries from this
blacklist by cutting a single line from the folded regexp.
CC @sallyom <https://github.com/sallyom>, @smarterclayton
<https://github.com/smarterclayton>. It would be good if there was a way to
test this before going live. Maybe I should put it in a new
e2e-aws-blacklist? Or we can leave it in e2e-aws and revert if this breaks
something ;).
------------------------------
You can view, comment on, or merge this pull request online at:
#1341
Commit Summary
- openshift-installer-presubmits: Use TEST_SKIP for e2e-aws
File Changes
- *M*
ci-operator/jobs/openshift/installer/openshift-installer-presubmits.yaml
<https://github.com/openshift/release/pull/1341/files#diff-0> (51)
Patch Links:
- https://github.com/openshift/release/pull/1341.patch
- https://github.com/openshift/release/pull/1341.diff
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1341>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABG_p4M5HQgwLvyduJvpGSzBSNfpfB1Tks5uWaPFgaJpZM4WV2UO>
.
|
d9bb362
to
7991c1a
Compare
That sounds like a low-stress approach. I've switched over to it with d9bb362 -> 7991c1a.
I'm fine either way. With a blacklist approach, this is always going to be an issue. Any thoughts on folding or not? |
@@ -108,10 +108,54 @@ presubmits: | |||
- --template=/usr/local/e2e-aws-all | |||
- --target=e2e-aws-all | |||
env: | |||
- name: TEST_FOCUS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you want focus + skip? If you don't focus conformance you get a LOT of tests as your base to skip from
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you want focus + skip? If you don't focus conformance you get a LOT of tests as your base to skip from
Ah, that makes sense. Restored with 7991c1a -> d486714.
7991c1a
to
d486714
Compare
Re folding that’s Steve’s call. |
Readability probably isn’t a huge issue since this is misty going to be you and me and not last long. |
d486714
to
2482470
Compare
2482470
to
03f6041
Compare
To help get this landed, I've pushed d486714 -> 03f6041 dropping the folding. I can re-fold later (and update the line-ordering test to allow it) if @stevekuznetsov wants to go that way. |
With the value: |
whatever formatting instead of value: -|
whatever Is there technically a newline in the front of the value? Does it matter? Otherwise this LGTM |
03f6041
to
75e85bc
Compare
Maybe? I've just pushed 03f6041 -> 75e85bc to use |
Will LGTM once tests pass |
75e85bc
to
2cbec90
Compare
2cbec90
to
026fc2f
Compare
And ordered-prow-config didn't like that either, so I've just pushed 8706b44 with their exact wrapping. All green, @stevekuznetsov. |
Actually, give me a minute. Looks like I can trim this further based on this recent run. |
8706b44
to
eb7cef2
Compare
Ok, I've updated based on run 4. Changes vs. run 3 were (in the wrapped form): >-
-.*((DNS should answer endpoint and wildcard queries for the cluster
-*)|(HostPath should support subPath
-*)|(Image hard prune should delete orphaned blobs
+.*((Image hard prune should delete orphaned blobs
*)|(Image hard prune should show orphaned blob deletions in dry-run mode
*)|(Image limit range should deny a docker image reference exceeding limit on openshift\.io/image-tags resource
*)|(Image prune of schema 1 should prune old image
@@ -236,38 +19,4 @@
*)|(Image prune with --prune-registry==false should prune old image but skip registry
*)|(Image prune with default --all flag should prune both internally managed and external images
*)|(Image resource quota should deny a push of built image exceeding openshift\.io/imagestreams quota
-*)|(Prometheus when installed to the cluster should start and expose a secured proxy and unsecured metrics
-*)|(Prometheus when installed to the cluster should start and expose a secured proxy and verify build metrics
-*)|(The HAProxy router should expose prometheus metrics for a route
-*)|(The HAProxy router should expose the profiling endpoints
-*)|(The HAProxy router should respond with 503 to unrecognized hosts
-*)|(The HAProxy router should serve routes that were created from an ingress
-*)|(The HAProxy router should set Forwarded headers appropriately
-*)|(deploymentconfigs with multiple image change triggers should run a successful deployment with a trigger used by different containers
-*)|(deploymentconfigs with multiple image change triggers should run a successful deployment with multiple triggers
-*)|(forcePull should affect pulling builder images ForcePull test case execution custom
-*)|(forcePull should affect pulling builder images ForcePull test case execution docker
-*)|(forcePull should affect pulling builder images ForcePull test case execution s2i
-*)|(imagechangetriggers imagechangetriggers should trigger builds of all types
-*)|(jenkins autoprovision Pipeline build config should autoprovision jenkins
-*)|(oc new-app should fail with a --name longer than 58 characters
-*)|(oc new-app should succeed with a --name of 58 characters
-*)|(openshift mongodb image creating from a template should instantiate the template
-*)|(process valueFrom in build strategy environment variables should fail resolving unresolvable valueFrom in docker build environment variable references
-*)|(process valueFrom in build strategy environment variables should fail resolving unresolvable valueFrom in sti build environment variable references
-*)|(process valueFrom in build strategy environment variables should successfully resolve valueFrom in docker build environment variables
-*)|(process valueFrom in build strategy environment variables should successfully resolve valueFrom in s2i build environment variables
-*)|(prune builds based on settings in the buildconfig buildconfigs should have a default history limit set when created via the group api
-*)|(prune builds based on settings in the buildconfig buildconfigs should not have a default history limit set when created via the legacy api
-*)|(prune builds based on settings in the buildconfig should prune builds after a buildConfig change
-*)|(prune builds based on settings in the buildconfig should prune canceled builds based on the failedBuildsHistoryLimit setting
-*)|(prune builds based on settings in the buildconfig should prune completed builds based on the successfulBuildsHistoryLimit setting
-*)|(prune builds based on settings in the buildconfig should prune errored builds based on the failedBuildsHistoryLimit setting
-*)|(prune builds based on settings in the buildconfig should prune failed builds based on the failedBuildsHistoryLimit setting
-*)|(templateinstance impersonation tests should pass impersonation creation tests
-*)|(templateinstance impersonation tests should pass impersonation update tests
-*)|(templateinstance security tests should pass security tests
-*)|(templateservicebroker bind test should pass bind tests
-*)|(templateservicebroker end-to-end test should pass an end-to-end test
-*)|(templateservicebroker security test should pass security tests
*)).* @smarterclayton, did something happen to the parallel tests? Run 4 only seems to have results from the serial tests. |
eb7cef2
to
192e17f
Compare
That’s weird... looks like something is broken, possibly in the job. Your serial suite is incorrect too, it should be openshift/conformance/serial, not registry serial. |
I've filed #1351 for this. |
From the job 4 build log:
Maybe it's bumping into a resource leak? |
I want to improve our e2e-aws coverage by blacklisting known failures. I generated this string from an e2e-aws-all run today against the master codebase [1,2] with this helper script: $ cat ~/bin/junit-blacklist.py #!/usr/bin/env python3 import os import re import sys from xml.etree.ElementTree import ElementTree def get_failures(stream=sys.stdin): tree = ElementTree() tree.parse(stream) for test in tree.findall('testcase'): if test.find('skipped') is not None: continue elif test.find('failure') is not None: print('failure: {}'.format(test.attrib['name'])) yield test.attrib['name'] else: print('success: {}'.format(test.attrib['name'])) def get_all_failures(root=os.curdir): for path, dirnames, filenames in os.walk(root): for filename in filenames: if filename.endswith('.xml'): yield from get_failures(stream=open(os.path.join(path, filename))) def process_failure(failure): chars = [] depth = 0 for char in failure: if char == '[': depth += 1 continue if char == ']': depth -= 1 continue if depth == 0: chars.append(char) processed = re.escape(''.join(chars).strip()) processed = processed.replace(r'\ ', ' ').replace(r'\-', '-').replace(r'\/', '/').replace(r'\=', '=') # Work around https://bugs.python.org/issue29995, fixed in Python 3.7 return processed if __name__ == '__main__': failures = sorted(set(process_failure(failure=failure) for failure in get_all_failures())) if len(failures) == 1: print(failures[0]) elif failures: print('>-') print('.*(({}'.format(failures[0])) for failure in failures[1:]: print('*)|({}'.format(failure)) print('*)).*') and this invocation: $ wget -r -e robots=off -np -H https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_installer/170/pull-ci-origin-installer-e2e-aws-all/4/artifacts/e2e-aws-all/junit/ $ cd storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/170/pull-ci-origin-installer-e2e-aws-all/3/artifacts/e2e-aws-all/junit $ junit-blacklist.py The regexp is because Ginko's -skip takes a regexp [3]. Presumably that's a Go regexp [4], which, like Python regexps [5], are similar to POSIX's Extended Regular Expressions (EREs) [6]. From the JUnit XML, the script collects names for all failed tests and assembles a regexp to ignore all of them. My target regexp pattern is: .*(({test-1-name})|({test-2-name})|({test-3-name})).* but that would get a bit hard to read. So instead, I tried injecting optional whitespace after each name: .*(({test-1-name} *)|({test-2-name} *)|({test-3-name} *)).* and then using YAML's folded style [7] with the stripping indicator [8] to get: >- .*(({test-1-name} *)|({test-2-name} *)|({test-3-name} *)).* As we improve the installer, we'd gradually remove entries from this blacklist by cutting a single line from the folded regexp. The ordered-prow-config test didn't like my line wrapping, though [9]: --- /tmp/tmp.hlKAjJ4Mf4/jobs/openshift/installer/openshift-installer-presubmits.yaml 2018-08-31 23:26:44.138011998 +0000 +++ hack/../ci-operator/jobs/openshift/installer/openshift-installer-presubmits.yaml 2018-08-31 23:26:44.495040384 +0000 @@ -113,53 +111,8 @@ presubmits: - name: TEST_FOCUS_SERIAL value: Suite:openshift/registry/serial - name: TEST_SKIP - value: > - .*((DNS should answer endpoint and wildcard queries for the cluster - ... - *)).* + value: | + .*((DNS should answer endpoint and wildcard queries for the cluster...*)).* and it didn't like my attempt to use one long line (with stripping [8]) [10]: --- /tmp/tmp.kwtUW4jpNh/jobs/openshift/installer/openshift-installer-presubmits.yaml 2018-09-04 17:59:44.796949829 +0000 +++ hack/../ci-operator/jobs/openshift/installer/openshift-installer-presubmits.yaml 2018-09-04 17:59:44.910959930 +0000 @@ -113,8 +111,60 @@ presubmits: - name: TEST_FOCUS_SERIAL value: Suite:openshift/registry/serial - name: TEST_SKIP - value: |- - .*((DNS should answer endpoint and wildcard queries for the cluster)|...).* + value: .*((DNS should answer endpoint and wildcard queries for the cluster)|(HostPath + should support subPath)|(Image hard prune should delete orphaned blobs)|(Image + ... + pass security tests)).* so with this commit I've just used the form with the test-required wrapping (without the optional whitespace hack). This change is fairly dramatic, so I'm temporarily hijacking e2e-aws-all to exercise it. If it works well there, I'll move this TEST_SKIP into e2e-aws and restore e2e-aws-all to its previous setup. Keeping the FOCUS variables keeps the tests from expanding into untested waters [11]. If we want to drop filters later, we'd need a new run to figure out what would need blacklisting in the expanded suite. [1]: openshift/installer#170 (comment) [2]: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_installer/170/pull-ci-origin-installer-e2e-aws-all/3/ [3]: https://github.com/onsi/ginkgo/blame/v1.6.0/README.md#L30 [4]: https://golang.org/pkg/regexp/#pkg-overview [5]: https://docs.python.org/3/library/re.html#regular-expression-syntax [6]: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04 [7]: http://yaml.org/spec/1.2/spec.html#style/block/folded [8]: http://yaml.org/spec/1.2/spec.html#id2794534 [9]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/1341/pull-ci-openshift-release-master-ordered-prow-config/111/build-log.txt [10]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/1341/pull-ci-openshift-release-master-ordered-prow-config/137/build-log.txt [11]: openshift#1341 (comment)
192e17f
to
546bfdf
Compare
I've rebased around #1351 with 192e17f -> 546bfdf. I'll kick off another round in openshift/installer#170 with the new focus, and then update the blacklist here once that returns. |
Yeah, it looks like ginkgo failed on the first run (the tests got run and logged, just died at the end). That can be a problem with the nodes (like we were having before with firewall rules) or it can be an indication that a test totally and completely failed and/or hung, leading to the timeout of the run. I think we might be missing a ginkgo setting to print all communications (even when hung) before exiting in these cases - Kube has occasionally printed this stuff. I'll have to look at that later. |
Same for job 5, so probably not a flake.
The Ginkgo docs suggest |
@wking: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@wking: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@smarterclayton has taken over growing the tests in openshift/origin. /close |
@wking: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Improve our coverage by blacklisting known failures. I generated this string from an e2e-aws-all run today against the master codebase with this helper script in my
~/bin/junit-blacklist.py
:and this invocation:
The regexp is because Ginko's
-skip
takes a regexp. Presumably that's a Go regexp, which, like Python regexps, are similar to POSIX's Extended Regular Expressions (EREs). From the JUnit XML, the script collects names for all failed tests and assembles a regexp to ignore all of them. My target regexp pattern is:but that would get a bit hard to read. So instead, I inject optional whitespace after each name:
and then use YAML's folded style to get:
As we improve the installer, we can gradually remove entries from this blacklist by cutting a single line from the folded regexp.
CC @sallyom, @smarterclayton. It would be good if there was a way to test this before going live. Maybe I should put it in a new
e2e-aws-blacklist
? Or we can leave it ine2e-aws
and revert if this breaks something ;).