Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

contrib/pkg/awstagdeprovision: Allow for OR filters #47

Merged
merged 1 commit into from
Oct 24, 2018

Conversation

wking
Copy link
Member

@wking wking commented Oct 18, 2018

Builds on #46; review that first.

AWS docs for filters are a bit sparse, but the AWS docs have:

Combining search filters

In general, multiple filters with the same key field (for example, tag:Name, search, Instance State) are automatically joined with OR. This is intentional, as the vast majority of filters would not be logical if they were joined with AND. For example, you would get zero results for a search on Instance State=running AND Instance State=stopped. In many cases, you can granulate the results by using complementary search terms on different key fields, where the AND rule is automatically applied instead. If you search for tag: Name:=All values and tag:Instance State=running, you get search results that contain both those criteria. To fine-tune your results, simply remove one filter in the string until the results fit your requirements.

Unfortunately, there seems to be no way to represent OR filters in ec2's Filter structure. With the approach I have here, resources that match multiple filters will be fetched multiple times, and may have parallel, racing delete attempts. But we should be robust in the face of racing delete attempt, and hopefully one of the deletes will go through before too long ;).

It would be possible to adjust our locally-filtered types (which use filterObjects) to avoid this issue for those types. I may do that in follow-up work, but for now I'm treating all of our types the same way.

@abhinavdahiya
Copy link
Contributor

abhinavdahiya commented Oct 19, 2018

553c02a 8de157d merged

6e7bacc doesn't look like necessary and can be merged separate to the PR, so just drop it maybe?

7bdaa6a /lgtm

But i'm confused.. why force only one use the first filter? 7bdaa6a#diff-6df0d52da59598889c66f03dd6a7bc67R35 has aws-tag-deprovision KEY=VALUE ... ... means more than one pair might be accepted ?

@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 19, 2018
@wking
Copy link
Member Author

wking commented Oct 19, 2018

I've pushed 7bdaa6a -> f59dd98, rebasing onto master so GitHub only shows the new-to-this-PR commit in the diff.

But i'm confused.. why force only one use the first filter? 7bdaa6a#diff-6df0d52da59598889c66f03dd6a7bc67R35 has aws-tag-deprovision KEY=VALUE ... ... means more than one pair might be accepted ?

The hiveutil command has accepted multiple filters since it landed, so nothing new there (although this was only documented in #46). This PR is adding OR support to the library, but not exposing that support in hiveutil. You can still list multiple KEY=VALUE, and they'll all get fed into Filters[0] (and be AND-ed together by the library).

@joelddiaz
Copy link
Contributor

@wking i'm seeing a panic when i tried to run this

[jdiaz@goomba hive (wking-or-tag-deletion %)]$ ./bin/hiveutil aws-tag-deprovision --cluster-name fakecluster --loglevel=debug tectonicClusterID=fakeuuid kubernetes.io/cluster/fakecluster=shared kubernetes.io/cluster/fakecluster=owned
panic: assignment to entry in nil map

goroutine 1 [running]:
main.parseFilter(0x0, 0x7ffde7f5a96b, 0x1a, 0x0, 0x16c5b77)
        /home/jdiaz/projects/hive/src/github.com/openshift/hive/contrib/cmd/hiveutil/awstagdeprovision.go:88 +0xa4

@wking
Copy link
Member Author

wking commented Oct 19, 2018

panic: assignment to entry in nil map

Oops. Fixed in f59dd98 -> 971d238.

@abhinavdahiya
Copy link
Contributor

/approve

Copy link
Contributor

@joelddiaz joelddiaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should allow passing of multiple key=value tags that we can OR against.

@wking
Copy link
Member Author

wking commented Oct 22, 2018

We should allow passing of multiple key=value tags that we can OR against.

Currently you can pass multiple to AND. Did you have a command-line syntax in mind for OR?

@joelddiaz
Copy link
Contributor

I was thinking that each command line argument is what we OR against. No fancy combinations of ORs with ANDs...you just get OR. Do we have a need for more complex matching than ORing against a list of key=value pairs at this time?

@wking
Copy link
Member Author

wking commented Oct 22, 2018

I was thinking that each command line argument is what we OR against.

hiveutil has supported multiple KEY=VALUE pairs ANDed together since it landed (notes here). You're suggesting I change that to use OR instead?

Do we have a need for more complex matching than ORing against a list of key=value pairs at this time?

The installer doesn't, but the installer doesn't use hiveutil ;). Folks who want to destroy clusters should be using openshift-install destroy cluster. Do you know who uses hiveutil and what their use-cases are?

@joelddiaz
Copy link
Contributor

Yes, I'm suggesting to switch from ANDing the list of key=value pairs to ORing.

@dgoodwin what do you think about changing the default away from ANDing the list to ORing (and taking away the ability to do any kind of ANDing until we find a use-case for it)?

@dgoodwin
Copy link
Contributor

I suspect pretty much everyone using it would be specifying single tags, but it's a little hard to know for sure and a tiny bit risky. It's probably fine to do so but we might want to communicate it a little bit.

Would a --or option seem ok?

@joelddiaz
Copy link
Contributor

Are you suggesting AND-by-default, and --or will switch to 'OR all of thes key=value pairs passed as parameters'?

@dgoodwin
Copy link
Contributor

Yes, but I am up for either, mild risk in changing behavior from AND to OR on folks who might be using it, but I don't think there are a lot.

@joelddiaz
Copy link
Contributor

Being that the number of people using this at this time is still probably quite low, I'd argue that we should move to OR-by-default since it solves our usecase more cleanly (we can put the OR-by-default loud and clear in the '--help' output). I think we make the change now and if needed it should be easier to add AND after the fact than retrofit OR into a previously AND-by-default world.

@joelddiaz
Copy link
Contributor

@wking i'm more than willing to make the proposed changes if you don't want to. you want to incorporate the OR-by-default changes?

wking added a commit to wking/hive that referenced this pull request Oct 23, 2018
AWS docs for filters are a bit sparse, but [1] has:

  Combining search filters

  In general, multiple filters with the same key field (for example,
  tag:Name, search, Instance State) are automatically joined with OR.
  This is intentional, as the vast majority of filters would not be
  logical if they were joined with AND.  For example, you would get
  zero results for a search on Instance State=running AND Instance
  State=stopped.  In many cases, you can granulate the results by
  using complementary search terms on different key fields, where the
  AND rule is automatically applied instead.  If you search for tag:
  Name:=All values and tag:Instance State=running, you get search
  results that contain both those criteria.  To fine-tune your
  results, simply remove one filter in the string until the results
  fit your requirements.

Unfortunately, there seems to be no way to represent OR filters in
ec2's Filter structure [2].  With the approach I have here, resources
that match multiple filters will be fetched multiple times, and may
have parallel, racing delete attempts.  But we should be robust in the
face of racing delete attempt, and hopefully one of the deletes will
go through before too long ;).

It would be possible to adjust our locally-filtered types (which use
filterObjects) to avoid this issue for those types.  I may do that in
follow-up work, but for now I'm treating all of our types the same
way.

I've added the filter information to some of the logs to help
distinguish between multiple goroutines handling different filters for
the same resource.

I've also switched hiveutil over to use OR for multiple arguments (it
used to use AND) at Joel's suggestion [3].

[1]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Filtering.html#advancedsearch
[2]: https://docs.aws.amazon.com/sdk-for-go/api/service/ec2/#Filter
[3]: openshift#47 (comment)
@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 23, 2018
@wking
Copy link
Member Author

wking commented Oct 23, 2018

I've pushed 971d238 -> 7522689, adding filter qualifiers to some debug logging and converting hiveutil to the OR semantics.

Copy link
Contributor

@joelddiaz joelddiaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Just one change to raise the number of goroutines to wait on.

AWS docs for filters are a bit sparse, but [1] has:

  Combining search filters

  In general, multiple filters with the same key field (for example,
  tag:Name, search, Instance State) are automatically joined with OR.
  This is intentional, as the vast majority of filters would not be
  logical if they were joined with AND.  For example, you would get
  zero results for a search on Instance State=running AND Instance
  State=stopped.  In many cases, you can granulate the results by
  using complementary search terms on different key fields, where the
  AND rule is automatically applied instead.  If you search for tag:
  Name:=All values and tag:Instance State=running, you get search
  results that contain both those criteria.  To fine-tune your
  results, simply remove one filter in the string until the results
  fit your requirements.

Unfortunately, there seems to be no way to represent OR filters in
ec2's Filter structure [2].  With the approach I have here, resources
that match multiple filters will be fetched multiple times, and may
have parallel, racing delete attempts.  But we should be robust in the
face of racing delete attempt, and hopefully one of the deletes will
go through before too long ;).

It would be possible to adjust our locally-filtered types (which use
filterObjects) to avoid this issue for those types.  I may do that in
follow-up work, but for now I'm treating all of our types the same
way.

I've added the filter information to some of the logs to help
distinguish between multiple goroutines handling different filters for
the same resource.

I've also switched hiveutil over to use OR for multiple arguments (it
used to use AND) at Joel's suggestion [3].

[1]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Filtering.html#advancedsearch
[2]: https://docs.aws.amazon.com/sdk-for-go/api/service/ec2/#Filter
[3]: openshift#47 (comment)
@joelddiaz
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 24, 2018
@openshift-merge-robot openshift-merge-robot merged commit 0089a8d into openshift:master Oct 24, 2018
@wking wking deleted the or-tag-deletion branch October 24, 2018 23:37
wking added a commit to wking/openshift-installer that referenced this pull request Oct 25, 2018
Generated with:

  $ rm -rf "$(go env GOPATH)/pkg/dep/sources"
  # to avoid errors like "Unable to update checked out version: fatal: reference is not a tree"
  # possibly [1]
  $ dep ensure -update

using:

  $ dep version
  dep:
   version     : v0.5.0
   build date  :
   git hash    : 22125cf
   go version  : go1.10.3
   go compiler : gc
   platform    : linux/amd64
   features    : ImportDuringSolve=false

I haven't reviewed all of these changes, but I want this to pull in
openshift/hive@b1cad987 (contrib/pkg/awstagdeprovision: Allow for OR
filters, 2018-10-18, openshift/hive#47).

[1]: golang/dep#928 (comment)
wking added a commit to wking/openshift-installer that referenced this pull request Oct 25, 2018
Generated with:

  $ rm -rf "$(go env GOPATH)/pkg/dep/sources"
  # to avoid errors like "Unable to update checked out version: fatal: reference is not a tree"
  # possibly [1]
  $ dep ensure -update

using:

  $ dep version
  dep:
   version     : v0.5.0
   build date  :
   git hash    : 22125cf
   go version  : go1.10.3
   go compiler : gc
   platform    : linux/amd64
   features    : ImportDuringSolve=false

I haven't reviewed all of these changes, but I want this to pull in
openshift/hive@b1cad987 (contrib/pkg/awstagdeprovision: Allow for OR
filters, 2018-10-18, openshift/hive#47).

[1]: golang/dep#928 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants