Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datasource/zones: Perform filtering on server side #708

Merged

Conversation

jacobbednarz
Copy link
Member

@jacobbednarz jacobbednarz commented Jun 12, 2020

The initial implementation of the datasource had the provider request all
the available zones and then we would perform some logic within the
provider to filter down to the match the attributes. This is slightly
inefficient for a couple of reasons:

  • We are requesting all zones (even if we only want a small subset of
    them); and
  • We are duplicating the zone filtering logic in the provider that is
    already available in the API

This change replaces that functionality by pushing the responsibility of
filtering back to the API using inbuilt functionality. It maintains a reasonable
amount of backwards compatibility outlined in the initial PR (#168) where the
"name" had to accept wildcards but under the hood accepted a full regex.
Instead, we will be making:

  • name value a string of the broad lookup term
  • match an optional refined regex which to further filter the result by
  • lookup_type whether you're wanting to perform an exact or fuzzy lookup on
    the value

Examples (also in the website documentation)

Given you have the following domains in Cloudflare.

  • example.com
  • example.net
  • not-example.com
# Look for a single domain that you know exists using an exact match.
# API request will be for zones?name=example.com. Will not match not-example.com
data "cloudflare_zones" "example" {
  filter {
    name = "example.com"
  }
}
# Look for all domains which include "example".
# API request will be for zones?name=contains:example. Will return all three 
# domains.
data "cloudflare_zones" "example" {
  filter {
    name        = "example"
    lookup_type = "contains"
  }
}
# Look for all domains which include "example" but start with "not-".
# API request will be for zones?name=contains:example. Will perform client side
# filtering using the provided regex and will only match the single domain,
# not-example.com.
data "cloudflare_zones" "example" {
  filter {
    name        = "example"
    lookup_type = "contains"
    match       = "^not-"
  }
}

A side effect of this change is that we can also update the
cloudflare_zone_sweeper to be restricted to an account ID provided.
This prevents using API keys with global access from sweeping the wrong
zones/accounts.

Fixes #707
Fixes #648

Depends on cloudflare/cloudflare-go#478

The initial implementation of the datasource had the provider request all the
available zones and then we would perform some logic within the provider to
filter down to the match the attributes. This is slightly inefficient for a
couple of reasons:

- We are requesting *all* zones (even if we only want a small subset of them); and
- We are duplicating the zone filtering logic in the provider that is
  already available in the API

This change replaces that functionality by pushing the responsibility of
filtering back to the API. It maintains a reasonable amount of backwards
compatibility outlined in the initial PR (cloudflare#168) where the "name" had to accept
wildcards but under the hood accepted a full regex. Instead, we use the
`contains:` modifier on the name search parameter which will instead fuzzy
match. The rest remains the same but is performed via the API.

Fixes cloudflare#707
Changes the `cloudflare_zone_sweeper` to be restricted to an account ID
provided. This prevents using API keys with global access from sweeping the
wrong zones/accounts.

Fixes cloudflare#648
@ghost ghost added the size/S label Jun 12, 2020
@patryk
Copy link
Contributor

patryk commented Jun 12, 2020

We advertise in documentation that the name parameter accepts regular expression. Would be great to keep that functionality (regexp detection?).

@jacobbednarz
Copy link
Member Author

The Terraform documentation will be updated to whatever we decide on (purposefully left to last to save commits changing based on the implementation) but I’m not sure where you would really need a full regex here. The use cases I’ve come across have all been fuzzy matching (domain* or domain-*.com) where maintaining that regex compatibility isn’t required. Do you know of cases where the full regex has been used?

@jacobbednarz
Copy link
Member Author

@ewilde would enjoy your thoughts here as the original implementer.

@XaF and @inge4pres for good measure as I think I recall seeing issues/comments but my memory may be failing on that.

@ewilde
Copy link
Contributor

ewilde commented Jun 12, 2020

@jacobbednarz for our usage of the data source this would be a minor breaking change, not an issue 👍 for us.

@inge4pres
Copy link
Contributor

I love this! Our use case is aimed primarily at similar TLD so having a regex would be ideal. Thanks for putting this together

@jacobbednarz
Copy link
Member Author

Our use case is aimed primarily at similar TLD so having a regex would be ideal.

So you’d prefer if we kept the regex? This PR would remove it in order to move where the filtering would be occurring. Would you be able to provide a few examples of what you’d be looking to do and I’ll see if it matches the proposals here?

@XaF
Copy link
Contributor

XaF commented Jun 13, 2020

I don't see any problem on my side.
Maybe just a suggestion though: if we want to make it more performant most of the time while keeping the same set of functionality, this could be done by moving the regex to a "match" parameter maybe? And having the "name" use the API's "contains"?

@jacobbednarz
Copy link
Member Author

Maybe just a suggestion though: if we want to make it more performant most of the time while keeping the same set of functionality, this could be done by moving the regex to a "match" parameter maybe? And having the "name" use the API's "contains"?

This might be a better way forward. Offer both and toggle the functionality based on another directive or value but opt to the simplest and most common use case. Some examples I’m thinking of:

# get a specific domain which calls the list zones endpoint with ?zone=example.com (detects a full domain being passed in) 
resource “...” “...” {
  name = “example.com”
}
# get a series of domains (example.com, example.net) which calls the list zones endpoint with ?zone=contains:example
resource “...” “...” {
  name = “example”
}
# get a series of domains (example.com, example.net) which calls the list zones endpoint with ?zone=contains:example but also filters in the provider for only the .com
resource “...” “...” {
  name = “example”
  match = “.com$”
}

What do you think?

@XaF
Copy link
Contributor

XaF commented Jun 13, 2020

What do you think?

I like the approach!
One last thing I'm wondering: isn't it dangerous to try and put logic on what data people provide (name containing a dot = exact match) instead of just offering a specific keyword?
For instance, I could want to search all domains that contain "e.com", in which case it would look for an exact match instead of matching example.com for instance, is that desirable or could it come back to bite us?
If you think it's uncommon enough, then it's fine, would require to only use the match parameter and list all zones to work around. Otherwise, we could have 3 keywords depending on the situation: match for the local regex filtering, name for exact filtering, and contains for the contains filtering? I feel like the user interface would be cleaner?
But that might be nitpicking at this point, I like what you suggested :)

@jacobbednarz
Copy link
Member Author

Potentially problematic as I don't have exact usage statistics on the values being plugged in. Perhaps a lookup_type which allows values like "contains" or "exact" to control whether or not to look up a single domain or fuzzy match it.

@jacobbednarz
Copy link
Member Author

I’ll take a pass at this and update the PR soon.

Leverage the abilities of the zone filtering on the server side more by
providing available options within Terraform to control how the API request is
made.
@ghost ghost added size/L and removed size/S labels Jun 25, 2020
@ghost ghost added the kind/documentation Categorizes issue or PR as related to documentation. label Jun 25, 2020
@XaF
Copy link
Contributor

XaF commented Jun 25, 2020

I really like the details in the documentation.
Can you explain the choice for this however?

filtering using the provided regex and will only match one domain

Feels to me that when using regex I might still want to select multiple zones, even more for something called cloudflare_zones ! I would think that if we want also to limit the number of results, it should be done using terraform directly OR with another limit parameter?

@jacobbednarz
Copy link
Member Author

Thanks for your patience here. I think I've finally got something workable. I've updated the description to keep up with the new changes so I do recommend checking that out but the summary is:

  • name no longer accepts a regex. Instead, it is a string value that is sent to the API.
  • lookup_type determines whether to exact match or fuzzy match the query to the API.
  • match allows providing a regex to filter on the provider side if you need it.

I think I've marked off all of the feedback but would appreciate you folks making another pass over this one for me

@jacobbednarz
Copy link
Member Author

Can you explain the choice for this however?

Ah, that comment maybe be slightly confusing. This specific example is tied to the scenario where you have three domains:

  • example.com
  • example.net
  • not-example.com

And you're searching for "example" via the API but you're only targetting a single domain (the one starting with not-) here, hence ending up with one domain. If you had multiples that contained "example" and started with not- you would have them all returned.

@jacobbednarz
Copy link
Member Author

I've reworded that example in 36e316e to clarify that I don't intend to limit the results, just the example will only match a single zone. Thanks @XaF

Copy link
Contributor

@XaF XaF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a last pass at the code and except for that one comment below, everything looks good to me

cloudflare/data_source_zones.go Outdated Show resolved Hide resolved
@jacobbednarz jacobbednarz merged commit e74100d into cloudflare:master Jun 28, 2020
@jacobbednarz jacobbednarz deleted the filter-zone-details-via-api branch June 28, 2020 21:34
jacobbednarz added a commit to jacobbednarz/terraform-provider-cloudflare that referenced this pull request Jul 9, 2020
Since we updated the `cloudflare_zone` datasource cloudflare#708 to perform
different types of matching, this test has been failing due to not
having the ability to list all zones. This isn't really a problem but it
did identify we could be lazier with the resource definition and pass in
a static zone ID instead of relying on an extra zone listing call to
fetch all the information.
@alloveras
Copy link

alloveras commented Jan 16, 2021

Hey @jacobbednarz,
Before I say something, a full disclaimer that it is very possible that I am wrong given that this PR was merged a while back.

With that said, we recently started consuming the provider at version >= v2.9.0 which is when this change was released. After the version bump, we noticed that the list of zones being returned by the cloudflare_zones data resource was smaller than before without us having changed any of the filterings. Initially, we thought that it could be due to the new behavior described in this PR, as far as I can tell, we shouldn't be affected by the changes. See our data-source configuration below:

data "cloudflare_zones" "active" {
  filter {
    status = "active"
    paused = false
  }
}

After a bit of investigation, we managed to narrow it down to the replacement of ListZones with ListZonesContext. If we are not mistaken, the former function implements API pagination whereas the latter one does not.

For us, this is causing the data resource to only return a subset of all zones matching the filtering criteria. In fact, we just get the first 20 zones which aligns perfectly with the default page size for the ListZones API call.

If you don't mind, could you please take a look to confirm whether or not this is a regression?

Thank you 😄 !

@jacobbednarz
Copy link
Member Author

It was unfortunately but sit tight! cloudflare/cloudflare-go#534 has the fix but I've dropped the ball on following up on the PR comments. This fix will be in the next release of cloudflare-go (with or without that comment fix)

@alloveras
Copy link

Thanks for the clarification. We weren't sure whether this was already a know regression or not. It is good to know that it is and that there is work in the pipeline to fix it 🚀 !!

We will rollback and pause the version bump until the changes happen upstream then 👍

boekkooi-lengoo pushed a commit to boekkooi-lengoo/terraform-provider-cloudflare that referenced this pull request Feb 28, 2022
…rsion

ruleset: add version to action parameters struct
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/documentation Categorizes issue or PR as related to documentation. size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move to server side filtering Restrict test sweepers to account ID provided
6 participants