-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-32776: Fix IBM Public Cloud DNS Provider Update Logic #1133
base: master
Are you sure you want to change the base?
OCPBUGS-32776: Fix IBM Public Cloud DNS Provider Update Logic #1133
Conversation
@gcs278: This pull request references Jira Issue OCPBUGS-32776, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
b387d07
to
b07d602
Compare
/jira refresh |
@gcs278: This pull request references Jira Issue OCPBUGS-32776, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@gcs278: This pull request references Jira Issue OCPBUGS-32776, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
I missed one detail of the bug, missing instructions in the |
a057f1a
to
0716651
Compare
@lihongan Added the missing scope change instructions for the PowerVS type. /unhold |
infra failures |
0716651
to
f51b1d3
Compare
Infra issues |
pre-merge tested on OpenStack and looks good now
will test on IBMCloud later |
pre-merge tested on IBMCloud and also looks good
|
/assign |
Progressing
Condition
f51b1d3
to
7a97b9a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thank you for the fix!
/retest |
case iov1.CNAMERecordType: | ||
inputRData, err := p.dnsService.NewResourceRecordInputRdataRdataCnameRecord(target) | ||
|
||
// TODO DNS record update should handle the case where we have an A record and want a CNAME record or vice versa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this is a carried-over TODO, but should it be done now in order to fix the bug? Or is it out of scope?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's out of scope for this bug since it can be fixed independently.
This TODO is for when someone changes the DNSRecord
type from CNAME to A, or visa versa, that it won't update the dns record type. That's not a big deal for OpenShift actually, because IBMCloud only creates CNAME records, but I can see it as a low priority "completeness" TODO.
} | ||
} | ||
if result == nil || result.Result == nil { | ||
return fmt.Errorf("createOrUpdateDNSRecord: ListAllDnsRecords returned nil as result") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens for the very first record created? We can't find any to update, but we need to go on to create it instead of returning an error, don't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that's a good question. I found that result
and result.Result
are both initialized when creating the DNS record for the first time. The results gets filtered for the record.Spec.DNSName
, so whether it's the very first DNS record in the hosted zone isn't a factor.
I agree it's a bit odd to check for result.Result
, but the original author wrote this logic, I haven't update it as works as intended so far, and it's possible there is a subtle reason why it's here.
createDnsRecordInputOutput: dnsclient.CreateDnsRecordInputOutput{ | ||
InputId: "testCreate", | ||
OutputError: nil, | ||
OutputResponse: &core.DetailedResponse{StatusCode: http.StatusOK}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to find a way to check that the list was updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above, there is no state to check.
OutputError: errors.New("error in CreateDnsRecord"), | ||
OutputResponse: &core.DetailedResponse{StatusCode: http.StatusRequestTimeout}, | ||
}, | ||
expectErrorContains: "error in CreateDnsRecord", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't really seem to be testing any error input if we're just matching the supplied error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I view OutputError
as mocking a "fake" Upstream DNS Server as if it was providing an error. This is confirming that we are getting the error back that we told the "fake" Upstream DNS Server to generate, just like what would happen if there was a real DNS Server error.
We have other test cases that test error messages produced by our own provider code, but this error checking is still valuable because it proves the functions are returning the error we provided when the fake response is StatusRequestTimeout
.
func (fdc FakeDnsClient) ListResourceRecords(listResourceRecordsOptions *dnssvcsv1.ListResourceRecordsOptions) (result *dnssvcsv1.ListResourceRecords, response *core.DetailedResponse, err error) { | ||
fakeListDnsrecordsResp := &dnssvcsv1.ListResourceRecords{} | ||
recordType := string(iov1.ARecordType) | ||
rData := map[string]interface{}{"ip": fdc.ListAllDnsRecordsInputOutput.RecordTarget} | ||
|
||
fakeListDnsrecordsResp.ResourceRecords = append(fakeListDnsrecordsResp.ResourceRecords, dnssvcsv1.ResourceRecord{ID: &fdc.ListAllDnsRecordsInputOutput.RecordName, Name: &fdc.ListAllDnsRecordsInputOutput.RecordName, Type: &recordType, Rdata: rData}) | ||
|
||
resp := &core.DetailedResponse{ | ||
StatusCode: fdc.ListAllDnsRecordsInputOutput.OutputStatusCode, | ||
Headers: map[string][]string{}, | ||
Result: result, | ||
RawResult: []byte{}, | ||
} | ||
|
||
return fakeListDnsrecordsResp, resp, fdc.ListAllDnsRecordsInputOutput.OutputError | ||
return fdc.ListAllDnsRecordsInputOutput.OutputResult, fdc.ListAllDnsRecordsInputOutput.OutputResponse, fdc.ListAllDnsRecordsInputOutput.OutputError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
listResourceRecordsOptions
is unused. Also, can you add a comment about what is actually being returned here? We haven't changed anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, listResourceRecordsOptions
was always unused. The options would only really be helpful if we were able to test the IBM Cloud DNS Provider, as it's an input struct for their API calls.
We build a ListAllDnsRecordsInputOutput
object as an input to each of the unit test cases, and it has the OutputResult
, OutputResponse
, and OutputError
. In this function, we just return them so we can continue testing our functions (delete
and createOrUpdateDNSRecord
)
I'll add a comment.
"k8s.io/utils/pointer" | ||
|
||
"github.com/IBM/go-sdk-core/v5/core" | ||
"github.com/IBM/networking-go-sdk/dnssvcsv1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment for imports not in order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll review this further after learning more about my questions in cis_provider_test.go
d93c10b
to
e486a51
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@candita thanks for the detailed review. Should be ready for your review again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func (fdc FakeDnsClient) ListResourceRecords(listResourceRecordsOptions *dnssvcsv1.ListResourceRecordsOptions) (result *dnssvcsv1.ListResourceRecords, response *core.DetailedResponse, err error) { | ||
fakeListDnsrecordsResp := &dnssvcsv1.ListResourceRecords{} | ||
recordType := string(iov1.ARecordType) | ||
rData := map[string]interface{}{"ip": fdc.ListAllDnsRecordsInputOutput.RecordTarget} | ||
|
||
fakeListDnsrecordsResp.ResourceRecords = append(fakeListDnsrecordsResp.ResourceRecords, dnssvcsv1.ResourceRecord{ID: &fdc.ListAllDnsRecordsInputOutput.RecordName, Name: &fdc.ListAllDnsRecordsInputOutput.RecordName, Type: &recordType, Rdata: rData}) | ||
|
||
resp := &core.DetailedResponse{ | ||
StatusCode: fdc.ListAllDnsRecordsInputOutput.OutputStatusCode, | ||
Headers: map[string][]string{}, | ||
Result: result, | ||
RawResult: []byte{}, | ||
} | ||
|
||
return fakeListDnsrecordsResp, resp, fdc.ListAllDnsRecordsInputOutput.OutputError | ||
return fdc.ListAllDnsRecordsInputOutput.OutputResult, fdc.ListAllDnsRecordsInputOutput.OutputResponse, fdc.ListAllDnsRecordsInputOutput.OutputError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, listResourceRecordsOptions
was always unused. The options would only really be helpful if we were able to test the IBM Cloud DNS Provider, as it's an input struct for their API calls.
We build a ListAllDnsRecordsInputOutput
object as an input to each of the unit test cases, and it has the OutputResult
, OutputResponse
, and OutputError
. In this function, we just return them so we can continue testing our functions (delete
and createOrUpdateDNSRecord
)
I'll add a comment.
func (FakeDnsClient) CreateResourceRecord(createResourceRecordOptions *dnssvcsv1.CreateResourceRecordOptions) (result *dnssvcsv1.ResourceRecord, response *core.DetailedResponse, err error) { | ||
return nil, nil, nil | ||
func (fdc FakeDnsClient) CreateResourceRecord(createResourceRecordOptions *dnssvcsv1.CreateResourceRecordOptions) (result *dnssvcsv1.ResourceRecord, response *core.DetailedResponse, err error) { | ||
if fdc.CreateDnsRecordInputOutput.InputId != *createResourceRecordOptions.Name { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this mainly to be consistent with the existing patterns, but it ensures that the
incoming createResourceRecordOptions.Name
parameter is getting configured correctly. InputID
is equivalent to the name, it's just named generically to keep all update/create/delete structs similar.
I'll add a comment.
dnsclient "github.com/openshift/cluster-ingress-operator/pkg/dns/ibm/public/client" | ||
|
||
"k8s.io/utils/pointer" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do that, but what about https://github.com/go-imports-organizer/goio/blob/main/examples/openshift_kubernetes.yaml? Do we have an official style guide of how we are supposed to order go imports?
for _, resourceRecord := range listResult.ResourceRecords { | ||
if *resourceRecord.Name == dnsName { | ||
if resourceRecord.ID == nil { | ||
return fmt.Errorf("createOrUpdateDNSRecord: record id is nil") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added because if the resourceRecord.ID
is nil, you'll get a nil pointer dereference later when going to delete it, so no value in continuing. I added it because I saw cis_provider.go had it, but this provider didn't have it:
return fmt.Errorf("delete: record id is nil") |
case iov1.CNAMERecordType: | ||
inputRData, err := p.dnsService.NewResourceRecordInputRdataRdataCnameRecord(target) | ||
|
||
// TODO DNS record update should handle the case where we have an A record and want a CNAME record or vice versa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's out of scope for this bug since it can be fixed independently.
This TODO is for when someone changes the DNSRecord
type from CNAME to A, or visa versa, that it won't update the dns record type. That's not a big deal for OpenShift actually, because IBMCloud only creates CNAME records, but I can see it as a low priority "completeness" TODO.
} | ||
} | ||
if result == nil || result.Result == nil { | ||
return fmt.Errorf("createOrUpdateDNSRecord: ListAllDnsRecords returned nil as result") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that's a good question. I found that result
and result.Result
are both initialized when creating the DNS record for the first time. The results gets filtered for the record.Spec.DNSName
, so whether it's the very first DNS record in the hosted zone isn't a factor.
I agree it's a bit odd to check for result.Result
, but the original author wrote this logic, I haven't update it as works as intended so far, and it's possible there is a subtle reason why it's here.
@gcs278 I didn't re-review this because it was on hold. I assume that was an oversight. |
@candita Yea I forgot to take the hold off this time, though I often will put holds to ensure every gets a chance to review, or I'm waiting for final CI results, but still want code reviews. I could be wrong. I use WIP as an in indicator it's not ready for review. /hold cancel |
/retest |
@@ -190,9 +194,15 @@ func (p *Provider) createOrUpdateDNSRecord(record *iov1.DNSRecord, zone configv1 | |||
log.Info("Warning: TTL must be one of [1 60 120 300 600 900 1800 3600 7200 18000 43200]. RecordTTL set to default", "default DSNSVCS record TTL", defaultDNSSVCSRecordTTL) | |||
record.Spec.RecordTTL = defaultDNSSVCSRecordTTL | |||
} | |||
// We only support one target, warn the user. | |||
if len(record.Spec.Targets) > 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not add this to ValidateInputDNSData? Shouldn't this error stop the addition of the record, rather than just logging it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not add this to ValidateInputDNSData?
We could, but I only added the warnings on creates, and ValidateInputDNSData
is also called on deletes. And it feels odd to warn on deletes, because we actually do want to allow deleting with multiple targets for compatibility.
It's nuanced, but in the extremely rare case that someone was unsupportedly using multiple targets previously, the delete function will iterate through them to ensure all records are deleted to prevent an orphaned DNS record:
cluster-ingress-operator/pkg/dns/ibm/private/dnssvcs_provider.go
Lines 132 to 134 in e486a51
// While creating DNS records with multiple targets is unsupported, we still | |
// iterate through all targets during deletion to be extra cautious. | |
for _, target := range record.Spec.Targets { |
Shouldn't this error stop the addition of the record, rather than just logging it?
I used CIO's AWS DNS provider as precedent, it doesn't break or error out, or provide a warning:
cluster-ingress-operator/pkg/dns/aws/dns.go
Lines 515 to 516 in 047bd98
// TODO: handle >0 targets | |
domain, target := record.Spec.DNSName, record.Spec.Targets[0] |
But I suppose erroring out would prevent a user from getting into a partially functioning state, requiring them to fix the targets upfront, to only provide what is supported. Maybe the AWS logic is bad precedent. That does seem like a better signal than a warning. I can update. But unfortunately still can't go into ValidateInputDNSData
because of what I mentioned above (we still want the delete to continue with multiple targets).
for _, target := range record.Spec.Targets { | ||
listOpt.SetContent(target) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, how did anything work with this here? But do we want to add just one target to the listOpt, or is that the issue you're fixing - we don't want to match on name + target?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I was surprised too! Another reason I pushed for IBM Cloud E2E (which is unfortunately still in the works).
the issue you're fixing - we don't want to match on name + target?
Exactly. DNS Name is unique already, so that should be our only "key" for our search. The problem with filtering on name + target is that it works the first time, but when the target get updated, we will never find a result when the load balancer get recreated (the target changes and we only have the new target). That results in this code always attempting to create (new) the DNS Record, which fails when the DNS record actually exists.
Hence QE finding that you can create an IngressController in IBM just fine, but you cannot delete the load balancer-type service without DNS failing. That's a problem when our standard procedures (like a scope change) require a service deletion.
return fmt.Errorf("createOrUpdateDNSRecord: ListAllDnsRecords returned nil as result") | ||
log.Info("created DNS record", "record", record.Spec, "zone", zone, "target", target) | ||
} else { | ||
if result.Result[0].ID == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's possible for len(result.Result) > 1
, then you'd need to iterate over the results, wouldn't you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's safe to assume it would only ever been 1 results (DNS record) being returned. DNS Name is unique. So our search for DNS name should only ever return 1 record at most. Also, this assumption is already in place here, so I'm not introducing something new, just adding validation:
updateOpt := dnsService.NewUpdateDnsRecordOptions(*result.Result[0].ID) |
Both IBM public and private cloud unit tests were missing unit test coverage. This update extends test coverage for the Delete and CreateOrUpdateRecord functions. This commit provides an important point of reference for future commits that may preturb the existing functionality. Both Test_createOrUpdateDNSRecord functions previously only tested the update logic. In order to test the create logic, `CreateDNSRecord` and `CreateResourceRecord` needed to be implemented in the public and private `fake_client.go` respectively. The new test cases required the ability to control the response and results of `ListAllDnsRecords`, which were previously hardcoded in both public and private IBM cloud unit tests. Both public and private unit tests were updated to use the new OutputResults field specified in the `ListAllDnsRecordsInputOutput` struct, allowing the new test cases to specify no result (indicating no existing DNS record) so we can trigger the create logic. Lastly, various test cases were added to cover untested scenarios, such as testing CNAME Records, mismatching targets, missing record types or IDs, handling nil results, etc.
The IBM Public Cloud DNS provider (`cis_provider.go`) had a bug in `createOrUpdateDNSRecord` where it checked for the existence of a DNS record by filtering both DNS name and target. If the target was updated (e.g., due to a load balancer recreation), the logic would not match the existing DNS record. As a result, the function would attempt to create a new record, but fail because a record with that name already existed, as multiple DNS records with the same name are not allowed in IBM Cloud DNS providers. The fix is to remove the filtering by target and rely solely on filtering by name, as the name is the only attribute that needs to be unique. Additionally, the IBM DNS logic doesn't work for multiple targets and this creates unexpected and problematic results. The logic has been refactored to only create using the first target and it warns the user when multiple targets are set. This change is low risk since the Ingress Operator will never create a DNSRecord with multiple targets in `desiredDNSRecord`. Modify TestScopeChange to check for `Available=True` after deleting the service to add test coverage for this issue.
e486a51
to
b0b91bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for another code review @candita! Comments should be addressed.
@@ -190,9 +194,15 @@ func (p *Provider) createOrUpdateDNSRecord(record *iov1.DNSRecord, zone configv1 | |||
log.Info("Warning: TTL must be one of [1 60 120 300 600 900 1800 3600 7200 18000 43200]. RecordTTL set to default", "default DSNSVCS record TTL", defaultDNSSVCSRecordTTL) | |||
record.Spec.RecordTTL = defaultDNSSVCSRecordTTL | |||
} | |||
// We only support one target, warn the user. | |||
if len(record.Spec.Targets) > 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not add this to ValidateInputDNSData?
We could, but I only added the warnings on creates, and ValidateInputDNSData
is also called on deletes. And it feels odd to warn on deletes, because we actually do want to allow deleting with multiple targets for compatibility.
It's nuanced, but in the extremely rare case that someone was unsupportedly using multiple targets previously, the delete function will iterate through them to ensure all records are deleted to prevent an orphaned DNS record:
cluster-ingress-operator/pkg/dns/ibm/private/dnssvcs_provider.go
Lines 132 to 134 in e486a51
// While creating DNS records with multiple targets is unsupported, we still | |
// iterate through all targets during deletion to be extra cautious. | |
for _, target := range record.Spec.Targets { |
Shouldn't this error stop the addition of the record, rather than just logging it?
I used CIO's AWS DNS provider as precedent, it doesn't break or error out, or provide a warning:
cluster-ingress-operator/pkg/dns/aws/dns.go
Lines 515 to 516 in 047bd98
// TODO: handle >0 targets | |
domain, target := record.Spec.DNSName, record.Spec.Targets[0] |
But I suppose erroring out would prevent a user from getting into a partially functioning state, requiring them to fix the targets upfront, to only provide what is supported. Maybe the AWS logic is bad precedent. That does seem like a better signal than a warning. I can update. But unfortunately still can't go into ValidateInputDNSData
because of what I mentioned above (we still want the delete to continue with multiple targets).
for _, resourceRecord := range listResult.ResourceRecords { | ||
if *resourceRecord.Name == dnsName { | ||
if resourceRecord.ID == nil { | ||
return fmt.Errorf("createOrUpdateDNSRecord: record id is nil") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resourceRecord.Name
must be unique in IBM Cloud DNS, and at this point in time we've already found a match with dnsName
, that's the only DNS name. No point in continuing.
/retest |
@gcs278: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
e2e-aws-ovn-serial has issue that openshift/origin#29447 is attempting to fix |
The IBM Public Cloud DNS provider (
cis_provider.go
) had a bug increateOrUpdateDNSRecord
where it checked for the existence of a DNS record by filtering both DNS name and target. If the target was updated (e.g., due to a load balancer recreation), the logic would not match the existing DNS record. As a result, the function would attempt to create a new record, but fail because a record with that name already existed, as multiple DNS records with the same name are not allowed.The fix is to remove the filtering by target and rely solely on filtering by name, as the name is the only attribute that needs to be unique.
Additionally, the IBM DNS logic doesn't work for multiple targets and this creates unexpected and problematic results. The logic has been refactored to only create and delete using the first target. It warns the user when multiple targets are set.
This PR also includes some unit test fix up and missing unit test coverage for the IBM CIS Provider.
This resolves the same DNS issues for public PowerVS cloud as it uses the same logic.