Bad DNS config while installing MAS with Cloudflare custom domain #547

Closed
QDespeisseTalan opened this issue Oct 11, 2023 · 3 comments
Labels
Triage: Issue was triaged and acknowledged

Comments

@QDespeisseTalan

Hello,

While installing MAS through the CLI, I set up a custom domain with Cloudflare DNS.
During the installation, the "suite-verify" task got stuck because Cert Manager could not obtain the public certificate.

After looking at the Challenge created from the Order for the CertificateRequest, I found the following error:
Waiting for DNS-01 challenge propagation: Could not determine the zone for "_acme-challenge.home.mas.servicemanageteamaca.fr.": When querying the SOA record for the domain 'home.mas.servicemanageteamaca.fr.' using nameservers [172.30.0.10:53], rcode was expected to be 'NOERROR' or 'NXDOMAIN', but got 'SERVFAIL'
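
The key details in this message are the nameserver list and the returned rcode. As a minimal sketch (the helper names `parse_challenge_error` and `uses_private_resolver` are hypothetical, and the pattern only assumes the message layout quoted above with IPv4 `host:port` entries), the following pulls both out and flags a resolver on a private, cluster-internal address:

```python
import ipaddress
import re

# Matches the SOA-lookup failure text quoted in this issue.
# Assumption: nameservers are printed as "[ip:port ip:port]" with IPv4 addresses.
ERROR_RE = re.compile(
    r"using nameservers \[(?P<servers>[^\]]*)\].*but got '(?P<rcode>[A-Z]+)'"
)


def parse_challenge_error(message):
    """Return (nameserver_ips, rcode), or None if the message does not match."""
    match = ERROR_RE.search(message)
    if match is None:
        return None
    servers = []
    for entry in re.split(r"[,\s]+", match.group("servers").strip()):
        if entry:
            servers.append(entry.rsplit(":", 1)[0])  # drop the ":53" port suffix
    return servers, match.group("rcode")


def uses_private_resolver(message):
    """True if any nameserver in the message is a private (non-public) address."""
    parsed = parse_challenge_error(message)
    if parsed is None:
        return False
    servers, _rcode = parsed
    return any(ipaddress.ip_address(ip).is_private for ip in servers)
```

A private resolver combined with `SERVFAIL` suggests the self-check is going through the cluster DNS rather than a public resolver, which matches what was observed here.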

Cert Manager queries the OpenShift cluster nameserver instead of a public DNS server.
To fix it, I had to:

  • Go to the ibm-common-services project > Deployments
  • Scale the ibm-cert-manager-operator deployment down to 0 pods
  • Open the cert-manager-controller deployment and add these lines to args:
    - '--dns01-recursive-nameservers-only'
    - '--dns01-recursive-nameservers=8.8.8.8:53'
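
The console steps above could also be expressed as `oc` commands. The sketch below only builds the command strings and does not run them. It assumes the controller is the first container in the deployment (`container_index=0`) and that its `args` list already exists; the function names are hypothetical, while the project, deployment names, and flags are the ones listed above.

```python
import json

NAMESPACE = "ibm-common-services"        # project named in this issue
OPERATOR = "ibm-cert-manager-operator"   # deployment scaled down to 0 pods
CONTROLLER = "cert-manager-controller"   # deployment whose args are extended


def build_args_patch(resolver="8.8.8.8:53", container_index=0):
    """Build a JSON Patch (RFC 6902) that appends the two DNS-01 flags to args.

    Assumption: the target container already has an "args" array; the "/-"
    path segment appends to the end of that array.
    """
    path = f"/spec/template/spec/containers/{container_index}/args/-"
    return [
        {"op": "add", "path": path, "value": "--dns01-recursive-nameservers-only"},
        {"op": "add", "path": path, "value": f"--dns01-recursive-nameservers={resolver}"},
    ]


def build_commands(resolver="8.8.8.8:53"):
    """Return the oc commands mirroring the manual steps, as plain strings."""
    patch = json.dumps(build_args_patch(resolver))
    return [
        f"oc scale deployment {OPERATOR} --replicas=0 -n {NAMESPACE}",
        f"oc patch deployment {CONTROLLER} -n {NAMESPACE} --type=json -p '{patch}'",
    ]


if __name__ == "__main__":
    for command in build_commands():
        print(command)
```

Printing the commands rather than executing them leaves room to review the patch first, since scaling down the operator stops it from reconciling the controller deployment.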

After that, the MAS public certificate was issued, along with the other certificates.

I think this workaround should be automated as part of the task.

@andrercm
Contributor

@QDespeisseTalan I believe this problem also happens with AWS Route 53, and we have a statement about it in our suite_dns role:

In some cases, the Route 53 zone may not resolve the certificate challenges generated by IBM Certificate Manager, which could cause a problem while issuing the public certificates via Let's Encrypt. In this case, a manual workaround might be needed in cert-manager-controller pod to enable recursive nameservers.

For more details on how to apply this workaround, refer to [this documentation](https://community.ibm.com/community/user/asset-facilities/blogs/brian-zhu/2022/10/08/using-lets-encrypt-ssl-certificates-with-maximo-ap?CommunityKey=3d7261ae-48f7-481d-b675-a40eb407e0fd).

This workaround is not automated because most of the time it is not needed, and when it is, it requires the `cert-manager-controller` pod to be stopped, so we want to avoid making this the standard approach.

When this was first found, the decision not to automate the workaround was tied directly to the fact that it requires the cert-manager controller to be disabled (deployment pods set to 0). That may have critical implications for the proper functioning of cert-manager, since it prevents the operator from reconciling changes as it normally would.

@andrercm added the Triage (Issue was triaged and acknowledged) label on Oct 19, 2023
@QDespeisseTalan
Author

@andrercm after adding the DNS entry for apps.mas.servicemanageteamaca.fr as described in #548, the certificates were issued correctly without having to disable cert-manager-controller.
Hopefully the same problem can be fixed the same way with AWS Route 53.

@andrercm
Contributor

same fix as #548
