[BUG] - local deploy is not guaranteed to work without access to a domain #1707

Closed
pmeier opened this issue Apr 11, 2023 · 5 comments
Labels
needs: PR 📬 This item has been scoped and needs to be worked on type: bug 🐛 Something isn't working

Comments

@pmeier
Member

pmeier commented Apr 11, 2023

Describe the bug

Related to #1703. TL;DR: our guide tells users to fix /etc/hosts, but the deployed cluster doesn't see this change. There are three ways to work around that:

  1. Have access to a domain and create a DNS entry for --domain=... to point to 172.18.1.100. This is what we do for github-actions.nebari.dev to get our CI going:
    sudo echo "172.18.1.100 github-actions.nebari.dev" | sudo tee -a /etc/hosts
  2. Use a service like nip.io and set something like --domain=172.18.1.100.nip.io during nebari init. With this, users no longer need to have a domain available, but they still need internet access. Plus, we add an external dependency, since our guide then depends on nip.io being available.
  3. Instead of using a URL, users could also use the IP directly, i.e. --domain=172.18.1.100. It is not as pretty as having a URL, but probably good enough for a local deploy. The major advantage is that this works even without internet access, making it the most reliable and easiest to use option of the three. Doing this, the user also doesn't need to fix /etc/hosts.
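
Options 2 and 3 differ only in the string passed to --domain. A minimal sketch of that difference, assuming the load balancer IP is already known; `local_domain` is a hypothetical helper for illustration and not part of Nebari:

```python
import ipaddress


def local_domain(ip: str, use_nip_io: bool = False) -> str:
    """Build a value for --domain from the load balancer IP.

    Hypothetical helper for illustration only; not part of Nebari.
    """
    # Raises ValueError if `ip` is not a valid IPv4 address.
    address = ipaddress.IPv4Address(ip)
    if use_nip_io:
        # Option 2: <ip>.nip.io resolves back to <ip>, but needs internet access.
        return f"{address}.nip.io"
    # Option 3: the bare IP needs neither DNS nor an /etc/hosts entry.
    return str(address)


print(local_domain("172.18.1.100", use_nip_io=True))
print(local_domain("172.18.1.100"))
```

Either value is only as good as the IP it is built from, which is the problem described next.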

Regardless of what we decide we want to use, 2. and 3. (as well as our current docs) make an assumption that doesn't hold in 100% of the cases: the cluster IP is fixed at 172.18.1.100.

However, we don't know that. It should work most of the time, but I know for example that it doesn't work for @costrouc. Maybe he can fill in some details on when this won't be true?

Internally, we only know the correct IP after stage 4:

nebari/nebari/deploy.py

Lines 102 to 104 in daecbcf

directory = "stages/04-kubernetes-ingress"
ip_or_name = stage_outputs[directory]["load_balancer_address"]["value"]

Meaning, setting it upfront during nebari init --domain is not guaranteed to work.
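
To make the timing issue concrete, here is a rough sketch of reading the address once stage 04 has run. The key names follow the deploy.py excerpts quoted in this thread; the function name `load_balancer_endpoint` and the sample values are assumptions for illustration:

```python
def load_balancer_endpoint(stage_outputs: dict) -> str:
    """Return the load balancer hostname or IP from the stage 04 outputs.

    Hypothetical helper; the key names mirror the quoted deploy.py lines.
    """
    directory = "stages/04-kubernetes-ingress"
    if directory not in stage_outputs:
        # Before stage 04 has been applied there is nothing to read.
        raise RuntimeError("load balancer address is unknown before stage 04")
    ip_or_name = stage_outputs[directory]["load_balancer_address"]["value"]
    # Prefer the hostname if one is set, otherwise fall back to the IP.
    return ip_or_name["hostname"] or ip_or_name["ip"]


# Sample data with made-up values, shaped like the quoted code expects.
outputs = {
    "stages/04-kubernetes-ingress": {
        "load_balancer_address": {"value": {"hostname": "", "ip": "172.18.1.100"}}
    }
}
print(load_balancer_endpoint(outputs))
```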

Expected behavior

I think we should allow setting no domain upfront and, instead of asking the user to set up their DNS (when --disable-prompt is not set)

nebari/nebari/deploy.py

Lines 128 to 131 in daecbcf

input(
    f"Take IP Address {ip_or_hostname} and update DNS to point to "
    f'"{config["domain"]}" [Press Enter when Complete]'
)

just use the IP we get from the load balancer.
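
A minimal sketch of the proposed behavior, assuming an unset domain is represented as `None` or an empty string. The names `effective_domain` and `needs_dns_prompt` are hypothetical and do not exist in Nebari:

```python
from typing import Optional


def effective_domain(configured_domain: Optional[str], load_balancer_endpoint: str) -> str:
    """Use the configured domain if there is one, else the load balancer address."""
    if configured_domain:
        return configured_domain
    # No domain configured: fall back to whatever the load balancer reports.
    return load_balancer_endpoint


def needs_dns_prompt(configured_domain: Optional[str], disable_prompt: bool) -> bool:
    """Only ask the user to update DNS when there is a domain to point at."""
    return bool(configured_domain) and not disable_prompt


print(effective_domain(None, "172.18.1.100"))
print(needs_dns_prompt(None, disable_prompt=False))
```

With no domain configured, the prompt is skipped and the deployment is reached through the load balancer address directly.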

OS and architecture in which you are running Nebari

Linux

How to Reproduce the problem?

Follow the local deployment how-to in an environment where the load balancer doesn't use 172.18.1.100 as IP.

Command output

No response

Versions and dependencies used.

No response

Compute environment

None

Integrations

No response

Anything else?

No response

@pmeier
Member Author

pmeier commented Apr 13, 2023

Looking through our workflow, we actually don't need the domain before stage 04. Meaning, we could accept an empty string in case of a local deployment to mean "just use the load balancer IP". Shouldn't be terribly hard to implement if we want that.

@costrouc
Member

costrouc commented Apr 13, 2023

I think we should allow setting no domain upfront and, instead of asking the user to set up their DNS (when --disable-prompt is not set)

Does this mean that you suggest we just make the "domain" optional? And if so, we just use what the cloud/docker/kind gives us? This could make a lot of sense: we just tell the user "hey, this is your URL; set the domain if you want something different".

Also, I think the three points in your issue describe the existing solutions well, with none of them being perfect. I could see how making the domain optional and using the IP/CNAME generated in stage 04 would be enough to get the user started, and it would certainly be a better experience than trying to set up DNS initially.

@pmeier
Member Author

pmeier commented Apr 13, 2023

Does this mean that you suggest we just make the "domain" optional?

Yes, in case we are doing a local deploy. It doesn't really make sense for cloud deploy, does it?

And if so, we just use what the cloud/docker/kind gives us?

Yes. My idea is, in case no domain is supplied, we skip

nebari/nebari/deploy.py

Lines 107 to 131 in 9915a6d

if dns_auto_provision and dns_provider == "cloudflare":
    record_name, zone_name = (
        config["domain"].split(".")[:-2],
        config["domain"].split(".")[-2:],
    )
    record_name = ".".join(record_name)
    zone_name = ".".join(zone_name)
    if config["provider"] in {"do", "gcp", "azure"}:
        update_record(zone_name, record_name, "A", ip_or_hostname)
        if config.get("clearml", {}).get("enabled"):
            add_clearml_dns(zone_name, record_name, "A", ip_or_hostname)
    elif config["provider"] == "aws":
        update_record(zone_name, record_name, "CNAME", ip_or_hostname)
        if config.get("clearml", {}).get("enabled"):
            add_clearml_dns(zone_name, record_name, "CNAME", ip_or_hostname)
    else:
        logger.info(
            f"Couldn't update the DNS record for cloud provider: {config['provider']}"
        )
elif not disable_prompt:
    input(
        f"Take IP Address {ip_or_hostname} and update DNS to point to "
        f'"{config["domain"]}" [Press Enter when Complete]'
    )

and just use

nebari/nebari/deploy.py

Lines 104 to 105 in 9915a6d

ip_or_name = stage_outputs[directory]["load_balancer_address"]["value"]
ip_or_hostname = ip_or_name["hostname"] or ip_or_name["ip"]

going forward. The last messages after a successful deploy tell the user how to reach the cluster and thus we don't need to do anything there.
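
One way to picture "skip the DNS step when no domain is supplied" is to pull the decision into a small pure function. This is only a sketch: `plan_dns_record` is a hypothetical name, and the record/zone split and the provider-to-record-type mapping are copied from the excerpt quoted above rather than from any newer Nebari code:

```python
from typing import Optional, Tuple

# Providers for which the quoted code writes an "A" record.
A_RECORD_PROVIDERS = {"do", "gcp", "azure"}


def plan_dns_record(domain: Optional[str], provider: str) -> Optional[Tuple[str, str, str]]:
    """Return (zone_name, record_name, record_type), or None if there is nothing to update."""
    if not domain:
        # No domain supplied: skip DNS handling and use the load balancer address.
        return None
    parts = domain.split(".")
    record_name = ".".join(parts[:-2])  # e.g. "github-actions"
    zone_name = ".".join(parts[-2:])  # e.g. "nebari.dev"
    if provider in A_RECORD_PROVIDERS:
        return zone_name, record_name, "A"
    if provider == "aws":
        # In the quoted code, AWS gets a CNAME record instead of an A record.
        return zone_name, record_name, "CNAME"
    # Any other provider: the quoted code only logs that it cannot update DNS.
    return None


print(plan_dns_record("github-actions.nebari.dev", "aws"))
print(plan_dns_record(None, "aws"))
```

Note that the split takes the last two labels as the zone, so this sketch inherits that limitation for multi-label suffixes such as co.uk.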

@pmeier
Member Author

pmeier commented Apr 14, 2023

Summary after an offline discussion with @costrouc

It doesn't really make sense for cloud deploy, does it?

This is unnecessarily strict. If you are doing a cloud deploy, the load balancer will give us the publicly accessible IP as well. Meaning, we can make the domain optional in all cases, which is nice.

@pmeier pmeier self-assigned this Apr 14, 2023
This was referenced Apr 14, 2023
@pavithraes pavithraes added needs: PR 📬 This item has been scoped and needs to be worked on and removed needs: triage 🚦 Someone needs to have a look at this issue and triage labels Apr 22, 2023
@pmeier pmeier mentioned this issue May 11, 2023
10 tasks
@pmeier
Member Author

pmeier commented Aug 18, 2023

Closing because of #1803 (comment) and because #1833 has landed.

@pmeier pmeier closed this as completed Aug 18, 2023
@github-project-automation github-project-automation bot moved this from New 📬 to Done 💪🏾 in 🪴 Nebari Project Management Aug 18, 2023

3 participants