[BUG] - Deploying Latest 'develop' branch destroys all JupyterLab home directories #2669

kenafoster · 2024-08-30T15:56:26Z

Describe the bug

Running nebari deploy on top of a previously created cluster, using a Nebari install that includes commit ed170cb73f11df42d4d6b6536f7bea92ae1fe934 (which adds a 'count' to the efs module), makes Nebari destroy the entire efs module (and with it all of the JupyterLab data) and then recreate it.

Expected behavior

New changes would be deployed but existing data would persist

OS and architecture in which you are running Nebari

Ubuntu 22.04 amd64

How to Reproduce the problem?

Deploy Nebari into AWS with any previous version up to 2024.7.1. Then install a Nebari build that includes this commit/this line and run nebari deploy again:

nebari/src/_nebari/stages/infrastructure/template/aws/main.tf

Line 67 in ed170cb

count = var.efs_enabled ? 1 : 0

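This happens because adding count = var.efs_enabled ? 1 : 0 changes the resource address from module.efs.aws_efs_file_system.main to module.efs[0].aws_efs_file_system.main, so Terraform plans to destroy the resource at the old address and create a new one at the new address.

The fix is to add a Terraform moved block.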
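For illustration, here is a minimal sketch of what such a block could look like. It assumes the module call is still named efs and that the block sits in the same configuration as that module call (the AWS template's main.tf referenced above); I have not verified it against a real deployment, so treat it as a starting point rather than the final fix.

```hcl
# Sketch only: tells Terraform that the objects previously tracked under
# module.efs now live at module.efs[0], so they are re-addressed in state
# instead of being destroyed and recreated.
# moved blocks require Terraform v1.1 or later.
moved {
  from = module.efs
  to   = module.efs[0]
}
```

With a block like this in place, the plan should report aws_efs_file_system.main as moved rather than as a destroy/create pair. It is worth checking the plan output for that before applying.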

I just noticed that some other changes included since the 2024.7.1 release have implemented this correctly

nebari/src/_nebari/stages/kubernetes_services/template/jupyterhub.tf

Line 119 in ed170cb

moved {

Command output

The relevant part of the Terraform plan output is:

[terraform]:   # module.efs.aws_efs_file_system.main will be destroyed
[terraform]:   # (because module.efs is not in configuration)
[terraform]:   - resource "aws_efs_file_system" "main" {
[terraform]:       - arn                             = "arn:aws-us-gov:elasticfilesystem:us-gov-west-1:xxxxxxxxxxxx:file-system/fs-xxxxxxxxxxx" -> null
...
...
...
[terraform]:   # module.efs[0].aws_efs_file_system.main will be created
[terraform]:   + resource "aws_efs_file_system" "main" {
[terraform]:       + arn                     = (known after apply)
[terraform]:       + availability_zone_id    = (known after apply)

Versions and dependencies used.

Nebari version - previously deployed tag 2024.7.1. Now deployed from nebari-dev/nebari:develop HEAD at 498e569

Compute environment

None

Integrations

No response

Anything else?

No response


kenafoster · 2024-08-30T16:27:25Z

I'm not sure whether to report this as a follow-up issue, but in trying to recover the rest of the destroyed instance, I hit another blocker.

The change to the EFS system forced replacement of module.jupyterhub-nfs-mount[0].kubernetes_persistent_volume.main.

When Terraform tried to delete that PV, it timed out because there is a PVC bound to it, but there isn't any explicit dependency between the two in the relevant code (https://github.com/nebari-dev/nebari/blob/ed170cb73f11df42d4d6b6536f7bea92ae1fe934/src/_nebari/stages/kubernetes_services/template/modules/kubernetes/nfs-mount/main.tf).

So Terraform sees a change to the kubernetes_persistent_volume that forces replacement, but the underlying PV destroy call fails because the PV is bound to the PVC created in the same file. The PVC doesn't explicitly reference the PV by name (only its storage class), so I guess Terraform can't determine that the PVC must be destroyed/replaced along with the PV, if that makes sense.
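To make the idea concrete, here is a rough sketch of one way the PVC could be tied to the PV explicitly. Everything in it is an assumption for illustration: the resource names, the example-* metadata, the sizes, and the nfs_endpoint variable are hypothetical and not copied from the nfs-mount module. It uses spec.volume_name to reference the PV, plus the replace_triggered_by lifecycle argument (Terraform v1.2 or later) so that replacing the PV also replaces the PVC. I have not tested whether this resolves the timeout.

```hcl
# Hypothetical input; the real module's variables may differ.
variable "nfs_endpoint" {
  type = string
}

resource "kubernetes_persistent_volume" "main" {
  metadata {
    name = "example-nfs-pv" # hypothetical name
  }
  spec {
    capacity = {
      storage = "10Gi"
    }
    access_modes       = ["ReadWriteMany"]
    storage_class_name = "example-nfs" # hypothetical storage class
    persistent_volume_source {
      nfs {
        server = var.nfs_endpoint
        path   = "/"
      }
    }
  }
}

resource "kubernetes_persistent_volume_claim" "main" {
  metadata {
    name      = "example-nfs-pvc" # hypothetical name
    namespace = "default"
  }
  spec {
    access_modes       = ["ReadWriteMany"]
    storage_class_name = "example-nfs"
    resources {
      requests = {
        storage = "10Gi"
      }
    }
    # Explicit reference: Terraform now knows the PVC depends on the PV.
    volume_name = kubernetes_persistent_volume.main.metadata[0].name
  }

  lifecycle {
    # Replace the PVC whenever the PV is planned for update or replacement.
    replace_triggered_by = [kubernetes_persistent_volume.main]
  }
}
```

Because the PVC would then depend on the PV, Terraform should destroy the PVC before the PV and recreate it afterwards. Note that replacing the PVC affects any pods currently mounting it, so this may not be acceptable everywhere.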

Adam-D-Lewis · 2024-08-30T17:04:50Z

~~@kenafoster I believe this issue is the same as #2638 and should be fixed by #2639 which is already merged. Can you try out the latest develop branch and see if it's still an issue?~~

Update: I think I was mistaken, and one more moved block is needed for AWS only.

kenafoster added the type: bug 🐛 and needs: triage 🚦 labels Aug 30, 2024

github-project-automation bot added this to 🪴 Nebari Project Management Aug 30, 2024

github-project-automation bot moved this to New 🚦 in 🪴 Nebari Project Management Aug 30, 2024

Adam-D-Lewis added this to the Next Release milestone Aug 30, 2024

Adam-D-Lewis mentioned this issue Aug 30, 2024

add moved block to account for terraform changes on AWS only #2673

Merged

10 tasks

dcmcand closed this as completed in #2673 Sep 2, 2024

github-project-automation bot moved this from New 🚦 to Done 💪🏾 in 🪴 Nebari Project Management Sep 2, 2024

Adam-D-Lewis removed the needs: triage 🚦 label Sep 3, 2024
