Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disk snapshots fail on GCP regional volumes #765

Closed
robbyt opened this issue Aug 14, 2018 · 15 comments · Fixed by #955
Closed

Disk snapshots fail on GCP regional volumes #765

robbyt opened this issue Aug 14, 2018 · 15 comments · Fixed by #955

Comments

@robbyt
Copy link

robbyt commented Aug 14, 2018

What steps did you take and what happened:
When using regional volumes on GCP, backups fail with the error below. Here is a basic configuration for creating a PVC backed by a regional volume.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: regional-magnetic
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  replication-type: regional-pd
  zones: us-central1-b, us-central1-c
reclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: regionalVol
  labels:
    app: foo
spec:
  storageClassName: regional-magnetic
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  gcePersistentDisk:
    fsType: ext4
    pdName: regionalVol
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: regionalPVC
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: regional-magnetic
  selector:
    matchLabels:
      app: foo
  resources:
    requests:
      storage: 10Gi

What did you expect to happen:
I expected the snapshots to be created similarly as they are for non-regional volumes.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

time="2018-08-14T05:56:14Z" level=error msg="backup failed" error="[error creating snapshot: rpc error: code = Unknown desc = googleapi: Error 400: Invalid value 'us-central1-f__us-central1-c'. Values must match the following regular expression: '[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?', invalidParameter, error creating snapshot: rpc error: code = Unknown desc = googleapi: Error 400: Invalid value 'us-central1-b__us-central1-c'. Values must match the following regular expression: '[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?', invalidParameter, error creating snapshot: rpc error: code = Unknown desc = googleapi: Error 400: Invalid value 'us-central1-f__us-central1-c'. Values must match the following regular expression: '[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?', invalidParameter, error creating snapshot: rpc error: code = Unknown desc = googleapi: Error 400: Invalid value 'us-central1-b__us-central1-c'. Values must match the following regular expression: '[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?', invalidParameter]" key=heptio-ark/my-backup logSource="pkg/controller/backup_controller.go:280"

Environment:

  • Ark version (use ark version): 0.9.3

  • Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-18T11:37:06Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.5-gke.4", GitCommit:"6265b9797fc8680c8395abeab12c1e3bad14069a", GitTreeState:"clean", BuildDate:"2018-08-04T03:47:40Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes installer & version: gke
  • Cloud provider or hardware configuration: gcp
@skriss
Copy link
Contributor

skriss commented Aug 15, 2018

Thanks for reporting this @robbyt, we'll look into it and let you know when we can get it fixed

@rosskukulinski
Copy link
Contributor

@skriss I put this against the backup replication Epic

@ncdc
Copy link
Contributor

ncdc commented Aug 16, 2018

@rosskukulinski given that this appears to be a bug, I'm not sure putting in under replication is appropriate?

@robbyt
Copy link
Author

robbyt commented Oct 5, 2018

This bug is still present in 0.9.7.

According to the docs, the snapshot API request should be sent to only one of the regions.

https://cloud.google.com/compute/docs/disks/create-snapshots#create_a_snapshot_of_a_regional_persistent_disk

@ncdc
Copy link
Contributor

ncdc commented Oct 5, 2018

@wwitzel3 PTAL

@wwitzel3 wwitzel3 self-assigned this Oct 5, 2018
@wwitzel3
Copy link
Contributor

wwitzel3 commented Oct 11, 2018

The problem is that in GKE, when using a regional volume we get two labels:

failure-domain.beta.kubernetes.io/region: us-central1
failure-domain.beta.kubernetes.io/zone: us-central1-b__us-central1-c

Ark looks for failure-domain.beta.kubernetes.io/zone and passes that along to the disks.createSnapshot API call passed in as the zone. Which results in the failure, since us-central1-b__us-central1-c is not a valid zone.

I came up with a solution, but after testing it and doing a bit more research, my fix of parsing the combined zones out of the failure-domain.beta.kubernetes.io/zone and then passing them to our existing CreateSnapshot call was a bit naive.

Edit: I think we need to look in to checking for the double-under __ convention in the zone and then use the failure-domain.beta.kubernetes.io/region label if we find it, using the regionDisks API to perform our operations.

@ncdc
Copy link
Contributor

ncdc commented Oct 11, 2018

Is the region label only present with regional disks and absent otherwise?

@wwitzel3
Copy link
Contributor

No, we would have to look for the double-under __ convention in the zone.

@ncdc
Copy link
Contributor

ncdc commented Oct 11, 2018

Sounds like that will have to be the way forward, doesn't it?

@wwitzel3
Copy link
Contributor

wwitzel3 commented Oct 11, 2018

So I'm thinking then, we could leave the plugin interface as it, and just special case this in the gcp plugin, checking the incoming volumeAZ for __.

@ncdc
Copy link
Contributor

ncdc commented Oct 11, 2018

Most definitely - this is quite GCP specific

@ncdc
Copy link
Contributor

ncdc commented Oct 11, 2018

Also worth mentioning that regional disks in GCP are beta at this time

@wwitzel3
Copy link
Contributor

Should we add a volumeRegion (along side volumeAZ) to the BlockStore interface methods? As of now, GCP would be the only provider to use that field, so it seems like the wrong way to go.

I keep writing stuff and then just deleting. I can just pick an approach, put up the PR, and let people digest it and see if we want to change it.

@marekaf
Copy link

marekaf commented Nov 15, 2018

What is the status on this? It is still a problem in my version of ark

@skriss
Copy link
Contributor

skriss commented Nov 15, 2018

@BartiMar we have a PR in flight for this -- look for it in an upcoming patch release (we do these every 2 weeks on thursdays)

@nrb nrb closed this as completed in #955 Dec 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants