
how to set storage class for aliyun, and hope to support minikube for test and learning #21

Closed
wangqia0309 opened this issue Jun 10, 2021 · 20 comments


@wangqia0309

config/samples/apps_v1alpha1_nebulacluster.yaml
Please add more examples.

@MegaByte875
Contributor

Hi, if you use ACK, you only need to change the storageClassName 'gp2' to 'alicloud-disk-ssd'. Please read the instructions in Use dynamically provisioned disks for stateful applications.
We will consider supporting minikube as a testing environment.
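
For example, each storageClaim in the sample would end up looking like this (a minimal sketch of the storageClaim portion only; 20Gi is an illustrative size):

storageClaim:
  resources:
    requests:
      storage: 20Gi
  storageClassName: alicloud-disk-ssd   # 'gp2' in the original sample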

@wey-gu
Contributor

wey-gu commented Jun 10, 2021

config/samples/apps_v1alpha1_nebulacluster.yaml
Please add more examples.

By the way, if the reason you used minikube is for test/playground purposes, you could try https://github.com/wey-gu/nebula-operator-kind. It's a toy project to help create a nebula-operator sample cluster in one line; please note it's not for production, of course.

@wangqia0309
Author

@MegaByte875 @wey-gu
I found an error when running helm install for nebula-operator on Aliyun:
Error: unable to build kubernetes objects from release manifest: [unable to recognize "": no matches for kind "Certificate" in version "cert-manager.io/v1", unable to recognize "": no matches for kind "Issuer" in version "cert-manager.io/v1"]

@wey-gu
Contributor

wey-gu commented Jun 11, 2021

@MegaByte875 @wey-gu
I found an error when running helm install for nebula-operator on Aliyun:
Error: unable to build kubernetes objects from release manifest: [unable to recognize "": no matches for kind "Certificate" in version "cert-manager.io/v1", unable to recognize "": no matches for kind "Issuer" in version "cert-manager.io/v1"]

You need to install the dependencies first. Here the error means the cert-manager CRDs cannot be recognized, i.e. cert-manager is not installed :-) (a sketch of the cert-manager install follows the list below)

ref: https://github.com/vesoft-inc/nebula-operator/blob/master/doc/user/install_guide.md

RBAC enabled (optional)
CoreDNS >= 1.6.0
CertManager >= 1.2.0
OpenKruise >= 0.8.0
Helm >= 3.2.0
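
For example, the cert-manager dependency can be installed with Helm first (a sketch; v1.3.1 is one version that satisfies the >= 1.2.0 requirement):

helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --version v1.3.1 \
  --set installCRDs=true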

@wangqia0309
Author

wangqia0309 commented Jun 11, 2021

nebula-operator is launched, but the storaged, metad, and graphd processes can't run normally.
@MegaByte875 @wey-gu

NAME READY STATUS RESTARTS AGE
nebula-graphd-0 0/1 Running 0 6m16s
nebula-metad-0 0/1 Running 0 6m17s
nebula-operator-controller-manager-deployment-74fb689875-8p854 2/2 Running 0 58m
nebula-operator-controller-manager-deployment-74fb689875-gkkpx 2/2 Running 0 58m
nebula-operator-scheduler-deployment-fc9c797c6-5nhjm 2/2 Running 0 58m
nebula-operator-scheduler-deployment-fc9c797c6-s5pgv 2/2 Running 0 58m
nebula-storaged-0 0/1 Running 0 6m17s
nebula-storaged-1 0/1 Running 0 6m17s
nebula-storaged-2 0/1 Running 0 6m17s

@MegaByte875
Contributor

Please show me the output of "kubectl describe pod nebula-storaged-0" and "kubectl get pod nebula-storaged-0 -oyaml" @wangqia0309

@wangqia0309
Author

wangqia0309 commented Jun 15, 2021

Please show me the output of "kubectl describe pod nebula-storaged-0" and "kubectl get pod nebula-storaged-0 -oyaml" @wangqia0309

@MegaByte875
Here is the describe output:

Name:         nebula-storaged-0
Namespace:    nebula
Priority:     0
Node:         cn-beijing.172.17.0.236/172.17.0.236
Start Time:   Fri, 11 Jun 2021 18:27:32 +0800
Labels:       app.kubernetes.io/cluster=nebula
              app.kubernetes.io/component=storaged
              app.kubernetes.io/managed-by=nebula-operator
              app.kubernetes.io/name=nebula-graph
              controller-revision-hash=nebula-storaged-675dfb4688
              statefulset.kubernetes.io/pod-name=nebula-storaged-0
Annotations:  kubernetes.io/psp: ack.privileged
              nebula-graph.io/cm-hash: 563a13ee319762c8
Status:       Running
IP:           172.22.0.2
IPs:
  IP:           172.22.0.2
Controlled By:  StatefulSet/nebula-storaged
Containers:
  storaged:
    Container ID:  docker://a55dc78665ba3938ef5079993b6bc4bb4cfc70833edbd8d6ef0635ba02dc0083
    Image:         registry-vpc.cn-beijing.aliyuncs.com/galixir/nebula:nebula-storaged-2.0
    Image ID:      docker-pullable://registry-vpc.cn-beijing.aliyuncs.com/galixir/nebula@sha256:0756d3ba427debc62239805bb2136f009d2305ce06b220b75e61f158056d75fb
    Ports:         9779/TCP, 19779/TCP, 19780/TCP, 9778/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      /bin/bash
      -ecx
      exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf --meta_server_addrs=nebula-metad-0.nebula-metad-headless.nebula.svc.cluster.local:9559 --local_ip=$(hostname).nebula-storaged-headless.nebula.svc.cluster.local --ws_ip=$(hostname).nebula-storaged-headless.nebula.svc.cluster.local --minloglevel=1 --v=0 --daemonize=false
    State:          Running
      Started:      Fri, 11 Jun 2021 18:27:44 +0800
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:        500m
      memory:     500Mi
    Readiness:    http-get http://:19779/status delay=20s timeout=5s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /usr/local/nebula/data from storaged (rw,path="data")
      /usr/local/nebula/etc from nebula-storaged (rw)
      /usr/local/nebula/logs from storaged (rw,path="logs")
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4gnrq (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  storaged:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  storaged-nebula-storaged-0
    ReadOnly:   false
  nebula-storaged:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      nebula-storaged
    Optional:  false
  default-token-4gnrq:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4gnrq
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                        From     Message
  ----     ------     ----                       ----     -------
  Warning  Unhealthy  2m25s (x31858 over 3d16h)  kubelet  Readiness probe failed: Get http://172.22.0.2:19779/status: dial tcp 172.22.0.2:19779: connect: connection refused

And here is the yaml:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: ack.privileged
    nebula-graph.io/cm-hash: 563a13ee319762c8
  creationTimestamp: "2021-06-11T10:27:30Z"
  generateName: nebula-storaged-
  labels:
    app.kubernetes.io/cluster: nebula
    app.kubernetes.io/component: storaged
    app.kubernetes.io/managed-by: nebula-operator
    app.kubernetes.io/name: nebula-graph
    controller-revision-hash: nebula-storaged-675dfb4688
    statefulset.kubernetes.io/pod-name: nebula-storaged-0
  name: nebula-storaged-0
  namespace: nebula
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: nebula-storaged
    uid: 71326412-f235-4fe4-a979-98fcb9bf42a2
  resourceVersion: "929981884"
  selfLink: /api/v1/namespaces/nebula/pods/nebula-storaged-0
  uid: ad0b6616-4dd4-4380-adf0-2253e85f9c98
spec:
  containers:
  - command:
    - /bin/bash
    - -ecx
    - exec /usr/local/nebula/bin/nebula-storaged --flagfile=/usr/local/nebula/etc/nebula-storaged.conf
      --meta_server_addrs=nebula-metad-0.nebula-metad-headless.nebula.svc.cluster.local:9559
      --local_ip=$(hostname).nebula-storaged-headless.nebula.svc.cluster.local --ws_ip=$(hostname).nebula-storaged-headless.nebula.svc.cluster.local
      --minloglevel=1 --v=0 --daemonize=false
    image: registry-vpc.cn-beijing.aliyuncs.com/galixir/nebula:nebula-storaged-2.0
    imagePullPolicy: IfNotPresent
    name: storaged
    ports:
    - containerPort: 9779
      name: thrift
      protocol: TCP
    - containerPort: 19779
      name: http
      protocol: TCP
    - containerPort: 19780
      name: http2
      protocol: TCP
    - containerPort: 9778
      name: admin
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /status
        port: 19779
        scheme: HTTP
      initialDelaySeconds: 20
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 500Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /usr/local/nebula/logs
      name: storaged
      subPath: logs
    - mountPath: /usr/local/nebula/data
      name: storaged
      subPath: data
    - mountPath: /usr/local/nebula/etc
      name: nebula-storaged
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-4gnrq
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: nebula-storaged-0
  imagePullSecrets:
  - name: acr-credential-a0fa064cb4ce770d628e28389a5eff36
  - name: acr-credential-79873d0d479756dcb41f2157e7ef6512
  - name: acr-credential-24854a7970e1cadb8173632e77a2be46
  - name: acr-credential-64e9a936224ff365bbd88cdc91a39a86
  - name: acr-credential-df469bbe2cfaa576fab48b6f52d33a82
  nodeName: cn-beijing.172.17.0.236
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  subdomain: nebula-storaged-headless
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/cluster: nebula
        app.kubernetes.io/component: storaged
        app.kubernetes.io/managed-by: nebula-operator
        app.kubernetes.io/name: nebula-graph
    maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
  volumes:
  - name: storaged
    persistentVolumeClaim:
      claimName: storaged-nebula-storaged-0
  - configMap:
      defaultMode: 420
      items:
      - key: nebula-storaged.conf
        path: nebula-storaged.conf
      name: nebula-storaged
    name: nebula-storaged
  - name: default-token-4gnrq
    secret:
      defaultMode: 420
      secretName: default-token-4gnrq
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-06-11T10:27:32Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-06-11T10:27:32Z"
    message: 'containers with unready status: [storaged]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-06-11T10:27:32Z"
    message: 'containers with unready status: [storaged]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-06-11T10:27:32Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://a55dc78665ba3938ef5079993b6bc4bb4cfc70833edbd8d6ef0635ba02dc0083
    image: registry-vpc.cn-beijing.aliyuncs.com/galixir/nebula:nebula-storaged-2.0
    imageID: docker-pullable://registry-vpc.cn-beijing.aliyuncs.com/galixir/nebula@sha256:0756d3ba427debc62239805bb2136f009d2305ce06b220b75e61f158056d75fb
    lastState: {}
    name: storaged
    ready: false
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2021-06-11T10:27:44Z"
  hostIP: 172.17.0.236
  phase: Running
  podIP: 172.22.0.2
  podIPs:
  - ip: 172.22.0.2
  qosClass: Burstable
  startTime: "2021-06-11T10:27:32Z"

@veezhang
Contributor

veezhang commented Jun 16, 2021

@wangqia0309 Hi, can you provide some logs?

  • log for nebula-storaged-0
kubectl exec -it nebula-storaged-0 -- cat logs/nebula-storaged.INFO 
  • log for nebula-metad-0
kubectl exec -it nebula-metad-0 -- cat logs/nebula-metad.INFO

@wangqia0309
Author

wangqia0309 commented Jun 16, 2021

@wangqia0309 Hi, can you provide some logs?

  • log for nebula-storaged-0
kubectl exec -it nebula-storaged-0 -- cat logs/nebula-storaged.INFO 
  • log for nebula-metad-0
kubectl exec -it nebula-metad-0 -- cat logs/nebula-metad.INFO

These services were not running; there was no container to exec into.
@veezhang

@veezhang
Contributor

veezhang commented Jun 16, 2021

@wangqia0309 Emm, I'll create a cluster on Aliyun. Which version of Kubernetes are you using?

@wangqia0309
Author

@wangqia0309 Emm, I'll create a cluster on Aliyun. Which version of Kubernetes are you using?

Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.8-aliyun.1", GitCommit:"2cbb16c", GitTreeState:"", BuildDate:"2021-01-27T02:20:04Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

@veezhang
Contributor

@wangqia0309

Can you provide the following information?

  • List the pv and pvc
kubectl get pv,pvc
  • Describe pvc storaged-nebula-storaged-0
kubectl describe pvc storaged-nebula-storaged-0

@veezhang
Contributor

@wangqia0309
Maybe the storage size you requested does not meet Aliyun's requirements. If so, please modify your yaml definition (a sketch for inspecting the provisioning error follows below).

  • Valid values when DiskCategory is set to cloud: 5 to 2000 (GiB)
  • Valid values when DiskCategory is set to cloud_efficiency: 20 to 32768 (GiB)
  • Valid values when DiskCategory is set to cloud_ssd: 20 to 32768 (GiB)
  • Valid values when DiskCategory is set to cloud_essd: 20 to 32768 (GiB)

See https://partners-intl.aliyun.com/help/doc-detail/25513.htm for details.
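
For example, to see the provisioner's error for a claim that stays pending (a sketch; the event field selector targets the PVC by name):

kubectl get events -n nebula --field-selector involvedObject.name=storaged-nebula-storaged-0

With the alicloud-disk-ssd (cloud_ssd) class this means each storageClaim request needs to be at least 20Gi.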

@veezhang
Contributor

@wangqia0309 Here is an example of creating a Nebula Cluster on Aliyun. Hope it is useful to you.

Create a Kubernetes cluster and wait for it to be ready.

Set up cert-manager, OpenKruise, and nebula-operator.

helm install cert-manager cert-manager --repo https://charts.jetstack.io \
  --namespace cert-manager --create-namespace --version v1.3.1 \
  --set installCRDs=true

helm install kruise https://github.com/openkruise/kruise/releases/download/v0.8.1/kruise-chart.tgz

helm install nebula-operator nebula-operator --repo https://vesoft-inc.github.io/nebula-operator/charts \
  --namespace nebula-operator-system --create-namespace --version 0.1.0 \
  --set image.kubeRBACProxy.image=kubesphere/kube-rbac-proxy:v0.8.0 \
  --set image.kubeScheduler.image=kubesphere/kube-scheduler:v1.18.8
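
Before creating the cluster, you can check that the operator components came up (a sketch; the namespace is the one used in the helm install above):

kubectl get pods -n nebula-operator-system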

Create a Nebula Cluster

cat <<EOF | kubectl apply -f -
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
  name: nebula
spec:
  graphd:
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
    replicas: 1
    image: vesoft/nebula-graphd
    version: v2.0.0
    service:
      type: NodePort
      externalTrafficPolicy: Local
    storageClaim:
      resources:
        requests:
          storage: 20Gi
      storageClassName: alicloud-disk-ssd
  metad:
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
    replicas: 1
    image: vesoft/nebula-metad
    version: v2.0.0
    storageClaim:
      resources:
        requests:
          storage: 20Gi
      storageClassName: alicloud-disk-ssd
  storaged:
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
    replicas: 3
    image: vesoft/nebula-storaged
    version: v2.0.0
    storageClaim:
      resources:
        requests:
          storage: 20Gi
      storageClassName: alicloud-disk-ssd
  reference:
    name: statefulsets.apps
    version: v1
  schedulerName: default-scheduler
  imagePullPolicy: IfNotPresent
EOF
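
You can then watch the cluster pods become Ready (a sketch; the label is taken from the pod metadata shown earlier in this thread):

kubectl get pods -l app.kubernetes.io/cluster=nebula -w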

Create a console to connect to the Nebula cluster

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: nebula-console
spec:
  containers:
    - name: nebula-console
      image: vesoft/nebula-console:v2-nightly
      command:
      - sleep
      - "1000000"
EOF

Have fun

kubectl exec -it nebula-console -- nebula-console -u root -p a --addr nebula-graphd-svc --port 9669
2021/06/17 08:43:54 [INFO] connection pool is initialized successfully

Welcome to Nebula Graph!

(root@nebula) [(none)]> show hosts
+------------------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| Host                                                                   | Port | Status   | Leader count | Leader distribution  | Partition distribution |
+------------------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| "nebula-storaged-0.nebula-storaged-headless.default.svc.cluster.local" | 9779 | "ONLINE" | 0            | "No valid partition" | "No valid partition"   |
+------------------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| "nebula-storaged-1.nebula-storaged-headless.default.svc.cluster.local" | 9779 | "ONLINE" | 0            | "No valid partition" | "No valid partition"   |
+------------------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| "nebula-storaged-2.nebula-storaged-headless.default.svc.cluster.local" | 9779 | "ONLINE" | 0            | "No valid partition" | "No valid partition"   |
+------------------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
| "Total"                                                                |      |          | 0            |                      |                        |
+------------------------------------------------------------------------+------+----------+--------------+----------------------+------------------------+
Got 4 rows (time spent 3315/4918 us)

Thu, 17 Jun 2021 08:44:03 UTC

(root@nebula) [(none)]>

@wangqia0309
Author

wangqia0309 commented Jun 17, 2021

terminate called after throwing an instance of 'std::system_error'
what(): Failed to resolve address for 'nebula-graphd-0.nebula-graphd-svc.nebula.svc.cluster.local': Name or service not known (error=-2): Unknown error -2
*** Aborted at 1623927557 (unix time) try "date -d @1623927557" if you are using GNU date ***
PC: @ 0x7f0ee1fd4387 __GI_raise
*** SIGABRT (@0x1) received by PID 1 (TID 0x7f0ee2ec18c0) from PID 1; stack trace: ***
@ 0x1e5f9c1 (unknown)
@ 0x7f0ee237b62f (unknown)
@ 0x7f0ee1fd4387 __GI_raise
@ 0x7f0ee1fd5a77 __GI_abort
@ 0x107f647 _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
@ 0x2219b85 __cxxabiv1::__terminate()
@ 0x2219bd0 std::terminate()
@ 0x2219d03 __cxa_throw
@ 0x1063e8b (unknown)
@ 0x1d12292 folly::SocketAddress::getAddrInfo()
@ 0x1d122b3 folly::SocketAddress::setFromHostPort()
@ 0x19fe77e nebula::WebService::start()
@ 0x1080872 main
@ 0x7f0ee1fc0554 __libc_start_main
@ 0x1096b4d (unknown)

@veezhang I found the error log; I don't know why the address can't be resolved.

@wangqia0309
Author

@veezhang It should be a DNS resolution problem. Our Kubernetes has its own settings: service names need a suffix of the form svc.gsvc.glx.local, but the startup command in your image hardcodes the svc.cluster.local suffix:
--meta_server_addrs=nebula-metad-0.nebula-metad-headless.nebula.svc.cluster.local:9559 --local_ip=nebula-graphd-0.nebula-graphd-svc.nebula.svc.cluster.local --ws_ip=nebula-graphd-0.nebula-graphd-svc.nebula.svc.cluster.local
How can this be solved? Could the suffix be left unspecified, since our DNS automatically appends our own domain suffix?
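
A quick way to confirm which cluster domain the in-cluster DNS actually serves is to resolve one of the hardcoded names from a throwaway pod (a sketch; the busybox image tag is an assumption):

kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- \
  nslookup nebula-metad-0.nebula-metad-headless.nebula.svc.cluster.local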

@veezhang
Contributor

veezhang commented Jun 18, 2021

@wangqia0309 I've created a PR: #29

After it is merged, please configure kubernetesClusterDomain as gsvc.glx.local (a sketch follows the notes below).

Notes:

  • update the operator images
  • update helm repo
    helm repo add nebula-operator https://vesoft-inc.github.io/nebula-operator/charts
    helm repo update
    See Nebula Operator Install Guide for details.
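
Once the updated chart is available, the setting could be applied like this (a sketch; it assumes the chart exposes the kubernetesClusterDomain value introduced by #29):

helm upgrade nebula-operator nebula-operator/nebula-operator \
  --namespace nebula-operator-system \
  --set kubernetesClusterDomain=gsvc.glx.local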

@wey-gu
Contributor

wey-gu commented Jun 18, 2021

The outcome of this thread is gold, @veezhang thanks! It could end up as quite a reusable experience/blog post on nebula-operator on top of Aliyun. @QingZ11

Thanks @wangqia0309 for your time exploring and helping to improve Nebula Graph :-).

@wangqia0309
Author

Thanks all, this is the best experience I have had with an open source community. Hoping for an even better Nebula.

@veezhang
Contributor

@wangqia0309 Thanks!
