
[Helm] Updated helm install docs based on new issues #6476

Open
Zanz2 wants to merge 8 commits into develop
Conversation

@Zanz2 Zanz2 commented Jul 13, 2023

Updated helm deploy documentation:

Added a tip for users installing locally with microk8s, which is more popular on Ubuntu than minikube. Added a workaround/fix for the Multi-Attach volume error and information on running migrations manually.
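(For readers: a local microk8s setup on Ubuntu typically looks roughly like the sketch below; the exact addon list is an assumption, not necessarily the PR's wording.)

```
# Hedged sketch of a local microk8s setup on Ubuntu; the addon list is
# an assumption — adjust it to what your deployment actually needs.
sudo snap install microk8s --classic
microk8s enable dns hostpath-storage ingress
microk8s status --wait-ready
```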

closes #6475
resolves #5029

Motivation and context

This is required to improve the user experience of using helm to deploy CVAT.

How has this been tested?

I have previewed the docs

Checklist

  • [x] I submit my changes into the develop branch
  • [x] I have added a description of my changes into the CHANGELOG file
  • [x] I have updated the documentation accordingly
  • [ ] I have added tests to cover my changes
  • [x] I have linked related issues (see GitHub docs)
  • [ ] I have increased versions of npm packages if it is necessary
    (cvat-canvas,
    cvat-core,
    cvat-data and
    cvat-ui)

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

@Zanz2 Zanz2 changed the title Updated helm install docs based on new issues [Helm] Updated helm install docs based on new issues Jul 19, 2023
@Keramblock Keramblock (Contributor) commented Jul 25, 2023

Hi, today a PR with helm fixes was merged, so maybe you want to check it out and update something: #6043

```
kubectl exec -it --namespace $HELM_RELEASE_NAMESPACE $BACKEND_POD_NAME -c cvat-app-backend-server-container -- python manage.py migrate &&\
kubectl exec -it --namespace $HELM_RELEASE_NAMESPACE $BACKEND_POD_NAME -c cvat-app-backend-server-container -- python manage.py health_check
```
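(For completeness, the $HELM_RELEASE_NAMESPACE and $BACKEND_POD_NAME variables above have to be resolved first; a minimal sketch, assuming the backend pods can be found by label — the selector shown is an assumption, check the real labels with `kubectl get pods --show-labels`:)

```
# Assumed way to resolve the variables used above; the label selector
# is illustrative, not necessarily the chart's actual labels.
export HELM_RELEASE_NAMESPACE=default
export BACKEND_POD_NAME=$(kubectl get pods --namespace $HELM_RELEASE_NAMESPACE \
  -l tier=backend-server -o jsonpath='{.items[0].metadata.name}')
```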
### I'm getting multi attach volume errors when my backend pods run on different nodes
Contributor

There is a new option in values.yaml for it now.

Author

You mean allowing ReadWriteMany via values.yaml? I think that's a great addition, and I even pulled #6137 into my deployment while it was still open. But in my case our volumes don't support ReadWriteMany, so I (and I imagine many other people) would still need to use ReadWriteOnce with podAffinity.
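(For context, the ReadWriteOnce + podAffinity approach pins every pod that mounts the volume onto the same node; a minimal sketch of such a constraint, with an illustrative label rather than the chart's actual one:)

```
# Hedged sketch: co-schedule pods that share a RWO volume on one node.
# The app label is illustrative — use the labels your chart really sets.
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: cvat-backend
        topologyKey: kubernetes.io/hostname
```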

@@ -319,3 +326,26 @@ cvat:
claimName: my-claim-name

```
### I'm receiving service unavailable errors
Contributor

I am not sure; that could be misleading, because there are a lot of other possible reasons why it happens.

Author

Yeah, I agree, the wording is vague. Maybe something like "Website not loading after running helm install, but redis, opa and postgresql are running"?

Contributor

I was thinking more like:

### I'm receiving service unavailable errors
There could be multiple explanations for that, but generally it means your ingress controller does not receive a response from the backend. Some possible explanations are:
1. Server is down:
  A. your volume is RWO instead of RWX
  B. your DB is not responding
2. Service does not see the pod (see the checks below):
  A. Check the labels on the pods
  B. Check the label selector on the service
3. ...
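(For case 2, a hedged set of checks; the namespace and service name here are assumptions:)

```
# Compare pod labels against the service selector; names are assumptions.
kubectl get pods --namespace cvat --show-labels
kubectl describe service cvat-backend-service --namespace cvat  # look at "Selector:"
kubectl get endpoints --namespace cvat  # an empty ENDPOINTS column means no pod matched
```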

Author

Good feedback, I improved it.

@deeeed deeeed commented Aug 7, 2023

Any chance to improve the docs and give all the steps for a "production" setup?

I have been struggling with my config on Ubuntu:

```
postgresql:
  enabled: false
  external:
    host: postgres-postgresql.postgres.svc.cluster.local
    port: 5432
  auth:
    username: cvatuser
    database: cvat
  secret:
    password: cvatpassword
traefik:
  enabled: false

ingress:
  enabled: true
  className: public
  annotations:
    cert-manager.io/cluster-issuer: lets-encrypt
  tls:
    - secretName: cvat-tls
      hosts:
        - cvat.mydns.com
  hosts:
    - host: cvat.mydns.com
      paths:
        - path: /api
          pathType: "Prefix"
          service:
            name: backend-service
            port: 8080
        - path: /admin
          pathType: "Prefix"
          service:
            name: backend-service
            port: 8080
        - path: /static
          pathType: "Prefix"
          service:
            name: backend-service
            port: 8080
        - path: /django-rq
          pathType: "Prefix"
          service:
            name: backend-service
            port: 8080
        - path: /git
          pathType: "Prefix"
          service:
            name: backend-service
            port: 8080
        - path: /opencv
          pathType: "Prefix"
          service:
            name: backend-service
            port: 8080
        - path: /profiler
          pathType: "Prefix"
          service:
            name: backend-service
            port: 8080
        - path: /
          pathType: "Prefix"
          service:
            name: frontend-service
            port: 80


redis:
  enabled: true
  volumesPermissions:
    enabled: true
    containerSecurityContext:
      runAsUser: 1000
  master:
    persistence:
      existingClaim: cvat-redis-master-pvc
    containerSecurityContext:
      runAsUser: 1000
    podSecurityContext:
      fsGroup: 1000
      runAsUser: 1000
      runAsGroup: 1000
  replica:
    persistence:
      existingClaim: cvat-redis-replica-pvc
    containerSecurityContext:
      runAsUser: 1000
    podSecurityContext:
      fsGroup: 1000
      runAsGroup: 1000
      runAsUser: 1000
zookeeper:
  enabled: false
analytics:
  enabled: false
#   ingress:
#     path: /analytics
#     pathType: 'Prefix'
#     annotations:
#       kubernetes.io/ingress.class: traefik
#     service:
#       name: grafana
#       port: 80
# grafana:
#   grafana.ini:
#     server:
#       root_url: https://cvat.siteed.net/analytics
cvat:
  frontend:
    service:
      type: LoadBalancer
      loadBalancerIP: 192.168.50.168
  backend:
    disableDistinctCachePerService: true
    defaultStorage:
      enabled: false
    permissionFix:
      enabled: false
    server:
      additionalVolumes:
        - name: cvat-backend-data
          persistentVolumeClaim:
            claimName: cvat-pvc
    worker:
      export:
        additionalVolumes:
          - name: cvat-backend-data
            persistentVolumeClaim:
              claimName: cvat-pvc
      import:
        additionalVolumes:
          - name: cvat-backend-data
            persistentVolumeClaim:
              claimName: cvat-pvc
      annotation:
        additionalVolumes:
          - name: cvat-backend-data
            persistentVolumeClaim:
              claimName: cvat-pvc
    utils:
      additionalVolumes:
        - name: cvat-backend-data
          persistentVolumeClaim:
            claimName: cvat-pvc
```
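(For reference, a values file like the one above would typically be applied with something like the following, after the preparatory steps below; the release name, chart path, and file name are assumptions:)

```
# Assumed invocation; adjust release name, chart path, and values file.
helm upgrade cvat ./helm-chart --install --namespace cvat -f values.override.yaml
```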

Here are the steps I take before:

```
kubectl create namespace cvat
```

Set up a PVC for Redis:

```
kind: PersistentVolume
apiVersion: v1
metadata:
  name: cvat-redis-master
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/volumes/redis/master"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cvat-redis-master-pvc
  namespace: cvat
spec:
  storageClassName: manual
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: cvat-redis-replica  # Sets PV's name
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi # Sets PV Volume
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/volumes/redis/replica1"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cvat-redis-replica-pvc
  namespace: cvat
spec:
  storageClassName: manual
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
```

Set up the storage class and PVC for CVAT data:

```
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi-cvat
provisioner: nfs.csi.k8s.io
parameters:
  server: 192.168.50.4
  share: /volumes/cvat
reclaimPolicy: Retain
volumeBindingMode: Immediate
mountOptions:
  - hard
  - nfsvers=4.1
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: cvat
  name: cvat-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
  storageClassName: nfs-csi-cvat
```
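(A hedged sanity check before installing the chart: confirm the claims created above actually bind.)

```
# Both redis claims and cvat-pvc should report STATUS=Bound.
kubectl get pvc --namespace cvat
kubectl get pv
```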

Everything seems to run correctly and the pods are up, but when I access the frontend all the calls to the backend seem to fail:

[screenshot]
Mostly 405 HTTP errors.

Any suggestions on what I could be missing?

@Zanz2 Zanz2 (Author) commented Aug 8, 2023

I think that would warrant a separate issue. I haven't encountered that problem, but if you are using ingress, wouldn't you connect to your CVAT deployment via cvat.mydns.com and not the IP of the frontend pod?
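(A quick way to verify that requests go through the ingress host from the config above; the API path is an assumption about CVAT's endpoints:)

```
# Assumed check against the ingress host rather than a pod/service IP.
curl -ik https://cvat.mydns.com/api/server/about
```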

@deeeed deeeed commented Aug 8, 2023

Good point, it definitely would be better as a separate issue.

@nmanovic nmanovic added the Easy to fix The issue is easy to fix and probably it will be release in a next minor release label Sep 5, 2023
@nmanovic nmanovic requested review from azhavoro and removed request for mdacoca September 5, 2023 09:26
@hadilou hadilou commented Oct 11, 2023

@deeeed did you manage to fix this? I am having the same error.

Labels
Easy to fix

Development

Successfully merging this pull request may close these issues:

  • [Helm] helm deploy documentation is not up to date
  • Deploying on GKE using the helm chart not working