TACODEV-909: workaround for using machinepool on CAPA #68

Merged 5 commits on Sep 15, 2021

intelliguy (Contributor) commented Aug 11, 2021

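  • add job to get subnet and add a machinepool with the info

A k8s cluster can now be deployed to AWS in a single pass.
Supply from the start the values that previously had to be applied in a final update step.
The Dockerfile for the image and the code it uses are all placed in the artifacts directory.

When testing, the run takes more than 10 minutes and the default is 5 minutes, so a timeout setting must be added.
Deploy with --timeout=20 added.

Confirmed that deployment through argocd completes within the timeout.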
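As a rough illustration of the timeout note above, the sketch below wraps `helm install` with an explicit timeout. It assumes Helm 3, where `--timeout` is documented as a duration (default `5m0s`), so it passes `20m` rather than a bare `20`; reading the author's value as 20 minutes is an assumption. The function name `install_cluster` and the release name in the example are hypothetical.

```shell
#!/usr/bin/env bash
# install_cluster RELEASE CHART VALUES_FILE [TIMEOUT]
# Hypothetical wrapper, not part of this PR: installs the chart with a timeout
# long enough for the post-install Job (assumed default: 20m).
install_cluster() {
  local release="$1" chart="$2" values="$3" timeout="${4:-20m}"
  helm install "$release" "$chart" -f "$values" --timeout "$timeout"
}

# Example call (commented out because it needs helm and a management cluster):
# install_cluster my-release cluster-api-aws cluster-api-aws/values.yaml
```

A reviewer later in this thread uses the same flag form with `--timeout 10m`.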

cluster-api-aws/templates/job-generate-machine-pool.yaml (outdated, resolved)
cluster-api-aws/artifacts/generate_machine_pool.py (outdated, resolved)
cluster-api-aws/artifacts/Dockerfile (outdated, resolved)
cluster-api-aws/templates/job-generate-machine-pool.yaml (outdated, resolved)
cluster-api-aws/templates/job-generate-machine-pool.yaml (outdated, resolved)
cluster-api-aws/values.yaml (outdated, resolved)
cluster-api-aws/values.yaml (outdated, resolved)
cluster-api-aws/values.yaml (outdated, resolved)
ktkfree (Contributor) commented Aug 11, 2021

Since this is a major change, it would be good to bump the version in Chart.yaml to around 0.3.1.

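ktkfree (Contributor) commented Aug 11, 2021

Since the 2-step install is no longer needed, it would be better to remove all of the earlier changes made for it.
Let me know whether there is any reason to keep them; I will take care of removing them after this PR is merged.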

- add job to get subnet and add a machinepool with the info
@github-actions

This PR is stale because it has been open 3 days with no activity. Remove stale label or comment or this will be closed in 3 days.

@github-actions github-actions bot added the Stale label Aug 15, 2021
@github-actions

This PR was closed because it has been stalled for 10 days with no activity.

@github-actions github-actions bot closed this Aug 18, 2021
@intelliguy intelliguy removed the Stale label Aug 23, 2021
Jaesang (Contributor) left a comment

Outside the code review itself: cluster-api-aws/templates/mt-control.yaml appears to stand for Machine Template Control, and the file name does not seem appropriate.

@@ -0,0 +1,36 @@
{{- if .Values.machinePool }}
Contributor:

helm install blocks until this Job finishes. Would it make sense to make it asynchronous?

Contributor Author:

A job that labels the nodes once they are created still has to be added, so this will likely take even longer.
Making it async would mean splitting it out into a separate helm chart,
but since these tasks run as one sequence, adding it to this chart seems better.
So I don't think async is feasible here.

{{- $envAll := . }}
{{- range .Values.machinePool }}
{{ .name }}:
MP:
Contributor:

If the MachinePool, AWSMachinePool, and KubeadmConfig values are nested under MP, AMP, and KCP like this, how does K8s recognize them?

Jaesang (Contributor) commented Aug 25, 2021

@intelliguy I've left comments.

@Jaesang Jaesang self-requested a review August 27, 2021 09:09
Jaesang (Contributor) left a comment

helm install fails with a timeout error and the Job does not run.

$ helm install jaesang-909-2 cluster-api-aws -f cluster-api-aws/values-tacodev-909.yaml
Error: failed post-install: timed out waiting for the condition

After setting the timeout to 10 minutes and running again, the Job fails with a BackoffLimitExceeded error.

$ time helm install jaesang-909-3 cluster-api-aws -f cluster-api-aws/values-tacodev-909.yaml --timeout 10m --debug
install.go:173: [debug] Original chart version: ""
install.go:190: [debug] CHART PATH: /home/ubuntu/helm-charts/cluster-api-aws

client.go:122: [debug] creating 8 resource(s)
client.go:122: [debug] creating 1 resource(s)
client.go:493: [debug] Watching for changes to Job jaesang-909-3-cluster-api-aws with timeout of 10m0s
client.go:521: [debug] Add/Modify event for jaesang-909-3-cluster-api-aws: ADDED
client.go:560: [debug] jaesang-909-3-cluster-api-aws: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:521: [debug] Add/Modify event for jaesang-909-3-cluster-api-aws: MODIFIED
client.go:560: [debug] jaesang-909-3-cluster-api-aws: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:521: [debug] Add/Modify event for jaesang-909-3-cluster-api-aws: MODIFIED
Error: failed post-install: job failed: BackoffLimitExceeded
helm.go:81: [debug] failed post-install: job failed: BackoffLimitExceeded

real    5m16.054s
user    0m0.894s
sys     0m0.322s

job describe

$ kubectl describe jobs jaesang-909-3-cluster-api-aws
Events:
  Type     Reason                Age                From            Message
  ----     ------                ----               ----            -------
  Normal   SuccessfulCreate      6m36s              job-controller  Created pod: jaesang-909-3-cluster-api-aws-w6fzm
  Normal   SuccessfulDelete      82s                job-controller  Deleted pod: jaesang-909-3-cluster-api-aws-w6fzm
  Warning  BackoffLimitExceeded  82s (x2 over 82s)  job-controller  Job has reached the specified backoff limit

@github-actions

This PR is stale because it has been open 3 days with no activity. Remove stale label or comment or this will be closed in 3 days.

@github-actions github-actions bot added the Stale label Aug 30, 2021
@intelliguy intelliguy removed the Stale label Aug 31, 2021
github-actions bot commented Sep 3, 2021

This PR is stale because it has been open 3 days with no activity. Remove stale label or comment or this will be closed in 3 days.

github-actions bot commented Sep 9, 2021

This PR was closed because it has been stalled for 10 days with no activity.

@github-actions github-actions bot closed this Sep 9, 2021
@intelliguy intelliguy reopened this Sep 9, 2021
@intelliguy intelliguy requested a review from Jaesang September 10, 2021 12:17
intelliguy (Contributor Author):

When testing, the run takes more than 10 minutes and the default is 5 minutes, so a timeout setting must be added.
Deploy with --timeout=20 added.

Confirmed that deployment through argocd completes within the timeout.

@github-actions github-actions bot removed the Stale label Sep 10, 2021
while [ $(kubectl get machinepool -n $3 $1-$2-mp-0 --ignore-not-found | wc -l) == 0 ]
do
echo "> Wait for machinepools deployed (30s)"
sleep 30
Contributor:

The sleep interval is so long that helm install takes far too much time. How about sleep 1 or sleep 2?
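One way to act on this suggestion is to move the poll interval into a parameter and add an overall deadline, so the loop can poll every second or two without running forever. The following is a minimal sketch, not the PR's code: `wait_until` and `machinepool_exists` are hypothetical helper names, and the 2-second interval and 1200-second limit in the example call are assumed values.

```shell
#!/usr/bin/env bash
# wait_until INTERVAL TIMEOUT CMD...
# Hypothetical helper: runs CMD every INTERVAL seconds until it succeeds,
# and gives up with status 1 once TIMEOUT seconds have elapsed.
wait_until() {
  local interval="$1" timeout="$2"
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "> Timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    echo "> Waiting (${interval}s)"
    sleep "$interval"
  done
}

# machinepool_exists NAMESPACE NAME
# Succeeds once kubectl prints anything for the MachinePool.
machinepool_exists() {
  [ -n "$(kubectl get machinepool -n "$1" "$2" --ignore-not-found)" ]
}

# Example call (commented out because it needs a live management cluster):
# wait_until 2 1200 machinepool_exists "$3" "$1-$2-mp-0"
```

A shorter interval means more calls to the API server, so the value is a trade-off between install latency and request volume.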

while [ $(kubectl get machinepool -n $3 $1-$2-mp-0 -o=jsonpath='{.status.nodeRefs}' | wc -c) == 0 ]
do
echo "> Wait for instance is ready (20s)"
sleep 20
Contributor:

The sleep interval is so long that helm install takes far too much time. How about sleep 1 or sleep 2?

set -ex

while [ $(kubectl get secret -n $2 $1-kubeconfig --ignore-not-found | wc -l) == 0 ]; do
echo "sleep 30 second"
Contributor:

The sleep interval is so long that helm install takes far too much time. How about sleep 1 or sleep 2?
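A possible middle ground between `sleep 30` and `sleep 1` is a capped exponential backoff: start with a short delay so a quick result is noticed quickly, then double it up to a ceiling so a long wait does not hammer the API server. This is a sketch of that idea under assumed values (1-second start, 30-second cap), not something proposed in the thread; `next_delay`, `wait_with_backoff`, and `kubeconfig_secret_exists` are hypothetical names.

```shell
#!/usr/bin/env bash
# next_delay CURRENT MAX
# Prints double the current delay, never exceeding MAX seconds.
next_delay() {
  local doubled=$(( $1 * 2 ))
  if [ "$doubled" -gt "$2" ]; then
    echo "$2"
  else
    echo "$doubled"
  fi
}

# wait_with_backoff START MAX CMD...
# Retries CMD until it succeeds, sleeping START, 2*START, 4*START, ...
# seconds between attempts, capped at MAX. It has no deadline of its own.
wait_with_backoff() {
  local delay="$1" max="$2"
  shift 2
  until "$@"; do
    echo "> Not ready, retrying in ${delay}s"
    sleep "$delay"
    delay=$(next_delay "$delay" "$max")
  done
}

# kubeconfig_secret_exists NAMESPACE CLUSTER
# Succeeds once the <CLUSTER>-kubeconfig secret is listed.
kubeconfig_secret_exists() {
  [ -n "$(kubectl get secret -n "$1" "$2-kubeconfig" --ignore-not-found)" ]
}

# Example call (commented out because it needs a live management cluster):
# wait_with_backoff 1 30 kubeconfig_secret_exists "$2" "$1"
```

Because the loop itself never gives up, this sketch relies on an outer limit such as the helm timeout to bound the total wait.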

@seungkyua seungkyua merged commit 75c698b into main Sep 15, 2021
@zugwan zugwan deleted the TACODEV-909 branch September 16, 2021 08:47