
add script to run TCE CAPD standalone cluster + travis CI job config to run script #583

Merged

Conversation

@karuppiah7890 (Contributor) commented May 19, 2021

What this PR does / why we need it

This PR adds a script to bring up a TCE CAPD standalone cluster automatically, with no manual human intervention. It is a precursor to our next step of running end-to-end (E2E) tests for TCE in an automated manner: before we can run E2E tests automatically, we first need a cluster up and running using TCE.

Which issue(s) this PR fixes

Fixes: #582

Describe testing done for PR

  • Manually ran the script
  • Also tested it in an automated manner by running the script from a Travis CI job

Special notes for your reviewer

Assumptions

  1. Docker Engine is running locally and accessible (a possible pre-flight check is sketched right after this list)
  2. TCE is not installed
  3. kubectl is not installed
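For the first assumption, a minimal pre-flight check along these lines could be added to the script (a hedged sketch, not necessarily what this PR does):

# Verify Docker Engine is running and reachable before doing anything else
if ! docker info > /dev/null 2>&1; then
    echo "Error: Docker Engine is not running or not accessible" >&2
    exit 1
fi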

Do we need to use the cleanup commands below in CI, especially if CI is going to tear down the machine anyway? Tearing down the machine would kill and remove the containers along with all other resources such as networks, volumes, and images.
docker kill $(docker ps -q)

docker system prune --volumes - TODO: This is an interactive command, not an automatic one. It needs some changes.

Also, the commands above assume that all containers are related to TCE. Otherwise, we need to filter by labels or by container names that contain the cluster name prefix. I can see labels provided by the kind cluster (see the sketch after the output below):

$ docker inspect <container>
...
"Labels": {
                "io.x-k8s.kind.cluster": "guest-cluster-26185",
                "io.x-k8s.kind.role": "worker"
            }
...
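Based on those labels, a cleanup limited to kind-managed containers could look roughly like this (the label filter and the --force flag on prune are assumptions about how we might automate it, not commands taken from this PR's script):

# Remove only containers created by kind, using the label shown above
docker rm --force $(docker ps --quiet --filter "label=io.x-k8s.kind.cluster")
# Prune without the interactive confirmation prompt
docker system prune --volumes --force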

Does this PR introduce a user-facing change?

NONE

@karuppiah7890 (Contributor, Author) commented May 19, 2021

I think @rajaskakodkar has some code which checks for Docker and TCE installation and also runs make release. If that gets merged in a PR, I'll just use it; or I'll copy-paste it if this PR gets merged first.

karuppiah7890 force-pushed the tce-capd-standalone-automation branch 6 times, most recently from ec01505 to 55e9324 on May 19, 2021 18:24
@karuppiah7890 (Contributor, Author) commented May 19, 2021

I also had to fix some issues where the script did not support being placed in an arbitrary location or being run from anywhere. I believe it's all fixed now, and I'm trying out a Travis CI job to make sure everything works.

@karuppiah7890 (Contributor, Author) commented May 19, 2021

I'm also providing access to my private fork repo so that you can view the Travis CI job log, to see what it looks like and whether it's all working well.

I'm giving access to this repo - https://github.com/karuppiah7890/tce to @dvonthenen @stmcginnis @jpmcb @joshrosso @rajaskakodkar

Travis CI jobs for the repo are here - https://travis-ci.com/github/karuppiah7890/tce/

Let me know if you can't view it

@davidvonthenen (Contributor)

It looks like PR #590 is storing its scripts in the hack/e2e-tests dir while this one uses hack/test-automation. It seems like we should consolidate the directories unless they do different things.

@davidvonthenen (Contributor)

Since you are opening the PR from a fork, can you please run make check to verify linting?

@davidvonthenen (Contributor)

Another thought... since this really isn't a "hack", maybe this should actually go into a folder off the root of the repo, like test. @joshrosso

@karuppiah7890 (Contributor, Author) commented May 20, 2021

You are right about the difference in directory naming, @dvonthenen. I used the current naming in this PR based on the suggestion here. @rajaskakodkar and I figured that whichever PR gets merged first with an agreed-upon naming, we can change the other PR's directory naming to match.

@karuppiah7890 (Contributor, Author) commented May 20, 2021

@dvonthenen test sounds good too. I assumed that anything that "looks hacky" or uses bash scripts goes into hack. But I can move it to the project root with a name like test or test-automation.

@karuppiah7890 (Contributor, Author)

I had already run make check and fixed some shellcheck errors. Now make check reports no errors 👍

The "make check" logs:
karuppiahn-a01:tce karuppiahn$ make check
make -C hack/tools golangci-lint
make[1]: Nothing to be done for `golangci-lint'.
hack/tools/bin/golangci-lint run -v --timeout=5m
INFO [config_reader] Config search paths: [./ /Users/karuppiahn/projects/github.com/vmware-tanzu/tce /Users/karuppiahn/projects/github.com/vmware-tanzu /Users/karuppiahn/projects/github.com /Users/karuppiahn/projects /Users/karuppiahn /Users /] 
INFO [config_reader] Used config file .golangci.yaml 
INFO [lintersdb] Active 35 linters: [bodyclose deadcode depguard dogsled dupl errcheck funlen goconst gocritic gocyclo gofmt goheader goimports golint gomnd goprintffuncname gosec gosimple govet ineffassign maligned misspell nakedret noctx nolintlint rowserrcheck staticcheck structcheck stylecheck typecheck unconvert unparam unused varcheck whitespace] 
INFO [loader] Go packages loading at mode 575 (files|imports|types_sizes|compiled_files|deps|exports_file|name) took 4.637836453s 
INFO [runner/filename_unadjuster] Pre-built 0 adjustments in 2.434325ms 
INFO [linters context/goanalysis] analyzers took 0s with no stages 
INFO [linters context/goanalysis] analyzers took 0s with no stages 
INFO [runner] Issues before processing: 89, after processing: 0 
INFO [runner] Processors filtering stat (out/in): autogenerated_exclude: 89/89, nolint: 0/4, cgo: 89/89, filename_unadjuster: 89/89, exclude: 89/89, path_prettifier: 89/89, skip_dirs: 89/89, exclude-rules: 4/89, skip_files: 89/89, identifier_marker: 89/89 
INFO [runner] processing took 7.736264ms with stages: autogenerated_exclude: 2.159663ms, path_prettifier: 1.669338ms, identifier_marker: 1.2678ms, exclude-rules: 1.202384ms, nolint: 793.129µs, exclude: 405.195µs, skip_dirs: 220.42µs, cgo: 9.174µs, filename_unadjuster: 5.957µs, max_same_issues: 730ns, uniq_by_line: 370ns, diff: 334ns, max_per_file_from_linter: 312ns, max_from_linter: 307ns, skip_files: 306ns, source_code: 238ns, severity-rules: 201ns, path_shortener: 161ns, sort_results: 137ns, path_prefixer: 108ns 
INFO [runner] linters took 1.555971553s with stages: goanalysis_metalinter: 1.547434712s, unused: 721.966µs 
INFO File cache stats: 0 entries of total size 0B 
INFO Memory: 64 samples, avg is 90.6MB, max is 140.6MB 
INFO Execution took 6.208465611s                  
hack/check-mdlint.sh
hack/check-shell.sh
ShellCheck - shell script analysis tool
version: 0.7.2
license: GNU General Public License, version 3
website: https://www.shellcheck.net
karuppiahn-a01:tce karuppiahn$ echo $?
0
karuppiahn-a01:tce karuppiahn$ 

@karuppiah7890 (Contributor, Author)

I'm also noticing some warnings and some info about defaults from the Travis CI config validation; I'll fix those too (a possible fix is sketched after the warnings below):

root: deprecated key sudo (The key `sudo` has no effect anymore.)
root: missing dist, using the default xenial
root: missing os, using the default linux
root: missing language, using the default ruby 
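One way to address these could be to declare the values explicitly and drop the deprecated key - a hedged sketch, assuming we stay on the Xenial default (the final .travis.yml may use different values):

# partial .travis.yml
os: linux
dist: xenial
language: minimal   # avoid the implicit ruby default
# and remove the deprecated `sudo` key entirely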

@karuppiah7890 (Contributor, Author)

Regarding the GitHub token - I'm already assuming that the GitHub token environment variable is injected through the Travis CI settings and not through .travis.yml, but we can add a check in the script, along the lines of the sketch below.
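A minimal guard, assuming the variable is named GITHUB_TOKEN (the actual name used by the script may differ):

# Fail fast if the token was not injected via the Travis CI settings
if [[ -z "${GITHUB_TOKEN}" ]]; then
    echo "Error: GITHUB_TOKEN environment variable is not set" >&2
    exit 1
fi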

Also, when do we want to run this script? And how do we want to get TCE? As of now I see that for nightly jobs we want to run make release. Do we use make release always, or do we use GitHub release binaries in some cases, like a stable release? I can't think of any use case for GitHub release binaries other than stable release time - but even for a stable release, we need to run the CI/CD with E2E tests before we cut the release, at which point the release binaries will not be available yet. We could get unsigned binaries from a draft release if one exists, but ideally the release process itself should stop, producing no signed binaries, if E2E tests fail. So I guess we can use make release always?

karuppiah7890 force-pushed the tce-capd-standalone-automation branch from 55e9324 to 4c9a2d5 on May 21, 2021 05:25
@karuppiah7890 (Contributor, Author) commented May 21, 2021

PR Review Comments and Status

  • Add License Headers to the shell scripts file ✅
  • Move all the scripts into a directory called test-automation
  • Place the directory test-automation at the root directory of the project ✅
  • Rename install-tce.sh to something like get-tce.sh or similar, as its current name could be confused with install.sh
  • Run make check and ensure there are no errors ✅
  • Run make release (code from E2E automation for TCE in AWS #590 takes care of this)

Additional Tasks

  • Rebase with main branch before merge happens
    • Rebased on the latest main (8871cf8) as of May 26th, 2021

Open questions

  • When / how often do we want this CI/CD job to run? On every commit? Nightly?
  • Following up on the first question, how do we get a TCE binary in each of those cases? Should we use make release for all of them? For now we just use make release

@karuppiah7890 (Contributor, Author) commented May 21, 2021

I have put up the review comments, the status of the work for each, and also the open questions. I think only one task is remaining for me - the one regarding make release.

Please let me know if you folks have any other review comments :) @dvonthenen @stmcginnis @jpmcb

Also, please do let me know who (all) would need to approve this PR and who would merge it, so that I can ping them here.

@davidvonthenen (Contributor) left a comment

Did we want to investigate the GitHub Actions approach before merging? Or did we want to merge and iterate?

@karuppiah7890 (Contributor, Author) commented May 25, 2021

@dvonthenen we can use GitHub Actions, I think. Since #590 (AWS) takes time and Travis CI doesn't support long job timeouts, we can make that change already. I think we can merge #590 first, along with GitHub Actions, and then come back to this PR. I'm looking forward to getting these two PRs merged so that we can move forward with other things.

cc @rajaskakodkar

@davidvonthenen (Contributor)

I commented on #590 (comment)

@karuppiah7890 (Contributor, Author)

I have created a separate issue #609 regarding usage of GitHub Actions for E2E test automation

karuppiah7890 force-pushed the tce-capd-standalone-automation branch from 4c9a2d5 to 54ea8c5 on May 26, 2021 08:22
@davidvonthenen (Contributor) left a comment

looks good to me

davidvonthenen merged commit a5ab864 into vmware-tanzu:main on May 26, 2021
karuppiah7890 deleted the tce-capd-standalone-automation branch on May 27, 2021 05:01