Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new workloads #25106

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Add new workloads #25106

wants to merge 6 commits into from

Conversation

Juanadelacuesta
Copy link
Member

@Juanadelacuesta Juanadelacuesta commented Feb 12, 2025

Description

This PR adds new system and batch workloads to the upgrade tests.

Testing & Reproduction steps

Links

Contributor Checklist

  • Changelog Entry If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad website documentation to reflect this. Refer to
    the website README for docs guidelines. Please also consider whether the
    change requires notes within the upgrade guide.

Reviewer Checklist

  • Backport Labels Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type Ensure the correct merge method is selected which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs If this is an enterprise only PR, please add any required changelog entry
    within the public repository.

@Juanadelacuesta Juanadelacuesta requested review from a team as code owners February 12, 2025 21:47
@Juanadelacuesta Juanadelacuesta marked this pull request as draft February 12, 2025 21:47
Base automatically changed from f-NET-11478-enos-versions to main February 13, 2025 09:32
@@ -103,6 +103,12 @@ variable "aws_kms_alias" {
# provide a list of builds to override the values of nomad_sha, nomad_version,
# or nomad_local_binary. Most of the time you can ignore these variables!

variable "nomad_local_binary_server_unique" {
description = "A nomad local binary paths to deploy to servers, to override nomad_local_binary"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what this variable is supposed to do (what's "unique" mean here?), and the description matches nomad_local_binary_server exactly. It seems like we're overriding nomad_local_binary_server but I can't tell why. Shouldn't Enos set nomad_local_binary_server to make sure we're using the Linux binaries for the server instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because nomad_local_binary_server is an array and we would need to pass the same path 3 times, but enos does not accept functions, so it would have to come from the previous step as a list or we would have to add an extra module to build the list

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok... honestly I'm not sure that array has ever been used to have different values, so I wonder if it would be worth refactoring to just have a single override. But we can do that in a separate PR if you want.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im guessing at some point someone wanted to test compatibility of different versions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I think I wrote that? 😊 But we really got away from running E2E manually, so there's probably a bunch of tiny design decisions that could be improved there.

@Juanadelacuesta Juanadelacuesta changed the title f-NET-11478-enos-windows Add new workloads Feb 20, 2025
workloads = {
service_raw_exec = { job_spec = "jobs/raw-exec-service.nomad.hcl", alloc_count = 3, type = "service" }
service_docker = { job_spec = "jobs/docker-service.nomad.hcl", alloc_count = 3, type = "service" }
system_docker = { job_spec = "jobs/docker-system.nomad.hcl", alloc_count = 0, type = "system" }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alloc count for system jobs doesn't really matter, its set to 0 to emphasis it

Comment on lines 24 to 27
output "new_allocs_count" {
description = "The number of allocs that should be running in the cluster"
value = local.system_job_count * chomp(enos_local_exec.get_nodes.stdout) + local.service_batch_allocs
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this output for? It doesn't match allocs_count

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im trying to make this module aware of existing allocs so it outputs all the running allocs, new and old, and the output can be directly used for a next step in enos, because it does not accept functions as step.variables, so this output helps me debug

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! Let's update the descriptions to make that clear. Right now this is the same as allocs_count.

Comment on lines +68 to +69
nomad_local_binary = step.copy_initial_binary.binary_path[matrix.os]
nomad_local_binary_server = step.copy_initial_binary.binary_path[local.server_os]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should rebase this PR on #25172 once that's been merged.

config {
image = "alpine:latest"
command = "sh"
args = ["-c", "while true; do sleep 30000; done"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Batch workloads are a little tricky -- we probably want to make sure that they can complete and not get rescheduled by client/server restarts, rather than having them wait forever. But that'll require some new assertion logic, so let's come back to that.

Comment on lines +17 to +19
image = "alpine:latest"
command = "sh"
args = ["-c", "while true; do sleep 30000; done"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For service/system jobs, we should probably use a workload where we can assert more about it than "is the alloc running?". busybox httpd would let us run a network service, so that we're exercising things like restoring CNI. Ok to leave for this PR, but let's come back to this too.

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants