"No space left on device" during heavy matrix workloads #830

Closed
PavelSusloparov opened this issue May 17, 2021 · 3 comments · Fixed by #839

Comments

@PavelSusloparov

PavelSusloparov commented May 17, 2021

I'm using the "t3.small" instance type to run my workload. I use matrix jobs and scale up to 64 spot instances.

My terraform-aws-github-runner config:

module "runners" {
  source = "git::https://github.com/philips-labs/terraform-aws-github-runner.git?ref=v0.13.0"

  aws_region = local.aws_region
  vpc_id     = data.terraform_remote_state.vpc.outputs.vpc_id
  subnet_ids = data.terraform_remote_state.vpc.outputs.private_subnets

  environment = local.environment
  tags = {
    "Project" = "GitHub Actions Self-Hosted Runners"
    Terraform = "True"
  }

  github_app = {
    key_base64     = data.sops_file.this.data["github_app_key_base64"]
    id             = data.sops_file.this.data["github_app_id"]
    client_id      = data.sops_file.this.data["github_app_client_id"]
    client_secret  = data.sops_file.this.data["github_app_client_secret"]
    webhook_secret = random_password.random.result
  }

  instance_type                     = "t3.small"
  webhook_lambda_zip                = "lambdas-download/webhook.zip"
  runner_binaries_syncer_lambda_zip = "lambdas-download/runner-binaries-syncer.zip"
  runners_lambda_zip                = "lambdas-download/runners.zip"
  enable_organization_runners       = true
  runner_extra_labels               = "terraform, autoscale"

  # enable access to the runners via SSM
  enable_ssm_on_runners = true

  manage_kms_key = false
  kms_key_id     = aws_kms_key.github.key_id

  idle_config = [
    {
      cron     = "* * * * * *"
      timeZone = "America/New_York"
      # min number of instances
      idleCount = 3
    }
  ]
  # max number of instances for scaling up
  runners_maximum_count = 63
}

I keep hitting "No space left on device". By default, the EC2 instances attach a 100 GB EBS volume. Is it possible to control the size of the EBS volume from the Terraform deployment, and what is involved?

I checked the AWS SDK documentation: https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/EC2.html
There is a "createVolume" method, but I do not see it being called during EC2 instance creation or usage in runners.ts.

@kuvaldini
Contributor

Hi @PavelSusloparov, take a look at:

volume_size = lookup(block_device_mappings.value, "volume_size", 30)
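That lookup() falls back to 30 when the map has no "volume_size" key, so overriding the key should change the size of the attached volume. A minimal sketch, assuming the module exposes that map as a `block_device_mappings` input (the input name is inferred from the lookup above and not confirmed in this thread; 100 is only an example value):

```hcl
module "runners" {
  source = "git::https://github.com/philips-labs/terraform-aws-github-runner.git?ref=v0.13.0"

  # Keep the other inputs from the config in the original post.

  # Assumption: `block_device_mappings` is a map input of the module and its
  # `volume_size` key feeds the lookup() quoted above (default 30).
  block_device_mappings = {
    volume_size = 100 # example size, adjust to the workload
  }
}
```

Runners that are already up would likely keep their old volume; the new size should apply to instances launched after the change.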

@PavelSusloparov
Author

Expanded the volume_size.
Thank you, @kuvaldini!

@kuvaldini
Contributor

@PavelSusloparov Please take a look at #839 and leave a like there.

JeroenKnoops added a commit that referenced this issue May 21, 2021