Memory hog #937

Closed
ohmer opened this issue Jun 2, 2022 · 8 comments
Labels
bug, waiting-response

Comments

@ohmer

ohmer commented Jun 2, 2022

Server Version

0.27.0

Terraform Version

Terraform v1.1.3
on linux_amd64

Client Version

VScode remote (hashicorp.terraform-2.22.0-linux-x64)

Terraform Configuration Files

Can't share them, but I don't think they are relevant.
One point to note is that I am using git and symbolic links inside my repository.

Log Output

There are log outputs that I can see. The behavior is a memory hog, crashing or freezing my work machine for long minutes.

Expected Behavior

No machine crash or freeze.

Actual Behavior

In my setup terraform-ls runs on an AWS instance. I use VScode remote from my macOS workstation to an EC2 Linux instance in AWS. This is because I work in a country far away from my cloud resources (latency is ~150-200 ms, which makes terraform runs very slow). The EC2 instance is a t3a.medium (4GB of RAM).

A terraform plan while having VScode running leads to terraform-ls consuming lots of memory, leading to swap.
Sometimes this crashes my instance (I have to force a reboot in the console) or makes it unresponsive for 5-10 minutes.

Note that we are using a single terraform repository (tree -d => 224 directories), terraform-ls has lots to inspect.

I managed to run top while the instance is still responsive and it shows the memory hog:

image

Not shown on the picture but kswapd is also very busy, shell is unresponsive, can't start a new SSH connection, VSCode remote stalls...
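
Since the shell tends to become unresponsive during these episodes, one option is to log the resident memory of terraform-ls periodically so that a record survives the freeze. A minimal sketch, assuming a Linux host with `/proc` and `pgrep` available; the helper names `rss_kb` and `log_rss` and the log path are made up for illustration and are not part of terraform-ls:

```shell
#!/usr/bin/env bash
# rss_kb PID -> prints the resident set size of PID in kB (Linux /proc only)
rss_kb() {
  awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# log_rss NAME LOGFILE -> appends one timestamped line per process named NAME
log_rss() {
  for pid in $(pgrep -x "$1"); do
    printf '%s pid=%s rss_kb=%s\n' \
      "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$pid" "$(rss_kb "$pid")"
  done >> "$2"
}

# Example (hypothetical log path), sampling every 10 seconds:
#   while sleep 10; do log_rss terraform-ls /tmp/terraform-ls-rss.log; done
```

The resulting log would show how quickly memory grows and whether the growth lines up with a terraform plan or with a VS Code reconnect.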

Other environment information, terraform-ls runs on:

$ uname -a
Linux i-01b1d1dd493818162 5.13.0-1025-aws #27~20.04.1-Ubuntu SMP Thu May 19 15:17:13 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.4 LTS"

Steps to Reproduce

I am not sure it can be reproduced. It might be related to the size of our repository.
There are no other DevOps in my organization who could clone our private repository.

But, if this is not related to the size of the repository, I guess a similar setup to mine could work.
Happy to provide an EC2 instance to do so.

@jpogran
Contributor

jpogran commented Jun 2, 2022

I'm sorry you're experiencing this @ohmer. To confirm, does your Extensions Pane look something like the one below? Replace the WSL: Ubuntu-20.04 - INSTALLED with the Remote SSH Extension verbiage and it should show the Terraform Extension in the second section too.

image

@ohmer
Author

ohmer commented Jun 2, 2022

Hey @jpogran, thanks for looking at this report!

Yes, this is what my setup looks like

image

@radeksimko
Member

Hi @ohmer
Thanks for providing the screenshot.

We publish some expectations around memory usage based on our benchmarks where the baseline is basically 300MB (*). 3.2G of resident memory from your screenshot does look like a lot and is quite far away from any of these figures.

> Note that we are using a single terraform repository (tree -d => 224 directories), terraform-ls has lots to inspect.

This is useful to know, but the number of directories on its own should not cause this. The question is how many of these 224 directories are actually indexed, how big these modules are, and what providers they use.

A few more questions:

  1. It is unlikely that this is the cause, given the scope of the other extension, but can you try to reproduce this with the EditorConfig for VS Code extension disabled?
  2. Can you check how many modules in your tree are initialized?
    • You can run find . -type d -name '.terraform' | wc -l in the root of the repo
  3. Can you check what is the approximate size (in bytes) of all *.tf and *.tfvars files within each initialized module + size of each .terraform folder?
    • find . -type d -name '.terraform' | xargs -I{} du -h -s {}
    • find . -type d -name '.terraform' | xargs -I{} sh -c 'du -ch {}/../*.tf* | grep total'
  4. Can you try to enable memory profiling and share a memory profile with us? See more at https://github.com/hashicorp/terraform-ls/blob/main/docs/TROUBLESHOOTING.md#memory-profiling
    • The additional -memprofile flag can be provided via JSON settings under "terraform.languageServer" -> "args" in VS Code.
  5. Can you provide us with the whole log? That could tell us how many modules are being indexed or what other things may be responsible for the memory usage. You can either copy it from the Output panel or send it into a file via -log-file flag.
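
A possible way to gather the numbers asked for in (2) and (3) in one pass is sketched below. It is not part of terraform-ls or its docs: the function names `count_initialized_modules` and `module_config_sizes` are made up for illustration, and "initialized" is assumed to mean "contains a `.terraform` directory". It prints the module path next to each size and reports 0 bytes for a module with no matching files. It only counts top-level `*.tf` and `*.tfvars` files (not the JSON variants) and relies on `-maxdepth`, which GNU and BSD find support.

```shell
#!/usr/bin/env bash
# count_initialized_modules ROOT -> prints the number of .terraform directories
count_initialized_modules() {
  find "${1:-.}" -type d -name '.terraform' -prune -print | wc -l | tr -d ' '
}

# module_config_sizes ROOT -> prints "<bytes><TAB><module dir>" per module
module_config_sizes() {
  find "${1:-.}" -type d -name '.terraform' -prune -print | sort |
  while IFS= read -r dot_tf; do
    module_dir=$(dirname "$dot_tf")
    # Concatenate the module's own config files and count the bytes;
    # with no matching files cat is never run and the count is 0.
    bytes=$(find "$module_dir" -maxdepth 1 -type f \
              \( -name '*.tf' -o -name '*.tfvars' \) -exec cat {} + |
            wc -c | tr -d ' ')
    printf '%s\t%s\n' "$bytes" "$module_dir"
  done
}
```

Running `module_config_sizes . | sort -n | tail` from the root of the repo would then list the largest module configurations last.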

Regarding (4) and (5) - you can provide these as gists or email them to us (radek <at> hashicorp.com). The memory profile in particular is unlikely to have sensitive data in it, but if you wish you can encrypt either file with my PGP key.

I understand it's a lot to ask ^, but getting even a few questions answered from the list would still help; getting them all answered would help enormously.


* - The only way we might be able to get it significantly lower would be to avoid embedding some provider schemas in the binary and have those dynamically obtained at runtime. We track some of that work under #193 but it's not on the near-term roadmap.

@radeksimko radeksimko added the bug and waiting-response labels Jun 6, 2022
@ohmer
Author

ohmer commented Jun 8, 2022

Hi @radeksimko,

Thanks a lot for looking into this! On your questions:

  1. Sure, I am disabling EditorConfig and will let you know if this happens again. As mentioned, it does not occur all the time. My suspicion is that it happens when VSCode remote reconnects after a long disconnection. Typically, at the end of the day, I put my laptop in hibernation while everything is running. This tends to occur the next day, when reconnecting.

  2. find . -type d -name '.terraform' | wc -l => 81

  3. Might be useful to mention that I have the following:

cat ~/.terraformrc
plugin_cache_dir   = "$HOME/.terraform.d/plugin-cache"
disable_checkpoint = true

~/.terraform.d/plugin-cache is populated with 3GB.

3.1 find . -type d -name '.terraform' | xargs -I{} du -h -s {}

712K	./stacks/investapp/edge/us-west-1/database/.terraform
8.0K	./stacks/investapp/edge/us-west-1/feed-engine/.terraform
8.0K	./stacks/investapp/edge/us-west-1/ecs-tasks/.terraform
5.2M	./stacks/investapp/edge/us-west-1/load-balancer/.terraform
8.0K	./stacks/investapp/edge/us-east-1/s3-antivirus/.terraform
8.0K	./stacks/investapp/staging2/us-west-1/ecs-tasks/.terraform
4.2M	./stacks/investapp/staging2/us-west-1/ecs-services/.terraform
712K	./stacks/investapp/staging/us-west-1/database/.terraform
8.0K	./stacks/investapp/staging/us-west-1/ecs-tasks/.terraform
4.2M	./stacks/investapp/staging/us-west-1/ecs-services/.terraform
8.0K	./stacks/investapp/staging3/us-west-1/ecs-tasks/.terraform
4.2M	./stacks/investapp/staging3/us-west-1/ecs-services/.terraform
712K	./stacks/investapp/production/us-west-1/database/.terraform
8.0K	./stacks/investapp/production/us-west-1/database-mysqldump/.terraform
8.0K	./stacks/investapp/production/us-west-1/feed-engine/.terraform
8.0K	./stacks/investapp/production/us-west-1/ecs-tasks/.terraform
4.2M	./stacks/investapp/production/us-west-1/ecs-services/.terraform
8.0K	./stacks/investapp/production/us-east-1/mobile/.terraform
8.0K	./stacks/investapp/production/us-east-1/s3-antivirus/.terraform
712K	./stacks/investapp/testing/us-west-1/database/.terraform
8.0K	./stacks/investapp/testing/us-west-1/database-mysqldump/.terraform
8.0K	./stacks/investapp/testing/us-west-1/feed-engine/.terraform
8.0K	./stacks/investapp/testing/us-west-1/ecs-tasks/.terraform
4.2M	./stacks/investapp/testing/us-west-1/ecs-services/.terraform
3.6M	./stacks/investapp/testing/us-west-1/elasticache/.terraform
8.0K	./stacks/investapp/testing/us-east-1/s3-antivirus/.terraform
8.0K	./stacks/investapp/common/ap-southeast-2/http-proxy/.terraform
8.0K	./stacks/investapp/common/ap-southeast-2/aws-operator/.terraform
4.0K	./stacks/investapp/common/us-west-1/ssm/.terraform
4.0K	./stacks/investapp/common/us-west-2/ses/.terraform
4.0K	./stacks/infrastructure/datadog/aws-vpc/.terraform
4.0K	./stacks/infrastructure/cloudflare/dns/.terraform
8.0K	./stacks/infrastructure/aws/edge/production/ap-southeast-2/baseline/.terraform
8.0K	./stacks/infrastructure/aws/edge/production/us-east-1/baseline/.terraform
8.0K	./stacks/infrastructure/aws/audit/production/ap-southeast-2/baseline/.terraform
8.0K	./stacks/infrastructure/aws/audit/production/us-east-1/baseline/.terraform
8.0K	./stacks/infrastructure/aws/log-archive/production/ap-southeast-2/baseline/.terraform
4.0K	./stacks/infrastructure/aws/log-archive/production/us-west-1/elb/.terraform
8.0K	./stacks/infrastructure/aws/log-archive/production/us-west-1/baseline/.terraform
4.0K	./stacks/infrastructure/aws/log-archive/production/us-west-1/datadog-forwarder-lambda/.terraform
4.0K	./stacks/infrastructure/aws/log-archive/production/us-east-1/cloudtrail/.terraform
4.0K	./stacks/infrastructure/aws/log-archive/production/us-east-1/cloudfront/.terraform
4.0K	./stacks/infrastructure/aws/log-archive/production/us-east-1/vpc/.terraform
8.0K	./stacks/infrastructure/aws/log-archive/production/us-east-1/baseline/.terraform
4.0K	./stacks/infrastructure/aws/log-archive/production/us-east-1/datadog-forwarder-lambda/.terraform
8.0K	./stacks/infrastructure/aws/staging/production/ap-southeast-2/baseline/.terraform
8.0K	./stacks/infrastructure/aws/staging/production/us-east-1/baseline/.terraform
8.0K	./stacks/infrastructure/aws/releng/production/ap-southeast-2/baseline/.terraform
8.0K	./stacks/infrastructure/aws/releng/production/us-east-1/baseline/.terraform
8.0K	./stacks/infrastructure/aws/backup/production/ap-southeast-2/baseline/.terraform
8.0K	./stacks/infrastructure/aws/shared-services/production/ap-southeast-2/baseline/.terraform
8.0K	./stacks/infrastructure/aws/shared-services/production/ap-southeast-2/network/.terraform
4.0K	./stacks/infrastructure/aws/shared-services/production/ap-southeast-2/prefix-lists/.terraform
4.0K	./stacks/infrastructure/aws/shared-services/production/ap-southeast-2/terraform-runner/.terraform
8.0K	./stacks/infrastructure/aws/shared-services/production/us-east-1/baseline/.terraform
4.0K	./stacks/infrastructure/aws/shared-services/production/us-west-1/prefix-lists/.terraform
8.0K	./stacks/infrastructure/aws/production/production/ap-southeast-2/baseline/.terraform
8.0K	./stacks/infrastructure/aws/production/production/us-west-1/baseline/.terraform
8.0K	./stacks/infrastructure/aws/production/production/us-east-1/baseline/.terraform
4.0K	./stacks/infrastructure/aws/production/production/us-east-1/domains/.terraform
4.0K	./stacks/infrastructure/aws/production/common/us-west-1/security-groups/.terraform
8.0K	./stacks/infrastructure/aws/production/common/us-west-1/network/.terraform
8.0K	./stacks/infrastructure/aws/static-sites/production/ap-southeast-2/baseline/.terraform
8.0K	./stacks/infrastructure/aws/static-sites/production/us-west-1/baseline/.terraform
8.0K	./stacks/infrastructure/aws/static-sites/production/us-east-1/baseline/.terraform
8.0K	./stacks/infrastructure/aws/testing/production/ap-southeast-2/baseline/.terraform
8.0K	./stacks/infrastructure/aws/testing/production/us-east-1/baseline/.terraform
8.0K	./stacks/infrastructure/aws/main/production/ap-southeast-2/terraform-backend/.terraform
4.0K	./stacks/infrastructure/aws/main/production/ap-southeast-2/iam/.terraform
8.0K	./stacks/infrastructure/aws/main/production/ap-southeast-2/baseline/.terraform
4.0K	./stacks/infrastructure/aws/main/production/ap-southeast-2/sso/.terraform
8.0K	./stacks/infrastructure/aws/main/production/ap-southeast-2/network/.terraform
4.0K	./stacks/infrastructure/aws/main/production/ap-southeast-2/organization/.terraform
8.0K	./stacks/infrastructure/aws/main/production/us-east-1/baseline/.terraform
8.0K	./stacks/infrastructure/aws/sandbox/production/ap-southeast-2/baseline/.terraform
2.1M	./stacks/static-sites/help/staging/us-east-1/cloudfront/.terraform
2.1M	./stacks/static-sites/help/production/us-east-1/cloudfront/.terraform
2.1M	./stacks/static-sites/www/staging/us-east-1/cloudfront/.terraform
2.1M	./stacks/static-sites/www/production/us-east-1/cloudfront/.terraform
8.0K	./stacks/static-sites/www/production/us-east-1/redirects/.terraform
8.0K	./stacks/static-sites/www/common/us-east-1/gatsby/.terraform

3.2 find . -type d -name '.terraform' | xargs -I{} sh -c 'du -ch {}/../*.tf* | grep total'

8.0K	total
4.0K	total
4.0K	total
du: cannot access './stacks/investapp/edge/us-west-1/load-balancer/.terraform/../*.tf*': No such file or directory
0	total
8.0K	total
4.0K	total
8.0K	total
8.0K	total
4.0K	total
8.0K	total
4.0K	total
8.0K	total
8.0K	total
8.0K	total
4.0K	total
4.0K	total
8.0K	total
8.0K	total
8.0K	total
8.0K	total
8.0K	total
4.0K	total
4.0K	total
8.0K	total
8.0K	total
8.0K	total
16K	total
12K	total
20K	total
8.0K	total
12K	total
12K	total
4.0K	total
4.0K	total
4.0K	total
4.0K	total
4.0K	total
8.0K	total
4.0K	total
8.0K	total
8.0K	total
8.0K	total
8.0K	total
4.0K	total
12K	total
4.0K	total
4.0K	total
4.0K	total
4.0K	total
8.0K	total
4.0K	total
8.0K	total
4.0K	total
8.0K	total
4.0K	total
4.0K	total
8.0K	total
4.0K	total
4.0K	total
20K	total
20K	total
12K	total
4.0K	total
4.0K	total
4.0K	total
4.0K	total
4.0K	total
12K	total
20K	total
4.0K	total
20K	total
8.0K	total
8.0K	total
4.0K	total
4.0K	total
8.0K	total
8.0K	total
8.0K	total
8.0K	total
8.0K	total
12K	total

4 + 5. Sent via email.

VSCode configuration applied before restarting extension:

  "terraform-ls.terraformLogFilePath": "/tmp/terraform-ls-memprofile-{{timestamp}}.log",
  "terraform.languageServer": {
    "external": true,
    "args": [
      "serve",
      "-memprofile=/tmp/terraform-ls-memprofile-{{timestamp}}.prof"
    ]
  }

I will leave these settings as is, hoping that a memory hog will happen again and it would provide more data.
What I sent via email is unfortunately not under memory exhaustion.

Thanks again for looking into this and please don't hesitate to ask for more info if that helps. That's how this could be tracked down.

@radeksimko
Member

@ohmer This data is all useful, thank you for providing it!

81 initialized modules is not a small number, but it's certainly a number that LS should be able to handle easily. Module configuration hovering between 4-12KB should not be a challenge either.

> What I sent via email is unfortunately not under memory exhaustion.

I briefly looked into the one you sent but I didn't see anything out of the ordinary. It appeared to have been captured quite early in the LS lifecycle, before most of the memory-heavy operations had a chance to fill up the memory. It would indeed be more helpful to have one from when the memory usage is high.

From the names of the file paths, it appears that the vast majority of modules primarily use the AWS provider. That is the provider with the biggest schema, and I guess that obtaining the schema (running terraform providers schema -json) could temporarily hike the memory usage as the JSON blob is being processed. If these AWS providers are all of the same version, though, this shouldn't be a problem, as we shouldn't even be requesting the schema for the exact same provider we already cached a schema for.
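
Whether the AWS providers are all of the same version could be checked from the dependency lock files. A rough sketch, assuming each module directory has a `.terraform.lock.hcl` written by `terraform init` (Terraform 0.14 and later) and that the file keeps the usual `provider "..." {` / `version = "..."` layout; `provider_versions` is a hypothetical helper name, not a terraform or terraform-ls command:

```shell
#!/usr/bin/env bash
# provider_versions ROOT -> prints "<count> <provider address> <version>",
# one line per distinct provider/version pair found in lock files under ROOT.
provider_versions() {
  find "${1:-.}" -type f -name '.terraform.lock.hcl' ! -path '*/.terraform/*' \
    -exec cat {} + |
  awk '
    # Remember the address from each: provider "registry.../aws" {
    /^provider "/ { gsub(/"/, "", $2); name = $2 }
    # Count the selected version line inside that block
    /^[[:space:]]*version[[:space:]]*=/ { gsub(/"/, "", $3); count[name " " $3]++ }
    END { for (key in count) print count[key], key }
  ' | sort -k2
}
```

More than one line for the same provider address would mean that several versions of that provider are in use across the modules.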

I mentioned that in my email response, but I'll mention it again here, just for visibility. It looks like the logs you provided are logs for Terraform CLI execution (typically used to obtain the schema or get Terraform version, or to format code), not for the server itself. I have updated our troubleshooting docs - hopefully the difference between the two is now clearer.

-log-file is the flag to use for directing server logs to a file.

@ohmer
Author

ohmer commented Jun 14, 2022

@radeksimko since I sent this data and received your feedback, I upgraded the VSCode extension and configuration. Now running ~/.vscode-server/extensions/hashicorp.terraform-2.23.0-linux-x64/bin/terraform-ls --version => 0.28.1 and configuration is as below:

  "terraform.languageServer": {
    "external": true,
    "args": [
      "serve",
      "-log-file=/tmp/terraform-ls-{{timestamp}}.log",
      "-memprofile=/tmp/terraform-ls-memprofile-{{timestamp}}.prof"
    ]
  }

I haven't had a memory hog since last report. If that happens in this new version + configuration, I will share hopefully more valuable data.
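
Once a profile file exists, it could be inspected locally before being shared. The sketch below assumes the file is a standard Go pprof heap profile (terraform-ls is written in Go) and that the Go toolchain is installed on the machine; `latest_memprofile` and `top_allocations` are hypothetical helper names, and the file name pattern simply mirrors the `-memprofile` path in the configuration above:

```shell
#!/usr/bin/env bash
# latest_memprofile DIR -> prints the most recently modified profile in DIR
latest_memprofile() {
  ls -t "${1:-/tmp}"/terraform-ls-memprofile-*.prof 2>/dev/null | head -n 1
}

# top_allocations DIR -> prints the largest in-use allocations of that profile
top_allocations() {
  profile=$(latest_memprofile "${1:-/tmp}")
  [ -n "$profile" ] || { echo "no profile found" >&2; return 1; }
  go tool pprof -top -sample_index=inuse_space "$profile"
}
```

Calling `top_allocations /tmp` after a high-memory episode would show which functions hold the most memory at the time the profile was written.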

@radeksimko
Member

I'm going to (optimistically) close this issue as there's nothing we can act on at this point.

However, please do let us know if the issue re-appears by opening a new one and attaching the details as discussed - we'd be happy to look into it.

@radeksimko radeksimko closed this as not planned Jun 27, 2022
@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 28, 2022