Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logger goroutine-safety: fatal error: concurrent map writes #126

Closed
ewbankkit opened this issue Feb 3, 2023 · 3 comments
Closed

Logger goroutine-safety: fatal error: concurrent map writes #126

ewbankkit opened this issue Feb 3, 2023 · 3 comments
Assignees
Labels
bug Something isn't working
Milestone

Comments

@ewbankkit
Copy link

ewbankkit commented Feb 3, 2023

terraform-plugin-log version

% go list -m github.com/hashicorp/terraform-plugin-log/...
github.com/hashicorp/terraform-plugin-log v0.7.0

Relates hashicorp/terraform-provider-aws#29236.

With v4.53.0, released 02/02/2023, the Terraform AWS Provider has started to use terraform-plugin-log (via aws-sdk-go-base) to log AWS SDK for Go requests and responses.
Under heavy usage we are seeing crashes:

fatal error: concurrent map writes
goroutine 1730 [running]:
github.com/hashicorp/terraform-plugin-log/tflog.SetField.func1(...)
	github.com/hashicorp/terraform-plugin-log@v0.7.0/internal/logging/options.go:212
github.com/hashicorp/terraform-plugin-log/tflog.SetField({0xe25bd00, 0xc001d5f890}, {0xce4900e, 0xb}, {0xa619820?, 0xc00089eb90})
	github.com/hashicorp/terraform-plugin-log@v0.7.0/tflog/provider.go:18 +0x111
github.com/hashicorp/aws-sdk-go-base/v2/awsv1shim/v2.setAWSFields({0xe25bd00?, 0xc003bc12f0?}, 0xc005cbaa00)
	github.com/hashicorp/aws-sdk-go-base/v2/awsv1shim/v2@v2.0.0-beta.23/logger.go:40 +0x9c
github.com/hashicorp/aws-sdk-go-base/v2/awsv1shim/v2.logRequest(0xc005cbaa00)
	github.com/hashicorp/aws-sdk-go-base/v2/awsv1shim/v2@v2.0.0-beta.23/logger.go:69 +0x5f
...

func WithField(key string, value interface{}) Option {
return func(l LoggerOpts) LoggerOpts {
// Lazily create this map, on first assignment
if l.Fields == nil {
l.Fields = make(map[string]interface{})
}
l.Fields[key] = value
return l
}
}

func SetField(ctx context.Context, key string, value interface{}) context.Context {
lOpts := logging.GetProviderRootTFLoggerOpts(ctx)
lOpts = logging.WithField(key, value)(lOpts)
return logging.SetProviderRootTFLoggerOpts(ctx, lOpts)
}

https://github.com/hashicorp/aws-sdk-go-base/blob/f0b32e1671652e7f19ebd715676fceb7c37190f2/v2/awsv1shim/logger.go#L38-L51

It is not clear which component is responsible for ensuring goroutine-safety here.

@bflad
Copy link
Contributor

bflad commented Feb 3, 2023

Hey @ewbankkit 👋 Thank you for reporting this and sorry you are running into trouble here.

I think the quick version of this issue is that tflog.SetField() on the same context.Context should be treated like any other "global" variable in the calling codebase. Under the hood here, it is a single shared mapping of field names to values which eventually gets stored in a context.Context value containing the logger with that mapping. Concurrent usage with differing fields almost certainly will have data race conditions leading to incorrect fields being associated with certain messages or Go raising this type of panic.

It looks like the calling code here may be trying to log API requests from the provider. There are a few options I can think of here which should alleviate the immediate issue:

func F1(ctx context.Context) {
  reqCtx := tflog.SetField(ctx, /*...*/)
  reqCtx = tflog.SetField(reqCtx, /*...*/)
  tflog.Trace(reqCtx, /*...*/)
  F2(reqCtx)
}
  • Creating separate logger subsystems per request: https://developer.hashicorp.com/terraform/plugin/log/writing#subsystems (This is similar to the above, but a more drastic implementation. I think it has some downsides though as you would need to make the subsystem name have a unique identifier per request to prevent this same collision issue and it will continue growing the shared context.Context in size until the end of the Terraform request)
  • Use inline fields (likely not applicable to this particular situation because it would preclude those fields being passed along in downstream functions using that context.Context, but mentioned for future reader benefit in case it might be desirable in other situations)
func xxx(ctx context.Context, /*...*/) {
  fields := map[string]any{/*...*/}
  tflog.Trace(ctx, "...", fields)
}

One definite action item on our side is to ensure this "shared variable" analogy is well documented in the Fields section. Another would be whether to consider this particular Go panic as something we should actually fix in this codebase.

It would be technically possible to prevent the concurrent map writes Go panic by the SetField() logic implementing sync.Map or sync.Mutex based logic. However, this has a major downside that it would only serve to mask the caller concurrency issue that would lead to various log entries having incorrect fields associated with them. An obtuse/confusing panic is quite an unfortunate way to surface this issue to callers though as it likely punishes Terraform practitioners later on in high concurrency situations rather than developers more upfront before it is released.

We could introduce logic that checks to see if the SetField is attempting to overwrite a previously set field. That logic could use the same context.Context to tfsdklog out about the potential concurrency issue. I think it would still then be debatable then whether a sync based solution would make sense, since panics are scary, but the caller at least still gets some notification about the issue. Some folks may also be doing SetField intentionally to overwrite fields, so that additional logging may be invalid. Maybe the logger or function could have the option whether to do that or not, or worst case a separate function created.

I think if I had my preference in this situation, it would be a combination of checking the field, (WARN?) logging on overwrite, implementing sync.Mutex to protect the map and prevent panics, then waiting to see if anyone complains about the new logging before worrying about making it optional.

Please reach out if you have any questions.

bflad added a commit that referenced this issue Feb 7, 2023
Reference: #126
Reference: #131

Since the `LoggerOpts` struct contains slice and map fields, it is important to ensure any modifications occur on copies of those slices and maps, otherwise the memory reference can wind up being shared. Consumers should always be able to create a new `context.Context` without worrying about shared data.

This change introduces a `Copy()` method for `LoggerOpts` and implements it for option modifier functions which adjust a map or slice.
bflad added a commit that referenced this issue Feb 8, 2023
Reference: #126
Reference: #131

Since the `LoggerOpts` struct contains slice and map fields, it is important to ensure any modifications occur on copies of those slices and maps, otherwise the memory reference can wind up being shared. Consumers should always be able to create a new `context.Context` without worrying about shared data.

This change introduces a `Copy()` method for `LoggerOpts` and implements it for option modifier functions which adjust a map or slice.
@bflad
Copy link
Contributor

bflad commented Feb 8, 2023

After thinking about this more, I believe this issue should be resolved by #132 without additional changes or needing additional documentation. That change should prevent concurrent map writes since each new SetField receives a "new" map. Please shout if you disagree.

@bflad bflad closed this as completed Feb 8, 2023
@bflad bflad added this to the v0.8.0 milestone Feb 8, 2023
@bflad bflad self-assigned this Feb 8, 2023
@github-actions
Copy link

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 11, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants