-
Notifications
You must be signed in to change notification settings - Fork 11.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[POC] Using gls library to provide context rather than attaching it to errors #4293
Conversation
git-subtree-dir: src/github.com/getlantern/gls git-subtree-split: 2b8af7a308eb752fc4030f93e85a8da641d2006e
…b.com/getlantern/gls'
git-subtree-dir: src/github.com/tylerb/is git-subtree-split: 6c1a46754cf86b12fffbe5ecaadba8c382059165
git-subtree-dir: src/github.com/oxtoacart/bpool git-subtree-split: 4e1c5567d7c2dd59fa4c7c83d34c2f3528b025d6
…b.com/oxtoacart/bpool'
"user_agent": userAgent, | ||
"request_id": rand.Int63(), | ||
"origin": req.Host, | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After this call, anything that we do in processing this request is tagged with these 4 pieces of context information.
Closing in lieu of #4295. |
I agree, they are definitely complementary.
Sorry, I don't quite understand. The structured logging here just means that they actually log JSON rather than a string, right? We're essentially talking about doing the same in terms of submitting our errors as measurements, right? My point was about something else, namely that to ensure that the information attached to our error propagates in a typed form, we have to make sure it's handled properly along the entire path between whoever is logging it and whoever raised it. |
I don't understand how passing the context along is any better. Either way, the intermediary function that's spawning the goroutine has to know to pass this state. Simply using context.Go() by convention actually seems easier, because I don't have to modify my method to accept a Context. For example: func doStuffInGoroutine(myarg string, myotherarg int) {
context.Go(func() {
doStuff(myarg, myotherarg)
})
} vs. func doStuffInGoroutine(myarg string, myotherarg int, ctx *Context) {
go func() {
doStuff(myarg, myotherarg, ctx)
}
} Either way we're aware of the need to pass context, but the 2nd one has a more verbose syntax.
I don't think the user-agent is noise. Perhaps some user agents behave in a way that causes the EOF (for example the user agent disconnects early). That's something that we'd want to see in our Influx data. I think it was actually you who convinced me that we should collect as much information as is reasonably possible into the raw Influx data to see what we can learn from it. |
BTW, getting back to the point about attaching metadata to errors being complementary to passing it down via context, I think what would be really cool is if the errors package integrated with context so that a call to |
I agree, the fmt.Errorf() style is something I regret doing. It was convenient at first, but it loses too much information. I think this points out something important, namely that we have two types of data related to errors and logging.
I'm thinking that the best approach is a combination where: a. We populate context before errors even happen so that we have that available both for errors and for debug logging b. At the point that an error occurs, we switch to using structured errors that capture information about the error in a machine-readable format (e.g. callsite information, nested error preserved in its original form, etc.) In a way, Java's exceptions are a good example here in the way that they contain typed fields and also nested causes. |
My view is that intermediary layers don't know when context is or is not necessary, so they have to assume it always is. For example, something like this: conn, err := idletiming.Wrap(dialFN) Taking our real context as an example, with user-agent, request id, origin host, etc., the idletiming.Wrap function may not care about any of this because it doesn't even log anything or return errors, however the dialFN may care because it does. The only solution is to pass the context through all the time, so now we have: conn, err := idletiming.Wrap(ctx, dialFN) And dialFN has to take a context as well. And so on. That's a lot of functions that need to change, and it clutters the code. With my context implementation, most of the time nothing is required. If idletiming happens to open a goroutine (which we do relatively rarely on the critical path), then it just switches to context.Go, which is quite easy and an isolated change. |
BTW, this branch has a bug that causes context to leak, which does pollute the log. That's fixed in #4295. |
cc: @uaalto
Our new errors package attaches contextual information (e.g. user-agent, proxy ip, origin host, etc.) to errors so that we can report that contextual information in our logs and to Borda. After playing with the library, I found myself missing a few things:
A
->B
->C
.C
raises an error, wrapping it with our errors package to attach some useful metadata. Unfortunately, packageB
treats it like a regular error and actually just prepends some text to it before passing it toA
(or worse just logs the error and doesn't raise it on).A
now has no access to the context information fromC
. If we own packageB
, this is extra work, but if we don't own packageB
there's no solution unless we want to fork the code.I had earlier suggested that we look into the tylerb/gls library for capturing contextual information in the context of our goroutines (kind of like a Java ThreadLocal). I went ahead and prototyped something using that approach, and I have to say that I like it better. It completely solves the first three of the above problems and mostly solves the 4th. It doesn't completely solve the 4th problem because if package
B
is creating a new goroutine, we have to do some special work in there to keep the context. Especially for things like HTTP request processing, raising new goroutines is rare, so I'm less concerned about this than errors not getting wrapped.The context-based approach also seems less invasive, since we only need to add code where we have new contextual information that we want to add and all errors logged within those contexts automatically benefit from the information.
Here is an excerpt from my logs that shows:
balancer
shows contextual information likeuser_agent
andrequest_id
that those packages have no knowledge about. In fact, in this example,balancer
doesn't itself know any of the logged contextual information.