[POC] Using gls library to provide context rather than attaching it to errors #4293

oxtoacart · 2016-05-15T03:05:14Z

Our new errors package attaches contextual information (e.g. user-agent, proxy ip, origin host, etc.) to errors so that we can report that contextual information in our logs and to Borda. After playing with the library, I found myself missing a few things:

1. Context only propagates up, not down. For example, if the caller of my function knows something about the user agent and my function knows something about the proxy address, within my function I can only log the proxy address, not the user agent. Only the caller can know both pieces of information.
2. Context is only available on errors. We have all this useful debug logging already in place, and it would be nice to see the context on there as well.
3. If multiple errors are raised within the same context, they all have to manually attach that context. For example, inside of our proxy dialer, we not only attempt to open a TLS connection, but we also verify the certificates. The proxy address is the same for both errors, and I have to manually attach it in both places.
4. Everyone has to wrap! Let's say that I have a call chain A -> B -> C. C raises an error, wrapping it with our errors package to attach some useful metadata. Unfortunately, package B treats it like a regular error and just prepends some text to it before passing it to A (or worse, just logs the error and doesn't propagate it). A now has no access to the context information from C. If we own package B, this is extra work, but if we don't own package B there's no solution unless we want to fork the code.

I had earlier suggested that we look into the tylerb/gls library for capturing contextual information in the context of our goroutines (kind of like a Java ThreadLocal). I went ahead and prototyped something using that approach, and I have to say that I like it better. It completely solves the first three of the above problems and mostly solves the 4th. It doesn't completely solve the 4th problem because if package B is creating a new goroutine, we have to do some special work in there to keep the context. Especially for things like HTTP request processing, raising new goroutines is rare, so I'm less concerned about this than errors not getting wrapped.
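
For readers unfamiliar with the goroutine-local idea, here is a minimal sketch of how such a store could work. It is not the gls library's code or API: `Fields`, `Enter`, `Current`, `dial` and `handle` are hypothetical names, and the goroutine ID is recovered with the well-known trick of parsing the header of `runtime.Stack`, since Go offers no official accessor.

```go
package main

import (
	"bytes"
	"runtime"
	"strconv"
	"sync"
)

// Fields is a set of contextual key/value pairs (hypothetical type).
type Fields map[string]interface{}

var (
	mu    sync.Mutex
	store = map[uint64]Fields{} // goroutine ID -> current context
)

// goid parses the current goroutine's ID out of the stack trace header,
// which looks like "goroutine 123 [running]:". This is a hack, not an API.
func goid() uint64 {
	buf := make([]byte, 64)
	buf = buf[:runtime.Stack(buf, false)]
	buf = bytes.TrimPrefix(buf, []byte("goroutine "))
	if i := bytes.IndexByte(buf, ' '); i >= 0 {
		buf = buf[:i]
	}
	id, _ := strconv.ParseUint(string(buf), 10, 64)
	return id
}

// Enter merges fields into the calling goroutine's context and returns a
// function that restores whatever context was there before.
func Enter(fields Fields) (exit func()) {
	id := goid()
	mu.Lock()
	defer mu.Unlock()
	prev, had := store[id]
	merged := Fields{}
	for k, v := range prev {
		merged[k] = v
	}
	for k, v := range fields {
		merged[k] = v
	}
	store[id] = merged
	return func() {
		mu.Lock()
		defer mu.Unlock()
		if had {
			store[id] = prev
		} else {
			delete(store, id)
		}
	}
}

// Current returns a copy of the calling goroutine's context.
func Current() Fields {
	id := goid()
	mu.Lock()
	defer mu.Unlock()
	out := Fields{}
	for k, v := range store[id] {
		out[k] = v
	}
	return out
}

// dial only knows the proxy address, yet sees the caller's fields too.
func dial() Fields {
	defer Enter(Fields{"proxy_host": "203.0.113.7"})()
	return Current()
}

// handle only knows the user agent.
func handle() Fields {
	defer Enter(Fields{"user_agent": "curl/7.0"})()
	return dial()
}
```

Calling `handle()` returns both `user_agent` and `proxy_host`, which is the downward propagation that error-attached context cannot provide. The returned `exit` function matters: without it, fields would outlive the request that set them.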

The context-based approach also seems less invasive, since we only need to add code where we have new contextual information that we want to add and all errors logged within those contexts automatically benefit from the information.

Here is an excerpt from my logs that shows:

Both ERROR and DEBUG logging benefit
Even logging from low-level packages like balancer shows contextual information like user_agent and request_id that those packages have no knowledge about. In fact, in this example, balancer doesn't itself know any of the logged contextual information.

May 15 02:40:02.099 - 0m5s ERROR balancer: balancer.go:81 Unable to dial via (trusted) chained proxy at 45.55.130.104:443 [] to connect://www.google-analytics.com:443: Unable to dial server (trusted) chained proxy at 45.55.130.104:443 []: read tcp xxx.xxx.xxx.xxx:54174->xx.xxx.xxx.xxx:443: read: connection reset by peer on pass 2...continuing [op=proxy origin=www.google-analytics.com:443 proxy_host=45.55.130.104 proxy_port=443 proxy_protocol=https request_id=3266353486698850603 user_agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36]
May 15 02:40:02.100 - 0m5s DEBUG flashlight.client: handler.go:133 Could not dial Still unable to dial connect://www.google-analytics.com:443 after 3 attempts [op=proxy origin=www.google-analytics.com:443 proxy_host=xx.xxx.xxx.xxx proxy_port=443 proxy_protocol=https request_id=3266353486698850603 user_agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36]
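
The bracketed suffix in these lines could be produced by something like the following sketch. It assumes the keys are emitted in alphabetical order (which matches the excerpt above) and that values are rendered with `%v`; `withContext` is a hypothetical helper, not the actual golog code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// withContext appends the context fields to a log message as
// "[key=value key=value ...]", with keys sorted for stable output.
func withContext(msg string, ctx map[string]interface{}) string {
	if len(ctx) == 0 {
		return msg
	}
	keys := make([]string, 0, len(ctx))
	for k := range ctx {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%v", k, ctx[k]))
	}
	return msg + " [" + strings.Join(parts, " ") + "]"
}
```

Sorting the keys keeps lines from the same request visually aligned, which makes it easier to scan for a given `request_id`.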

oxtoacart · 2016-05-15T03:06:12Z

src/github.com/getlantern/flashlight/client/handler.go

+		"user_agent": userAgent,
+		"request_id": rand.Int63(),
+		"origin":     req.Host,
+	})


After this call, anything that we do in processing this request is tagged with these 4 pieces of context information.

coveralls · 2016-05-15T03:21:54Z

Coverage increased (+0.2%) to 72.015% when pulling bc10f0a on contextlog into 3e2f1c5 on devel.

oxtoacart · 2016-05-15T07:33:20Z

Closing in lieu of #4295.

oxtoacart · 2016-05-15T08:00:15Z

> For your point 1, context passes information top down, while Error attaches information bottom up, so the two are complementary. Without the bottom-up propagation, we have to log or report immediately when the error occurs, or the details will be lost.

I agree, they are definitely complementary.

> As for your point 4, typed errors are a recommendation for complex packages. Many 3rd-party logging libraries, like logrus and log15, use structured logging. We can probably consider evolving in that direction.

Sorry, I don't quite understand. The structured logging here just means that they actually log JSON rather than a string, right? We're essentially talking about doing the same in terms of submitting our errors as measurements, right? My point was about something else, namely that to ensure that the information attached to our error propagates in a typed form, we have to make sure it's handled properly along the entire path between whoever is logging it and whoever raised it.

oxtoacart · 2016-05-15T08:06:42Z

> Goroutine - context: Not all Goroutines require the gls data, but we have to establish a convention to always call ctx.Go. With Context, we can pass the context to go func(), or create new context from the one in outer scope when necessary.

I don't understand how passing the context along is any better. Either way, the intermediary function that's spawning the goroutine has to know to pass this state. Simply using context.Go() by convention actually seems easier, because I don't have to modify my method to accept a Context.

For example:

func doStuffInGoroutine(myarg string, myotherarg int) {
    context.Go(func() {
        doStuff(myarg, myotherarg)
    })
}

vs.

func doStuffInGoroutine(myarg string, myotherarg int, ctx *Context) {
    go func() {
        doStuff(myarg, myotherarg, ctx)
    }()
}

Either way we're aware of the need to pass context, but the 2nd one has a more verbose syntax.
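
To make the first variant concrete, here is a rough sketch of what a `Go` helper could do internally: snapshot the parent goroutine's context and install it in the child before running the function. This is an assumption about the mechanism, not the prototype's actual code; `Set`, `Get`, `Clear`, `Go` and the string-only field map are hypothetical, and the goroutine ID again comes from parsing `runtime.Stack`.

```go
package main

import (
	"bytes"
	"runtime"
	"strconv"
	"sync"
)

var (
	mu  sync.Mutex
	ctx = map[uint64]map[string]string{} // goroutine ID -> fields
)

// goid parses the ID from the "goroutine 123 [running]:" stack header.
func goid() uint64 {
	buf := make([]byte, 64)
	buf = buf[:runtime.Stack(buf, false)]
	buf = bytes.TrimPrefix(buf, []byte("goroutine "))
	if i := bytes.IndexByte(buf, ' '); i >= 0 {
		buf = buf[:i]
	}
	id, _ := strconv.ParseUint(string(buf), 10, 64)
	return id
}

// Set replaces the calling goroutine's context.
func Set(fields map[string]string) {
	id := goid()
	mu.Lock()
	ctx[id] = fields
	mu.Unlock()
}

// Get returns the calling goroutine's context (nil if none was set).
func Get() map[string]string {
	id := goid()
	mu.Lock()
	defer mu.Unlock()
	return ctx[id]
}

// Clear drops the calling goroutine's context.
func Clear() {
	id := goid()
	mu.Lock()
	delete(ctx, id)
	mu.Unlock()
}

// Go runs fn in a new goroutine that starts with a copy of the caller's
// context and cleans up after itself when fn returns.
func Go(fn func()) {
	parent := Get()
	copied := make(map[string]string, len(parent))
	for k, v := range parent {
		copied[k] = v
	}
	go func() {
		Set(copied)
		defer Clear()
		fn()
	}()
}
```

A goroutine started with a plain `go` statement sees no context in this sketch, which is exactly the case where the convention has to be followed.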

> Context - error: When EOF occurs when dialing proxy, logging or reporting the user agent adds nothing but noise. We should select only the context that really makes sense. If it becomes tedious to extract the same set of information from context, a handy custom helper FromContext(e *Error, ctx Context) *Error would be enough.

I don't think the user-agent is noise. Perhaps some user agents behave in a way that causes the EOF (for example the user agent disconnects early). That's something that we'd want to see in our Influx data. I think it was actually you who convinced me that we should collect as much information as is reasonably possible into the raw Influx data to see what we can learn from it.

oxtoacart · 2016-05-15T08:08:22Z

BTW, getting back to the point about attaching metadata to errors being complementary to passing it down via context, I think what would be really cool is if the errors package integrated with context so that a call to errors.New() or errors.Wrap() automatically attaches the context. That way, we can use the context setting convention to fill our context and our errors, killing two birds with one stone.
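
A minimal sketch of that integration, under stated assumptions: `currentContext` stands in for whatever goroutine-local lookup the context package would provide, and `Error`, `New` and `Wrap` are illustrative names and signatures rather than the existing errors package API. The `Unwrap` method follows the convention introduced in Go 1.13.

```go
package main

import "fmt"

// Fields is a set of contextual key/value pairs (hypothetical type).
type Fields map[string]interface{}

// currentContext is a stand-in for the goroutine-local context lookup.
// It is a variable so the real lookup can be plugged in.
var currentContext = func() Fields { return Fields{} }

// Error carries a message, an optional cause, and the context that was
// active when the error was created.
type Error struct {
	Msg   string
	Cause error
	Ctx   Fields
}

func (e *Error) Error() string {
	if e.Cause != nil {
		return fmt.Sprintf("%s: %v", e.Msg, e.Cause)
	}
	return e.Msg
}

// Unwrap exposes the cause to errors.Is and errors.As.
func (e *Error) Unwrap() error { return e.Cause }

// snapshot copies the current context so later changes don't affect the error.
func snapshot() Fields {
	out := Fields{}
	for k, v := range currentContext() {
		out[k] = v
	}
	return out
}

// New creates an error and attaches the current context automatically.
func New(msg string) *Error {
	return &Error{Msg: msg, Ctx: snapshot()}
}

// Wrap does the same while preserving the original error as the cause.
func Wrap(cause error, msg string) *Error {
	return &Error{Msg: msg, Cause: cause, Ctx: snapshot()}
}
```

With this shape, callers never attach the user agent or request ID by hand; they only describe what went wrong.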

oxtoacart · 2016-05-15T08:18:01Z

> Sorry for the confusion. I just recommend using fields instead of formatted strings to create both errors and log entries in the future. That would eliminate the return fmt.Errorf("Fail to xxx: %v", err) style completely, so code can either: 1. return the error as is, or 2. attach fields using the errors package. What if there is a 3rd-party package in between? I don't know, but I haven't seen any occurrence. Anyway, we need to propagate errors bottom up.

I agree, the fmt.Errorf() style is something I regret doing. It was convenient at first, but it loses too much information.
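
A small illustration of what gets lost, using hypothetical names (`DialError`, `c`, `bFlatten`, `bPassThrough`, `proxyAddr`). It relies on `errors.As`, which is available from Go 1.13 onward.

```go
package main

import (
	"errors"
	"fmt"
)

// DialError is a typed error carrying structured data.
type DialError struct {
	ProxyAddr string
}

func (e *DialError) Error() string { return "unable to dial " + e.ProxyAddr }

// c raises the typed error.
func c() error { return &DialError{ProxyAddr: "203.0.113.7:443"} }

// bFlatten formats the error into a string, discarding its type.
func bFlatten() error { return fmt.Errorf("B failed: %v", c()) }

// bPassThrough returns the error as is.
func bPassThrough() error { return c() }

// proxyAddr tries to recover the structured field from an error.
func proxyAddr(err error) (string, bool) {
	var de *DialError
	if errors.As(err, &de) {
		return de.ProxyAddr, true
	}
	return "", false
}
```

`proxyAddr(bPassThrough())` recovers the address, while `proxyAddr(bFlatten())` does not: after `%v` the address survives only as part of the message text. On Go 1.13 and later, formatting with `%w` instead of `%v` keeps the chain intact.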

I think this points out something important, namely that we have two types of data related to errors and logging.

Data about the error, like what was the underlying error, where in the code did the error occur, what were the data parameters that led to the error, etc.
Data about the context in which the error occurred, such as high-level operation (proxying), user-agent, whether or not we're detouring, the user's locale, etc.

I'm thinking that the best approach is a combination where:

a. We populate context before errors even happen so that we have that available both for errors and for debug logging

b. At the point that an error occurs, we switch to using structured errors that capture information about the error in a machine-readable format (e.g. callsite information, nested error preserved in its original form, etc.) In a way, Java's exceptions are a good example here in the way that they contain typed fields and also nested causes.
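
As a sketch of point b, a structured error might capture the callsite with `runtime.Caller` and keep the nested error untouched. All names here (`StructuredError`, `NewError`, `dialProxy`, `errReset`) are illustrative assumptions, not the existing errors package.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// StructuredError keeps data about the error itself in typed fields.
type StructuredError struct {
	Op    string                 // what was being attempted
	File  string                 // callsite file (base name)
	Line  int                    // callsite line
	Data  map[string]interface{} // parameters that led to the error
	Cause error                  // nested error, preserved in original form
}

func (e *StructuredError) Error() string {
	return fmt.Sprintf("%s (%s:%d): %v", e.Op, e.File, e.Line, e.Cause)
}

// Unwrap exposes the nested cause, much like getCause() on a Java exception.
func (e *StructuredError) Unwrap() error { return e.Cause }

// NewError records the caller's file and line along with the cause and data.
func NewError(op string, cause error, data map[string]interface{}) *StructuredError {
	_, file, line, _ := runtime.Caller(1)
	return &StructuredError{
		Op:    op,
		File:  filepath.Base(file),
		Line:  line,
		Data:  data,
		Cause: cause,
	}
}

// errReset is a placeholder for a low-level network error.
var errReset = fmt.Errorf("connection reset by peer")

// dialProxy shows how a failing call site would report the error.
func dialProxy(addr string) *StructuredError {
	return NewError("dial", errReset, map[string]interface{}{"proxy_addr": addr})
}
```

Context such as user agent or request ID is deliberately absent from this type; under the combined approach it would come from the surrounding context rather than from the error.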

oxtoacart · 2016-05-15T08:27:44Z

> Ha, the comments have become cluttered. By passing context, I mean passing context when necessary instead of calling context.Go in all occurrences.

My view is that intermediary layers don't know when context is or is not necessary, so they have to assume it always is. For example, something like this:

conn, err := idletiming.Wrap(dialFN)

Taking our real context as an example, with user-agent, request id, origin host, etc., the idletiming.Wrap function may not care about any of this because it doesn't even log anything or return errors. However, dialFN may care, because it does both. The only solution is to pass the context through all the time, so now we have:

conn, err := idletiming.Wrap(ctx, dialFN)

And dialFN has to take a context as well. And so on. That's a lot of functions that need to change, and it clutters the code. With my context implementation, most of the time nothing is required. If idletiming happens to open a goroutine (which we do relatively rarely on the critical path), then it just switches to context.Go, which is quite easy and an isolated change.

oxtoacart · 2016-05-15T08:32:27Z

BTW, this branch has a bug that causes context to leak, which does pollute the log. That's fixed in #4295.

oxtoacart added 10 commits May 14, 2016 20:06

Squashed 'src/github.com/getlantern/gls/' content from commit 2b8af7a

7ff2058

git-subtree-dir: src/github.com/getlantern/gls git-subtree-split: 2b8af7a308eb752fc4030f93e85a8da641d2006e

Merge commit '7ff2058ac7035b21dc407a959fd2b32ecac5bba2' as 'src/githu…

9ed64f0

…b.com/getlantern/gls'

Squashed 'src/github.com/tylerb/is/' content from commit 6c1a467

58759d1

git-subtree-dir: src/github.com/tylerb/is git-subtree-split: 6c1a46754cf86b12fffbe5ecaadba8c382059165

Merge commit '58759d1738115fda8f4b3ae0ce6ef4dc9d5edb32' as 'src/githu…

41fb053

…b.com/tylerb/is'

Squashed 'src/github.com/oxtoacart/bpool/' content from commit 4e1c556

dba2671

git-subtree-dir: src/github.com/oxtoacart/bpool git-subtree-split: 4e1c5567d7c2dd59fa4c7c83d34c2f3528b025d6

Merge commit 'dba26717caa6998df0f897ac63e3f7647a4a4630' as 'src/githu…

8983bfb

…b.com/oxtoacart/bpool'

[gost] Added github.com/oxtoacart/bpool and its dependencies

2301b60

Added contextual logging to golog

888a1ee

Using goroutine context for logging

5c5b3a9

Cleaned up comment

bc10f0a

oxtoacart assigned fffw May 15, 2016

oxtoacart reviewed May 15, 2016

oxtoacart changed the title ~~[POC] Using gls library to provide context rathar than attaching it to errors~~ [POC] Using gls library to provide context rather than attaching it to errors May 15, 2016
