Skip to content
This repository has been archived by the owner on Sep 12, 2018. It is now read-only.

NG: Logging #635

Closed
dmcgowan opened this issue Oct 16, 2014 · 18 comments
Closed

NG: Logging #635

dmcgowan opened this issue Oct 16, 2014 · 18 comments
Milestone

Comments

@dmcgowan
Copy link
Contributor

Define a standard way to log across in the registry in a uniform and easy to process manner. Some topics to decide on...

  • Log format (logfmt, json, google-glog, something else)
  • Log output (stderr, file)
  • Error handling (stack traces and error wrapping)
  • Log context (how to group log lines to specific requests)

The main concern is balancing a uniform method and format with something that is flexible. Flexibility meaning customization can be done on top of what is decided, and questions about whether to use volumes, which agent forwarder, or pipe logs through log rotation script X don't need to be discussed here.

@wking
Copy link
Contributor

wking commented Oct 16, 2014

On Thu, Oct 16, 2014 at 10:22:05AM -0700, Derek McGowan wrote:

Define a standard way to log across in the registry in a uniform and
easy to process manner. Some topics to decide on...

If we use a reverse-proxy for serious deployments (see #623), we can
just use that proxy to generate access/error logs. For example, Nginx
can log to syslog out of the box 1. That means we don't need much
logging in the registry itself, just debugging info if we have trouble
with a storage backend and things like that.

@noxiouz
Copy link
Contributor

noxiouz commented Oct 16, 2014

Error handling (stack traces and error wrapping)

I believe, that this approach looks good:
https://github.com/facebookgo/stackerr

@dmcgowan
Copy link
Contributor Author

While log routing doesn't need to dictate the topics above, some possible routers we may want to integrate and ensure our format works nicely with.

@dmp42
Copy link
Contributor

dmp42 commented Oct 22, 2014

@wking

We do want access logs from the registry - this is the only way you are going to be able to tell if your proxy is actually sending the request back

We also want exhaustive application logging (including context, etc). This is absolutely needed at least to debug.

The default behavior should be to have these to stdout.

@wking
Copy link
Contributor

wking commented Oct 22, 2014

On Wed, Oct 22, 2014 at 11:25:45AM -0700, Olivier Gambier wrote:

We do want access logs from the registry - this is the only way
you are going to be able to tell if your proxy is actually sending
the request back

Won't you get timeout errors in the reverse-proxy logs if the registry
isn't responding? And I'd expect the proxy logs would be where to
look to see if the reverse-proxy is responding (vs. hearing back from
the registry and then not forwarding that on to the client?).

We also want exhaustive application logging (including context,
etc). This is absolutely needed at least to debug.

I don't think there will be much going on in the registry itself.
It's just a translation layer between client-facing REST API and
storage-facing Go/libchan APIs. I expect reverse-proxy logs will
cover the former, and storage-driver logs will cover the latter. I'm
fine logging the translation itself for debugging, I just don't think
it's going to need all that much planning (since the translation code
should be fairly stable).

The default behavior should be to have these to stdout.

That would probably be fine, since I expect only registry devs will
want this debugging info. I don't think anyone will need it for
production installs.

@dmp42
Copy link
Contributor

dmp42 commented Oct 22, 2014

@wking arguing about technical points (and agreeing, or disagreeing on them) is perfectly fine.

On the other hand, discarding valid use-cases is not :-)

I do want access logs on the registry, since the registry is a standalone product, and not everyone is willing to run it behind nginx.

This is a standard feature of about every standalone web application, and just because it's possible to do otherwise with proxy monkeying (that I'm advising against) doesn't mean we should ignore it altogether.

Still, to keep addressing these:

Won't you get timeout errors in the reverse-proxy logs if the registry isn't responding?

Why would a timeout be the only case? What about hitting the wrong backend?
And again, what about you don't have nginx in front of this?

I don't think there will be much going on in the registry itself.

Well, there is a lot :-).
Extensions come to mind.

I expect reverse-proxy logs will cover the former,

Read my lips:

  • we will not ship the registry by default with a nginx proxy
  • we will advise against using proxies unless people know what they do and are willing to solve problems themselves
  • I'm perfectly happy with letting people do use a proxy if they want - but you have to stop considering everybody will do as you think is good :-)

Proxy support in registry v1 has been a terrible train wreck, and a liability for anyone who tried that in high-scale production. I don't expect things to go very differently with v2, and I intend in making sure we don't bind ourselves to such a design - although I genuinely want people to be able to rope themselves with it if they would want that :-)

I just don't think it's going to need all that much planning (since the translation code should be fairly stable).

So, what happens then? Developers don't have a clean way to log thing, they start using various libraries. I refuse to merge that because it's messy, and they will switch to no log at all, or printf, and we end-up with a poorly debuggable project failing to properly log information about its behavior.

Since it's so crappy, it's not picked-up by ISV who want to build on it or ship it as a product, it's hard to post-mortem errors in it, and ultimately I'm not happy :)

I don't think anyone will need it for production installs.

Having auditable logs, and a robust logging infrastructure is IMHO a standard feature of any serious software.

I want this to be designed and dogfed from the start since it will be harder to get right down the road.

@wking
Copy link
Contributor

wking commented Oct 22, 2014

On Wed, Oct 22, 2014 at 04:09:28PM -0700, Olivier Gambier wrote:

On the other hand, discarding valid use-cases is not :-)

Agreed. I'm just pushing to try and see what those use-cases are.
I'll try to push a bit more politely ;).

I do want access logs on the registry, since the registry is a
standalone product, and not everyone is willing to run it behind
nginx.

Assuming we get the proxy bits sorted (and with Nginx's ticket 251,
it's certainly possible that we don't, see 1), I don't see why users
would even need to know that the default registry image bundled Nginx,
which could be enabled for log-generation. Of course that's a heavy
stick if it's just for access/error logs, but if we're also using it
for auth (still up in the air, see #623) it seems like it's worth the
trouble to me.

Won't you get timeout errors in the reverse-proxy logs if the
registry isn't responding?

Why would a timeout be the only case? What about hitting the wrong
backend?

For storage? I'm not sure how the multi-backend storage works. How
do we handle that now?

I don't think there will be much going on in the registry itself.

Well, there is a lot :-).
Extensions come to mind.

I'm hoping we can just provide HTTP notification for our actions, and
leave things currently handled by extensions to external webhook
recipients (that's how I read 2, although now that I look at it
again I don't actually see “HTTP” in @shreyu86's comment).

  • I'm perfectly happy with letting people do use a proxy if they
    want - but you have to stop considering everybody will do as you
    think is good :-)

Sure. I'm happy if folks want to write a reverse-proxy layer in Go
instead of using Nginx. I'm even fine if that gets compiled into the
registry [3].

Proxy support in registry v1 has been a terrible train wreck, and a
liability for anyone who tried that in high-scale production…

I just didn't have a clear idea of how it was going wrong before 1.
Previous mentions of timeouts and such seemed easy to overcome with
resumable actions.

I just don't think it's going to need all that much planning
(since the translation code should be fairly stable).

So, what happens then? Developers don't have a clean way to log
thing, they start using various libraries. I refuse to merge that
because it's messy, and they will switch to no log at all, or
printf, and we end-up with a poorly debuggable project failing to
properly log information about its behavior…

Whenever we get a report from someone that things are buggy, we ask
for their reverse-proxy and storage logs. If they don't have those,
we have them set an environment variable that turns on whatever
logging our HTTP client (client-side) an libchan (storage-side)
library uses.

I don't think anyone will need it for production installs.

Having auditable logs, and a robust logging infrastructure is IMHO a
standard feature of any serious software.

I agree. I'm just not convinced that the registry is the place for
the serious logging. I think the HTTP / libchan layers are much more
appropriate places for the logging. And that HTTP layer should be a
serious reverse-proxy or serious Go library (or both).

I want this to be designed and dogfed from the start since it will
be harder to get right down the road.

Maybe I'm just arguing that we should punt here until we actually have
some registry-side code that is complicated enough to be worth logging
(indepently of the I/O logs, which I think we all agree are useful,
but which aren't code that will live in this repository). We should
have time to spec out the logging in that case, and we won't have to
argue about whether or not we'll actually need serious logs ;).

@wking
Copy link
Contributor

wking commented Oct 23, 2014

On Wed, Oct 22, 2014 at 04:58:36PM -0700, W. Trevor King wrote:

Wed, Oct 22, 2014 at 04:09:28PM -0700, Olivier Gambier:

I want this to be designed and dogfed from the start since it will
be harder to get right down the road.

Maybe I'm just arguing that we should punt here…

And for what it's worth, Go seems to have a logger 1 with an
optional syslog backend 2 in the standard library. That covers all
the serious logging needs I've ever had, although configurable output
levels would be nice (warning/info/debug). Folks who do want logstash
storage can likely use the logstash's (alpha?) syslog handler 3.
Although I have to admit I don't see anything wrong with syslog, so
folks who write all of these fancy new log formats likely know
something I don't ;). Or they just have too much time on their hands
;). In any case, I'm happy to punt everything beyond
stdout/stderr/syslog to Docker itself (see moby/moby#7195).

@wking
Copy link
Contributor

wking commented Oct 24, 2014

On Wed, Oct 22, 2014 at 05:16:49PM -0700, W. Trevor King wrote:

Wed, Oct 22, 2014 at 04:58:36PM -0700, W. Trevor King:

Wed, Oct 22, 2014 at 04:09:28PM -0700, Olivier Gambier:

I want this to be designed and dogfed from the start since it will
be harder to get right down the road.

Maybe I'm just arguing that we should punt here…

And for what it's worth, Go seems to have a logger [1] with an
optional syslog backend [2] in the standard library.

[1]: http://golang.org/pkg/log/
[2]: http://golang.org/pkg/log/syslog/

And if the Docker client isn't using that ‘log’ package now, it was
previously (before moby/moby@92df943, daemon logging: unifying
output and timestamps, 2014-09-25) and likely will be again soon
(moby/moby#8745).

@visualphoenix
Copy link
Contributor

@wking said:

If we use a reverse-proxy for serious deployments (see #623), we can just use that proxy to generate access/error logs.

So in my case I might actually terminate SSL with an F5 BIG-IP Local Traffic Manager and then run the registry behind that. For instance, use a WIP for routing to the nearest region, a VIP per region, and then put a registry or cluster of registries behind a few HAProxies per region.

I'm definitely interested in having these registries used a distributed shared storage. But I think @dmp42 has a point in wanting the registries to do the logging. I don't think we should require the proxy to generate the access/error logs.

@visualphoenix
Copy link
Contributor

@wking said:

And if the Docker client isn't using that ‘log’ package now, it was previously (before moby/moby@92df943, daemon logging: unifying output and timestamps, 2014-09-25) and likely will be again soon (moby/moby#8745).

That's super interesting. Looking at the diff didnt make it obvious to me what the interaction was with /dev/me

Flannel is using https://github.com/golang/glog which looked nice (which I now see was mentioned in the original post).

@dmcgowan said:

While log routing doesn't need to dictate the topics above, some possible routers we may want to integrate and ensure our format works nicely with.

Maybe we take a page out of the Elliptics playbook and let the user configure the log output format and wiring. Then we punt and provide sane defaults to stdout/stderr and test that we provide enough flexibility to customize the output to be compatible with those other utilities.

@wking
Copy link
Contributor

wking commented Oct 25, 2014

On Fri, Oct 24, 2014 at 09:36:32PM -0700, Raymond Barbiero wrote:

@wking said:

If we use a reverse-proxy for serious deployments (see #623), we
can just use that proxy to generate access/error logs.

So in my case I might actually terminate SSL with an F5 BIG-IP Local
Traffic Manager and then run the registry behind that. For instance,
use a WIP for routing to the nearest region, a VIP per region, and
then put a registry or cluster of registries behind HAProxy per
region.

You can log to syslog with HAProxy too 1. And obviously,
storage-driver logging doesn't depend on what's in front of the
registry. What other logs (besides access/error logs from the
registry (via it's fronting proxy) and access/error logs from the
storage drivers) were you looking for?

FWIW, Go's built-in server uses the built-in log framework for errors
2 and you can inject access logging with a handler wrapper
(e.g. 3).

@BrianBland
Copy link

Docker has just switched over to using logrus as of moby/moby#8770, so I propose that we use this for consistent formatting and hooks across the entire product.

As for log rotation, we can use lumberjack as our log output writer. This package supports max file size and age for rotation and claims to be compatible with the standard go logging semantics, the same as logrus, so these should theoretically work together.

@dmcgowan
Copy link
Contributor Author

+1 on both logrus and lumberjack, logrus supports pretty tty output, default logfmt output for non-tty and optional json output.

Also for tracing and contextual logging, this golang blog post is a few months old but it is relevant to creating a standard. https://blog.golang.org/context

At Google, we require that Go programmers pass a Context parameter as the first argument to every function on the call path between incoming and outgoing requests.

@dmp42
Copy link
Contributor

dmp42 commented Oct 28, 2014

Summing things up:

  • if people want to work their access logs / errors logs from a proxy / lb, that's fine, and I don't see a reason to prevent that
  • registries should still produce http access logs: in a multi-instance registry cluster, it's very helpful to be able to track down a request to the instance (and log flow point) handling it
  • I do insist on this third point: we need application level logging, with context, and configurable output - this must be handled by the registry since it's through registry configuration that we are going to determine that (level, output) - dependent libraries (libchan, extensions, storage drivers) should use that to do their logging

What needs to be done:

  • pick a poison (since docker main chose logrus, we would need a strong point against it, or for something else to not use that one as well)
  • write some docs to instruct extension/drivers authors on how to do their logging properly to integrate with that

@wking
Copy link
Contributor

wking commented Oct 28, 2014

On Tue, Oct 28, 2014 at 01:37:10PM -0700, Olivier Gambier wrote:

dependent libraries (libchan, extensions, storage drivers) should
use that to do their logging

I'm still pushing for extensions and storage drivers to be separate
processes in separate containers with their own independent logging.
But for the core registry code and any in-container extensions and
storage drivers this summary looks good to me.

@dmp42
Copy link
Contributor

dmp42 commented Oct 28, 2014

I'm still pushing for extensions and storage drivers to be separate processes in separate containers with their own independent logging.

That should not be the default (and not what we encourage people to do) - but then I don't see a reason to prevent it if someone wants to go crazy with that :)

@stevvooe
Copy link
Contributor

Superseded by distribution/distribution#86.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants