Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Second attempt at logging refactor #1535

Merged
merged 1 commit into from
Sep 18, 2017
Merged

WIP: Second attempt at logging refactor #1535

merged 1 commit into from
Sep 18, 2017

Conversation

desa
Copy link
Contributor

@desa desa commented Aug 24, 2017

No description provided.

@desa desa force-pushed the md-log-v2 branch 4 times, most recently from 35513ea to 0e50102 Compare September 5, 2017 15:28
@desa desa force-pushed the md-log-v2 branch 4 times, most recently from 245be10 to 5fb3f06 Compare September 8, 2017 17:00
Copy link
Contributor

@nathanielc nathanielc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed about half of the changes.

I have a crazy idea. Should we just drop zap and write our own tiny log package? At this point we have reduced the responsibility of logging down to just formatting data, which we had to write our own logfmt implementation anyways. How much more work is it to just consume the keyvalue.T type and directly write data in logfmt at this point? We could make the package internal to the diagnostic service so its clear its not general purpose.

derivative.go Outdated
@@ -125,7 +126,7 @@ func (n *DerivativeNode) derivative(prev, curr models.Fields, prevTime, currTime
f1, ok := numToFloat(curr[n.d.Field])
if !ok {
n.incrementErrorCount()
n.logger.Printf("E! cannot apply derivative to type %T", curr[n.d.Field])
n.diag.Error("cannot perform derivative", fmt.Errorf("field %s is the wrong %T", n.d.Field, curr[n.d.Field]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one should read:

n.diag.Error("cannot perform derivative", errors.New("field is the wrong type") keyvalue.KV("field",n.d.Field), keyvalue.KV("type", reflect.TypeOf(curr[n.d.Field]).String()))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reflect.TypeOf(curr[n.d.Field]).String()

panics when curr[n.d.Field] is nil
https://play.golang.org/p/gjlr3kuusL

Went with

keyvalue.KV("type", fmt.Sprintf("%T", curr[n.d.Field]))

instead

derivative.go Outdated
@@ -140,7 +141,7 @@ func (n *DerivativeNode) derivative(prev, curr models.Fields, prevTime, currTime
elapsed := float64(currTime.Sub(prevTime))
if elapsed == 0 {
n.incrementErrorCount()
n.logger.Printf("E! cannot perform derivative elapsed time was 0")
n.diag.Error("cannot perform derivative", errors.New("elaspsed time was 0"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the n.diag.Error() method call n.incrementErrorCount() internally? That would remove the onus on each node to correctly count their errors and rather only need to report them to the Diagnostic interface. This probably means you need a small shim implementation of the interface within the kapacitor package to intercept Error and With* calls.

Does that sound reasonable and useful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I thought about doing something like that, but I wasn't sure exactly the best way to do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

log.go Outdated
return batch, nil
}
n.logger.Println(n.key, n.buf.String())
// TODO: fix prefix and other loger here
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this TODO and others in the file complete?

node.go Outdated
@@ -132,7 +153,8 @@ func (n *node) start(snapshot []byte) {
err = fmt.Errorf("%s: Trace:%s", r, string(trace[:n]))
}
n.abortParentEdges()
n.logger.Println("E!", err)
n.diag.Error("encountered panic", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not always true that a panic was encountered here. This error message should be something like node failed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

server/config.go Outdated
Task task_store.Config `toml:"task"`
InfluxDB []influxdb.Config `toml:"influxdb" override:"influxdb,element-key=name"`
// TODO(desa): this should change to diagnostic eventually
Logging diagnostic.Config `toml:"logging"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This config is about logging, but logging is implemented through diagnostics so I think this name is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

}

// TODO: deal with prefix and level
func (h *KapacitorHandler) LogBatchData(level, prefix string, batch edge.BufferedBatchMessage) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think prefix can just be a String field. The meaning prefix will be wrong at that point, but the intent is to make the logs unique which a field will do fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nathanielc any idea what the field should be named?

}

func (h *ReplayHandler) Error(msg string, err error, ctx ...keyvalue.T) {
if len(ctx) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again repeated code...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup. don't know how I missed this.

}

func (h *EdgeHandler) Collect(mtype edge.MessageType) {
h.l.Debug("collected message", zap.Stringer("message_type", mtype))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean we are logging every message?

Copy link
Contributor Author

@desa desa Sep 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comes from https://github.com/influxdata/kapacitor/blob/md-log-v2/edge/log.go#L24-L35 and I don't think the LogEdge is ever used.

h.l.Debug("emitted message", zap.Stringer("message_type", mtype))
}

type LogLevel int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does these need to be exported types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope. No reason they need to be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed now.

h.l.Debug(string(buf))
case LLError:
h.l.Error(string(buf))
case LLFatal:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need Fatal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope.

@desa
Copy link
Contributor Author

desa commented Sep 11, 2017

How much more work is it to just consume the keyvalue.T type and directly write data in logfmt at this point? We could make the package internal to the diagnostic service so its clear its not general purpose.

Don't think it would be too much work, but I could be off. Currently the keyvalue.Ts are both strings. Would have to do a bit of work to make them a little more general.

@nathanielc
Copy link
Contributor

@desa If the types of keyvalue.T are strings doesn't that work well for logging, since everything will be encoded as text to be able to log it?

@desa desa force-pushed the md-log-v2 branch 2 times, most recently from 1320843 to 3d88527 Compare September 13, 2017 15:24
@desa
Copy link
Contributor Author

desa commented Sep 13, 2017

@nathanielc I think I've addressed all of the issues from the review so far.

@desa
Copy link
Contributor Author

desa commented Sep 15, 2017

@nathanielc this should be in a ready for another review.

Copy link
Contributor

@nathanielc nathanielc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small suggestions , then I think its ready to go.

var el klog.Field
switch v.(type) {
case int:
el = klog.Int(k, v.(int))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fields are never int they are always int64.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

ts := []klog.Field{}
for k, v := range tags {
var el klog.Field
switch v.(type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type switches in Go can create an autocasted identifier

switch t := v.(type) {
case int64:
    el = klog.Int64(k, t)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know that 👍 .
Fixed.

case bool:
el = klog.Bool(k, v.(bool))
default:
el = klog.String(k, fmt.Sprintf("%v", v))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious why there are fields implementations for each type instead of simply using fmt.Sprintf("%v",v) for everything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I had already needed the other types elsewhere and I was trying to be somewhat performance conscious. In the past whenever I've used fmt.Sprintf a ton I've had a very sad CPU.

@desa
Copy link
Contributor Author

desa commented Sep 18, 2017

@nathanielc I think it's ready to merge on your signal.

Copy link
Contributor

@nathanielc nathanielc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read through your TODOs, and may some comments.

// // response time, report in microseconds because this is consistent
// // with apache's %D parameter in mod_log_config
// strconv.FormatInt(time.Since(start).Nanoseconds()/1000, 10),
//}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove this dead code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

// // response time, report in microseconds because this is consistent
// // with apache's %D parameter in mod_log_config
// strconv.FormatInt(time.Since(start).Nanoseconds()/1000, 10),
//}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@@ -1225,15 +1232,15 @@ func (c *influxdbCluster) startUDPListener(se subEntry, port string) (*net.UDPAd
conf.Buffer = c.udpBuffer
conf.ReadBuffer = c.udpReadBuffer

l := c.LogService.NewLogger(fmt.Sprintf("[udp:%s.%s] ", se.db, se.rp), log.LstdFlags)
service := udp.NewService(conf, l)
d := c.diag.WithUDPContext(fmt.Sprintf("udp:%s.%s", se.db, se.rp))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not have the WithUDPContext accept two string arguments for db and rp? Then the context is correctly tagged instead of having a composite string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea.

service.PointsWriter = c.PointsWriter
err := service.Open()
if err != nil {
return nil, err
}
c.services[se] = service
c.logger.Println("I! started UDP listener for", se.db, se.rp)
c.diag.StartedUDPListener(fmt.Sprintf("%s.%s", se.db, se.rp))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again here, why not pass the db and rp strings to the method directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

task.go Outdated
@@ -551,7 +554,8 @@ func (et *ExecutingTask) runSnapshotter() {
case <-ticker.C:
snapshot, err := et.Snapshot()
if err != nil {
et.logger.Println("E! failed to snapshot task", et.Task.ID, err)
// TODO(desa): I think this might already have the task id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the result of this TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's set.

task.go Outdated
@@ -562,7 +566,8 @@ func (et *ExecutingTask) runSnapshotter() {
if size > 0 {
err = et.tm.TaskStore.SaveSnapshot(et.Task.ID, snapshot)
if err != nil {
et.logger.Println("E! failed to save task snapshot", et.Task.ID, err)
// TODO(desa): I think this might already have the task id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's set.

udf.go Outdated
@@ -266,7 +265,10 @@ func (p *UDFProcess) logStdErr() {
defer p.logStdErrGroup.Done()
scanner := bufio.NewScanner(p.stderr)
for scanner.Scan() {
p.logger.Println("I!P", scanner.Text())
// TODO(desa): fix the issue here. issue comes from loadUDFInfo in services/udf/service.go
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this TODO complete?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.


// TODO(desa): that okay? Nil check against this happens in udf.go logStdErr()
// alternative would be to have a mock implementation that gets passed in here
u, err := s.Create(name, "", "", nil, nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is okay, its possible valuable information is lost if we do not provide a diagnostic here. Looking at it the simplest approach would be to add a WithUDFDiagnotic method to the Diagnostic interface within this package. The udf.Diagnostic interface is quite simple so its shouldn't be hard to accommodate. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor

@nathanielc nathanielc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebase and squash and this is good to go!

vendor zap for logging refactor

Change name of diagnostic handlers file

Add diagnostics for slack service

Use keyvalue instead of map[string]string for ctx

Add diagnostic to task_store service

Add diagnostics to reporting handler

Add diagnostics to storage service

Fix wrong service type

Add diagnostic to http service

Add diagnostics to alerta service

Add diagnostic to kapacitor package

SQUASH ME: figure out how to deal with UDF node

Add diagnostic to alert service

Add hipchat diagnostic

Add diagnostic for udf and udf service

Add diagnostic to pagerduty

Add diagnostic to SMTP service

Add diagnostic to opsgenie service

Add diagnostic to pushover service

Add diagnostic to httppost service

Add diagnostic to sensu service

Add diagnostic to snmp service

Add diagnostic to telegram service

Add diagnostic to mqtt service

Add diagnostic to talk handler

Add diagnostics to config override service

Remove logger from test service

I'm not sure if this is what we want to do

Add diagnostic to server package

Not totally sold on the interface, for server, but it felt okay for this
package for some reason

Add diagnostic to replay service

Add diagnostic to kubernetes service

Add diagnostic to swarm service

Add diagnostic to deadman service

Add diagnostic to noauth service

Add diagnostic to stats service

Add diagnostic to UDP service

Add diagnostic to InfluxDB service

Add diagnostic to scraper services

Add diagnostic to edge logger

Add static level diagnostic handler

Remove log service from server package

Remove log service from kapacitord

Change diagnostic Service to struct

SQUASH ME

Address various TODO items

Address various TODOs

Remove logging service

Change from interfaces to explicit types on service

Add time configuration to to zap

Use ioutil.Discard in tests

Set default level to debug

Remove commented out code

Address TODOs in diagnostic package

Prevent newlines from scaper service

DRY up common code

Fix level logger

Add log level setting from CLI

Fix log data node

Address a few of the changes suggested in the PR

Fix addition comments from PR

Fix log node level and prefix

Wrap NodeDiagnostic to increment error count

Clarify comment in Error function

Fix derivative error function

Fix alerta diagnostic

First pass: impelment basic logfmt logger

Remove zap dependency

Remove usage of zap

Don't use pointers to objects

Fix log line format

Format fields and tags appropriately

Add time to log function

Add logger tests

Add tests to logger

Fix logger test wording

Add changes suggested in PR

Address issues from PR
@desa desa merged commit 5d228b2 into master Sep 18, 2017
@desa desa deleted the md-log-v2 branch September 18, 2017 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants