Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/54 proper logger #131

Merged
merged 27 commits into from
Nov 29, 2021
Merged

Fix/54 proper logger #131

merged 27 commits into from
Nov 29, 2021

Conversation

inancgumus
Copy link
Member

@inancgumus inancgumus commented Nov 23, 2021

This PR is for #54 (Proper logger):

  • Passes the same logger extension-wide
  • Adds Debugf to the entities in the extension
  • Updates tests to use a null logger when needed

With this, I'm able to see the problems in #125 and #49.

So I think they are helpful for us to diagnose issues in the future as well.

Closes #54

Copy link
Contributor

@imiric imiric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job with this! 👏 It will surely help troubleshoot thorny issues like #125, though now running with --verbose is super noisy. It would be good to place the cdp:recv event a level below Debug (Trace) that could be activated with a second --verbose, though that's probably not supported right now and would need changes in k6. Otherwise putting it behind an environment variable (XK6_BROWSER_TRACE) might be a good idea until -vv is supported. From the logrus side it would be a matter of simply calling Tracef() instead of Debugf(). Since those make the majority of logged events (70% by my rough estimate), and we usually don't care about them unless we're debugging the CDP layer itself, it would make debugging our code less noisy.

I left a few minor suggestions, and here are some general comments:

  • Using structured logging for all this like you suggested over Slack would be much better. E.g. all places where err:%v is logged would be better with WithError(err) instead. It would also cut down on all the repetition, since you could initialize the logger with all fields upfront (in a constructor or at the start of the method) and then simply call Debug() with just the message. But I don't think this is a requirement for this PR and we can improve on that later.
  • Another related thing I was thinking last week would be to abandon our common.Logger altogether and use logrus directly. Some of the things Logger.Logf does are of questionable importance (the elapsed field, debugOverride is always false, the fallback to fmt.Printf()...), though standardizing the category field and logging the goroutine ID seem useful.
    OTOH, if we ever decide to switch logging libraries, as we've been planning to do in k6 for years now, having a custom logging API would be very convenient. So maybe we should keep it but simplify it considerably.
  • If we do decide to keep our wrapper, we might want to abstract away the logging of the calling function names. This adds more work when functions are renamed/moved, and doing it with LSP won't pick it up. We can easily get the caller function name with some runtime introspection, see this SO answer. I actually had a patch I never pushed that does this when I was debugging some things last week, and it worked quite well. Obviously it won't pick up all the nuances like adding :go for goroutines or custom categories based on if branches, but those aren't the biggest concern when refactoring.

common/browser.go Outdated Show resolved Hide resolved
common/browser.go Outdated Show resolved Hide resolved
common/browser.go Outdated Show resolved Hide resolved
common/browser.go Outdated Show resolved Hide resolved
common/browser.go Outdated Show resolved Hide resolved
common/connection.go Outdated Show resolved Hide resolved
@@ -408,12 +474,14 @@ func (c *Connection) Execute(ctx context.Context, method string, params easyjson
for {
select {
case <-evCancelCtx.Done():
c.logger.Debugf("connection:Execute:<-evCancelCtx.Done()", "wsURL:%q err:%v", c.wsURL, evCancelCtx.Err())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, I would use WithError(evCancelCtx.Err()).

@@ -84,23 +84,26 @@ func (l *Logger) Logf(level logrus.Level, category string, msg string, args ...i
if now == elapsed {
elapsed = 0
}
defer func() {
l.lastLogCall = now
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we need to keep track of this or elapsed at all... Why is it useful to know the time elapsed between two (possibly unrelated) Logf() calls? 🤷‍♂️

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elapsed was helpful for me when I was debugging the context error.

For example, it showed me where the code was stuck for 20 seconds and failed.

common/page.go Outdated Show resolved Hide resolved
common/session.go Outdated Show resolved Hide resolved
@inancgumus
Copy link
Member Author

inancgumus commented Nov 25, 2021

Nice job with this! 👏 It will surely help troubleshoot thorny issues like #125

Thanks! ❤️

though now running with --verbose is super noisy. It would be good to place the CDP:recv event a level below Debug (Trace) that could be activated with a second --verbose, though that's probably not supported right now and would need changes in k6. Otherwise putting it behind an environment variable (XK6_BROWSER_TRACE) might be a good idea until -vv is supported. From the logrus side it would be a matter of simply calling Tracef() instead of Debugf(). Since those make the majority of logged events (70% by my rough estimate), and we usually don't care about them unless we're debugging the CDP layer itself, it would make debugging our code less noisy.

We use environment flags only on the testing side right now, so we can add a XK6_BROWSER_TRACE there.

However, we use flags for the test runs (like the debug flag). So, we might also add a trace flag to the launch options and enable CDP messages when it's on? And replace all CDP Debugfs with Tracefs and update the logger accordingly. WDYT?

  • Using structured logging for all this like you suggested over Slack would be much better. E.g. all places where err:%v is logged would be better with WithError(err) instead. It would also cut down on all the repetition, since you could initialize the logger with all fields upfront (in a constructor or at the start of the method) and then simply call Debug() with just the message. But I don't think this is a requirement for this PR and we can improve on that later.

  • Another related thing I was thinking last week would be to abandon our common.Logger altogether and use logrus directly. Some of the things Logger.Logf does are of questionable importance (the elapsed field, debugOverride is always false, the fallback to fmt.Printf()...), though standardizing the category field and logging the goroutine ID seem useful.

That would be awesome. We can do something like this:

// common/logger.go
func (l *Logger) WithError(err error) *logrus.Entry {
	return l.logger.WithError(err)
}

// common/browser.go
b.logger.WithError(err).Debugf(
    "Browser:onAttachedToTarget:background_page", "sid:%v tid:%v, returns",
    ev.SessionID, ev.TargetInfo.TargetID)

However, as I wrote you on Slack:

  • The problem with this approach (b.logger.WithError) is that, the returned *logrus.Entry no longer uses our custom common/Logger.Debugf (and the other custom) methods.

  • We can easily use logrus's methods on the returned *Entry, but we'll be stuck to Logrus. Also, we won't be able to do custom stuff that we do in our Logger's Logf method.

  • goroutineid and elapsed fields were helpful when I was debugging the context error. For example, elapsed showed me the code was stuck 20s and failed (But we can deduct that from the timings as well 🤷 ) Also, I could follow which goroutines were responsible for what. We can add goroutine id at the beginning of each component. But, if we follow this approach, we'll need to get rid of elapsed.

OTOH, if we ever decide to switch logging libraries, as we've been planning to do in k6 for years now, having a custom logging API would be very convenient. So maybe we should keep it but simplify it considerably.

💡 Maybe we should write a custom formatter for using our custom logger on top of Logrus? That can possibly solve the issues I explained above.

  • If we do decide to keep our wrapper, we might want to abstract away the logging of the calling function names. This adds more work when functions are renamed/moved, and doing it with LSP won't pick it up. We can easily get the caller function name with some runtime introspection, see this SO answer. I actually had a patch I never pushed that does this when I was debugging some things last week, and it worked quite well. Obviously it won't pick up all the nuances like adding :go for goroutines or custom categories based on if branches, but those aren't the biggest concern when refactoring.

Can you send me the patch so I can look at it?

For example, with this code:

func NewLogger(ctx context.Context, logger *logrus.Logger, debugOverride bool, categoryFilter *regexp.Regexp) *Logger {
	l := Logger{
		ctx:            ctx,
		logger:         logger,
		mu:             sync.Mutex{},
		debugOverride:  debugOverride,
		categoryFilter: categoryFilter,
		lastLogCall:    0,
	}
	logger.SetReportCaller(true)
	logger.SetFormatter(&logrus.JSONFormatter{
		CallerPrettyfier: caller(),
		FieldMap: logrus.FieldMap{
			logrus.FieldKeyFile: "caller",
		},
	})
	return &l
}

func caller() func(*runtime.Frame) (function string, file string) {
	return func(f *runtime.Frame) (function string, file string) {
		return f.Func.Name(), fmt.Sprintf("%s:%d", f.File, f.Line)
	}
}

I receive log messages like this:

{"caller":"/Users/inanc/grafana/xk6-browser/common/logger.go:124","category":"cdp:recv","elapsed":"0 ms","func":"github.com/grafana/xk6-browser/common.(*Logger).Logf","goroutine":21,"level":"info","msg":"\u003c- {\"id\":6,\"result\":{}}","time":"2021-11-25T14:12:04+03:00"}
{"caller":"/Users/inanc/grafana/xk6-browser/common/logger.go:124","category":"Connection:recvLoop:msg.ID:emit","elapsed":"0 ms","func":"github.com/grafana/xk6-browser/common.(*Logger).Logf","goroutine":21,"level":"info","msg":"method:'\\x06'","time":"2021-11-25T14:12:04+03:00"}
{"caller":"/Users/inanc/grafana/xk6-browser/common/logger.go:124","category":"Browser:Close","elapsed":"1 ms","func":"github.com/grafana/xk6-browser/common.(*Logger).Logf","goroutine":7,"level":"info","msg":"browserProc terminate","time":"2021-11-25T14:12:04+03:00"}

It's kind of noisy. Maybe we can strip some parts of it.

@inancgumus inancgumus mentioned this pull request Nov 25, 2021
@imiric
Copy link
Contributor

imiric commented Nov 26, 2021

However, we use flags for the test runs (like the debug flag). So, we might also add a trace flag to the launch options and enable CDP messages when it's on? And replace all CDP Debugfs with Tracefs and update the logger accordingly. WDYT?

But debug is not a CLI flag, it's a JS option. I wouldn't want to have to change the script whenever I want to enable trace logs (or use workarounds like we did for XK6_HEADLESS), it should be done via the CLI. Since we can't add CLI flags or change the way --verbose works, an environment variable for this is the simplest approach, so I would go with that.

Maybe we should write a custom formatter for using our custom logger on top of Logrus? That can possibly solve the issues I explained above.

🤔 That might work, though then we're really depending even more on Logrus, which we probably want to replace.

Can you send me the patch so I can look at it?

Sure, it's just a simple change in debugLog:

diff --git a/common/helpers.go b/common/helpers.go
index ec8c20d..851e227 100644
--- a/common/helpers.go
+++ b/common/helpers.go
@@ -52,7 +52,9 @@ func debugLog(msg string, args ...interface{}) {
 		elapsed = 0
 	}
 	magenta := color.New(color.FgMagenta).SprintFunc()
-	args = []interface{}{fmt.Sprintf("[%d] %s", goRoutineID(), fmt.Sprintf(msg, args...))}
+	pc, fn, line, _ := runtime.Caller(1)
+	args = []interface{}{fmt.Sprintf("[%d %s(%s:%d)] %s",
+		goRoutineID(), runtime.FuncForPC(pc).Name(), fn, line, fmt.Sprintf(msg, args...))}
 	args = append(args, fmt.Sprintf("%s ms", magenta(elapsed)))
 	fmt.Println(args...)
 	debugLastCall = now

I wasn't aware of Logrus' SetReportCaller(), so we should probably use that.

But really, these are all improvements we can do after this PR, including proper structured logging, WithError, etc. The only blocker I see for merging this would be to log CDP messages with Tracef and enable TraceLevel if XK6_BROWSER_TRACE is defined, since we can't influence it any other way.

@inancgumus
Copy link
Member Author

inancgumus commented Nov 29, 2021

Replace all CDP Debugfs with Tracefs and update the logger accordingly. WDYT?

First todo: +1 - Tracef

But debug is not a CLI flag, it's a JS option. I wouldn't want to have to change the script whenever I want to enable trace logs (or use workarounds like we did for XK6_HEADLESS), it should be done via the CLI. Since we can't add CLI flags or change the way --verbose works, an environment variable for this is the simplest approach, so I would go with that.

2nd todo: What do you think about XK6_LOG=1 where 1 can be changed for each level (aligned with the log levels). And it will override the script's launch options when we define it.

Maybe we should write a custom formatter for using our custom logger on top of Logrus? That can possibly solve the issues I explained above.

🤔 That might work, though then we're really depending even more on Logrus, which we probably want to replace.

Let's pass this on for now, then.

I wasn't aware of Logrus' SetReportCaller(), so we should probably use that.

+1 but for later, ok.

But really, these are all improvements we can do after this PR, including proper structured logging, WithError, etc. The only blocker I see for merging this would be to log CDP messages with Tracef and enable TraceLevel if XK6_BROWSER_TRACE is defined, since we can't influence it any other way.

Please see the 2nd todo above.

@imiric

@imiric
Copy link
Contributor

imiric commented Nov 29, 2021

What do you think about XK6_LOG=1 where 1 can be changed for each level (aligned with the log levels). And it will override the script's launch options when we define it.

Sounds good (maybe XK6_BROWSER_LOG?), but we don't need to make this sophisticated, or even document it, since we wouldn't want users to become dependent on it. It would be best kept as an internal dev-only workaround until we can integrate it better with the CLI (-vvv, --log-level=<info|debug|...>, etc.).

It's only used with a logger. If we need to use this function in the
future, we can move it back.
@inancgumus inancgumus requested a review from imiric November 29, 2021 13:32
@inancgumus
Copy link
Member Author

inancgumus commented Nov 29, 2021

I added XK6_BROWSER_LOG, and changed CDP messages to Trace level. The new flag overrides the script debug option and accepts these levels: panic, fatal, error, warn, warning, info (default), debug, trace:

XK6_BROWSER_LOG=warn xk6 run -q examples/colorscheme.js

I also added another flag called XK6_BROWSER_CALLER. It prints the names of the caller file and functions when enabled:

XK6_BROWSER_CALLER=1 xk6 run -q examples/colorscheme.js

@imiric

Copy link
Contributor

@imiric imiric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done! I think we should eventually fix how the caller is determined/rendered, but we can do that later.

Comment on lines +136 to +148
func (l *Logger) ReportCaller() {
caller := func() func(*runtime.Frame) (string, string) {
return func(f *runtime.Frame) (function string, file string) {
return f.Func.Name(), fmt.Sprintf("%s:%d", f.File, f.Line)
}
l.lastLogCall = now
}
l.log.SetFormatter(&logrus.TextFormatter{
CallerPrettyfier: caller(),
FieldMap: logrus.FieldMap{
logrus.FieldKeyFile: "caller",
},
})
l.log.SetReportCaller(true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this custom field and prettifier? From what I see it only changes the field name.

With:

func="github.com/grafana/xk6-browser/common.(*Logger).Logf" caller="github.com/grafana/xk6-browser@v0.0.0-00010101000000-000000000000/common/logger.go:116"

Without (just calling l.log.SetReportCalled(true)):

func="github.com/grafana/xk6-browser/common.(*Logger).Logf" file="github.com/grafana/xk6-browser@v0.0.0-00010101000000-000000000000/common/logger.go:116"

Is the idea to leave it for further customization (cleaning up the github.com/... prefix like you mentioned)?

Also, there are a lot of places where the caller is Logger.Logf. That might be tricky to resolve, so this is fine as is.

Copy link
Member Author

@inancgumus inancgumus Nov 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's because we're using our custom logger. Maybe we should get rid of it as you said before.

For now, I can do something like this, WDYT?

// ReportCaller adds source file and function names to the log entries.
func (l *Logger) ReportCaller() {
	// strip the module informaiton
	strip := func(s string) string {
		s = strings.TrimPrefix(s, "github.com/grafana/xk6-browser")
		s = s[strings.Index(s, "/")+1:]

		return s
	}
	caller := func() func(*runtime.Frame) (string, string) {
		return func(f *runtime.Frame) (fn string, loc string) {
			const stackLevel = 8
			_, file, no, ok := runtime.Caller(stackLevel)
			if ok {
				fn = fmt.Sprintf("%s:%d", strip(file), no)
			}

			return fn, ""
		}
	}
	l.log.SetFormatter(&logrus.TextFormatter{
		CallerPrettyfier: caller(),
	})
	l.log.SetReportCaller(true)
}

Output:

time="2021-11-30T12:12:21+03:00" level=debug msg="sid:AFA9A388DA1617DF05688B6718BCD591 tid:045F8A2BB4D7E102A47820CF00E5C9DF" func="common/session.go:75" category="Session:close" elapsed="0 ms" goroutine=68

Maybe we should make this the default one and get rid of the XK6_BROWSER_CALLER?

@imiric

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we wouldn't need XK6_BROWSER_CALLER if the caller information could be added by default on levels >= Debug.

Can that stackLevel = 8 be a constant? I mean, will that always be the case? I don't have time to look into it now, but if you think it solves it for all cases, LGTM.

And the func="common/session.go:75" doesn't make much sense, there's no function name :) Maybe this should be caller="common.(*Session).close [common/session.go:75]"?

In any case, don't spend too much time on this. Feel free to create an issue for it and we can tackle it soon-ish.

Copy link
Member Author

@inancgumus inancgumus Nov 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, we need function name there as well, but at least it goes to the file and line when I click on it on my terminal 😅

I created another issue with some rough commits that solve some of the issues:

@imiric

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can that stackLevel = 8 be a constant? I mean, will that always be the case? I don't have time to look into it now, but if you think it solves it for all cases, LGTM.

For this Logrus version, yes!

@inancgumus inancgumus merged commit 99d1ec9 into main Nov 29, 2021
@inancgumus inancgumus deleted the fix/54-proper-logger branch November 29, 2021 18:51
@inancgumus inancgumus mentioned this pull request Nov 30, 2021
5 tasks
@inancgumus inancgumus self-assigned this Jan 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proper logger
2 participants