
121 logging improvements #122

Merged
merged 25 commits into master from 121-logging
Dec 19, 2023
Conversation

@skx (Owner) commented Dec 16, 2023

This pull-request, once complete, will close #121 by improving our logging. Improving in this case means that we're very consistent with how we log messages, and their contents.

  • There will be three levels
    • DEBUG to be used for anything a developer wants to see for themselves easily, either during or after development of a new feature.
    • WARN for something a user might wish to see, but which is not a fatal error.
    • ERROR for something that causes a (near immediate) program exit.

All logging messages will be written to STDERR. By default that will be done in plain text; however, it is possible to use JSON for ingestion into other systems, and this will be documented in the README.md file.

@skx commented Dec 16, 2023

So far the main CLI sub-commands report their errors appropriately, but the actual application/processor does not yet take advantage of this facility.

Sample usage shows how I'm using custom fields to log things:

$ LOG_ALL=1 ./rss2email import  x
time=2023-12-16T09:27:32.030+02:00 level=DEBUG msg="Adding entry from file" file=x url=https://blog.steve.fi/index.rss
time=2023-12-16T09:27:32.030+02:00 level=DEBUG msg="Adding entry from file" file=x url=https://www.filfre.net/feed/rss/
time=2023-12-16T09:27:32.030+02:00 level=DEBUG msg="Adding entry from file" file=x url=https://briancallahan.net/blog/feed.xml
time=2023-12-16T09:27:32.030+02:00 level=DEBUG msg="Adding entry from file" file=x url=http://www.breakintoprogram.co.uk/feed

@skx commented Dec 16, 2023

The various sub-commands now log via the logger, and I've set things up so the processor can have a logging-handle, via SetLogger.

The final step will be to update the processor to log appropriately, ideally with per-feed context.

@skx commented Dec 17, 2023

Current state:

  • The main CLI commands log properly.
  • The processor which handles the feeds logs properly.

Logging is still missing from the "withstate" package, and the "email sending" code is still essentially unlogged. The HTTP-fetching code probably needs logging too, as that's a common source of issues: specifically, I see "processing feed xxxx" and then, if the remote host is down, I have to wait for the fetch to time out, with zero visible progress and sleep-delays between retries. Just adding "attempt N of Max" logs would suffice.

I also need to go through and work out what to do with the boltdb updates - are they fatal errors or warnings?

I'm extremely pleased with the structured logging approach, especially now I get source/function notes in the output too.

skx added 2 commits December 17, 2023 12:24
This is useful for showing the retries and failures
@skx commented Dec 17, 2023

Changes now:

  • The -verbose flag just changes the logging level.
    • This has the benefit of recycling old behaviour and doing the right thing.
  • The HTTP-fetcher now logs a lot.
  • Test cases all continue to pass.

Outstanding tasks:

  • The boltdb stuff still mostly lacks logging.
  • As is the email-sending stuff.

However in both cases errors are returned and are sane, so perhaps that is enough? Will ponder for the next few days.

@skx commented Dec 17, 2023

We're now in good shape:

  • Logging is used appropriately in all the places I think it needs to be - though perhaps more could be added over time, or some removed.
  • Logging shows only "important" stuff by default.
  • But -verbose will show "everything".

Finally we've got only a few fmt.Printf calls remaining in our code, and they look legitimate:

$ rgrep fmt.Printf . |grep -v .git
grep: ./rss2email: binary file matches
./daemon_cmd.go:		fmt.Printf("Usage: rss2email daemon email1@example.com .. emailN@example.com\n")
./daemon_cmd.go:			fmt.Printf("Usage: rss2email daemon [flags] email1 .. emailN\n")
./cron_cmd.go:		fmt.Printf("Usage: rss2email cron email1@example.com .. emailN@example.com\n")
./cron_cmd.go:			fmt.Printf("Usage: rss2email cron [flags] email1 .. emailN\n")
./seen_cmd.go:		fmt.Printf("%s\n", buck)
./seen_cmd.go:				fmt.Printf("\t%s\n", key)
./unsee_cmd.go:		fmt.Printf("Please specify the URLs to unsee (i.e. remove from the state database).\n")

I can't think of anything to tighten here, except perhaps that the handling of the errors here is a bit redundant:

                // Process all the feeds
                errors := p.ProcessFeeds(recipients)

                // If we found errors then show them.
                if len(errors) != 0 {
                        for _, err := range errors {
                                fmt.Fprintln(os.Stderr, err.Error())
                        }
                }

If there are errors now we see that twice:

{"time":"2023-12-17T18:25:23.731978815Z","level":"DEBUG","source":{"function":"rss2email/processor.(*Processor).ProcessFeeds","file":"rss2email/processor/processor.go","line":101},"msg":"about to process feeds","feed_count":34}
{"time":"2023-12-17T18:25:23.732158375Z","level":"DEBUG","source":{"function":"rss2email/processor.(*Processor).ProcessFeeds","file":"rss2email/processor/processor.go","line":107},"msg":"starting to process feed","feed":"https://rachelbythebay.com/w/atom.xml"}
{"time":"2023-12-17T18:25:23.762472328Z","level":"DEBUG","source":{"function":"rss2email/httpfetch.(*HTTPFetch).Fetch","file":"rss2email/httpfetch/httpfetch.go","line":124},"msg":"fetching URL","feed":{"link":"https://rachelbythebay.com/w/atom.xml"},"httpfetch":{"link":"https://rachelbythebay.com/w/atom.xml","user-agent":"rss2email (https://github.com/skx/rss2email)","insecure":false,"retry-max":3,"retry-delay":1},"attempt":1}
{"time":"2023-12-17T18:25:53.783143722Z","level":"DEBUG","source":{"function":"rss2email/httpfetch.(*HTTPFetch).Fetch","file":"rss2email/httpfetch/httpfetch.go","line":134},"msg":"fetching URL failed","feed":{"link":"https://rachelbythebay.com/w/atom.xml"},"httpfetch":{"link":"https://rachelbythebay.com/w/atom.xml","user-agent":"rss2email (https://github.com/skx/rss2email)","insecure":false,"retry-max":3,"retry-delay":1},"error":"Get \"https://rachelbythebay.com/w/atom.xml\": dial tcp 216.218.228.215:443: i/o timeout"}
{"time":"2023-12-17T18:25:54.7833987Z","level":"DEBUG","source":{"function":"rss2email/httpfetch.(*HTTPFetch).Fetch","file":"rss2email/httpfetch/httpfetch.go","line":124},"msg":"fetching URL","feed":{"link":"https://rachelbythebay.com/w/atom.xml"},"httpfetch":{"link":"https://rachelbythebay.com/w/atom.xml","user-agent":"rss2email (https://github.com/skx/rss2email)","insecure":false,"retry-max":3,"retry-delay":1},"attempt":2}
{"time":"2023-12-17T18:26:24.812271984Z","level":"DEBUG","source":{"function":"rss2email/httpfetch.(*HTTPFetch).Fetch","file":"rss2email/httpfetch/httpfetch.go","line":134},"msg":"fetching URL failed","feed":{"link":"https://rachelbythebay.com/w/atom.xml"},"httpfetch":{"link":"https://rachelbythebay.com/w/atom.xml","user-agent":"rss2email (https://github.com/skx/rss2email)","insecure":false,"retry-max":3,"retry-delay":1},"error":"Get \"https://rachelbythebay.com/w/atom.xml\": dial tcp 216.218.228.215:443: i/o timeout"}
{"time":"2023-12-17T18:26:25.812461705Z","level":"DEBUG","source":{"function":"rss2email/httpfetch.(*HTTPFetch).Fetch","file":"rss2email/httpfetch/httpfetch.go","line":124},"msg":"fetching URL","feed":{"link":"https://rachelbythebay.com/w/atom.xml"},"httpfetch":{"link":"https://rachelbythebay.com/w/atom.xml","user-agent":"rss2email (https://github.com/skx/rss2email)","insecure":false,"retry-max":3,"retry-delay":1},"attempt":3}
{"time":"2023-12-17T18:26:55.836623461Z","level":"DEBUG","source":{"function":"rss2email/httpfetch.(*HTTPFetch).Fetch","file":"rss2email/httpfetch/httpfetch.go","line":134},"msg":"fetching URL failed","feed":{"link":"https://rachelbythebay.com/w/atom.xml"},"httpfetch":{"link":"https://rachelbythebay.com/w/atom.xml","user-agent":"rss2email (https://github.com/skx/rss2email)","insecure":false,"retry-max":3,"retry-delay":1},"error":"Get \"https://rachelbythebay.com/w/atom.xml\": dial tcp 216.218.228.215:443: i/o timeout"}
{"time":"2023-12-17T18:26:56.837833238Z","level":"WARN","source":{"function":"rss2email/processor.(*Processor).processFeed","file":"rss2email/processor/processor.go","line":279},"msg":"failed to fetch feed","feed":{"link":"https://rachelbythebay.com/w/atom.xml"},"error":"Get \"https://rachelbythebay.com/w/atom.xml\": dial tcp 216.218.228.215:443: i/o timeout"}

Here you see a lot of noise, but summarizing:

  • We're going to process 34 feeds.
  • The first feed is rachelbythebay.com
  • Fetching that feed fails, and retries happen.
  • The last line shows "WARN"
    • So this would be visible by default. (WARN/ERROR are shown by default, DEBUG requires extra effort or a -verbose flag to view).

At the end of the run we see the list of errors, which in this case just shows the same thing:

..
..
{"time":"2023-12-17T18:27:26.504908829Z","level":"DEBUG","source":{"function":"rss2email/httpfetch.(*HTTPFetch).Fetch","file":"rss2email/httpfetch/httpfetch.go","line":124},"msg":"fetching URL","feed":{"link":"https://www.leadedsolder.com/feed.xml"},"httpfetch":{"link":"https://www.leadedsolder.com/feed.xml","user-agent":"rss2email (https://github.com/skx/rss2email)","insecure":false,"retry-max":3,"retry-delay":1},"attempt":1}
{"time":"2023-12-17T18:27:27.727671007Z","level":"DEBUG","source":{"function":"rss2email/processor.(*Processor).processFeed","file":"rss2email/processor/processor.go","line":285},"msg":"feed retrieved","feed":{"link":"https://www.leadedsolder.com/feed.xml"},"entries":10}
{"time":"2023-12-17T18:27:27.843660767Z","level":"DEBUG","source":{"function":"rss2email/processor.(*Processor).processFeed","file":"rss2email/processor/processor.go","line":439},"msg":"feed processed","feed":{"link":"https://www.leadedsolder.com/feed.xml"},"seen_count":10,"unseen_count":0}
{"time":"2023-12-17T18:27:27.843843979Z","level":"DEBUG","source":{"function":"rss2email/processor.(*Processor).ProcessFeeds","file":"rss2email/processor/processor.go","line":244},"msg":"all feeds processed","feed_count":34}
error processing https://rachelbythebay.com/w/atom.xml - Get "https://rachelbythebay.com/w/atom.xml": dial tcp 216.218.228.215:443: i/o timeout

@skx merged commit 3166c16 into master Dec 19, 2023
3 checks passed
@skx deleted the 121-logging branch December 19, 2023 18:46
skx added a commit that referenced this pull request Dec 30, 2023
In #122 we updated our codebase to ensure that we used a common
logging handle to send our output messages to the user:

* This included warnings and errors by default.
* However "developer", or other internal messages, were also available.

This pull-request builds upon that work to write the log messages to
a file, as well as showing them to the user.  By default a file
`rss2email.log` is generated when the application starts and log
entries are appended to it.

However:

* LOG_FILE_PATH can be set to a different path.
* LOG_FILE_DISABLE can be used to disable this duplication.

Why do this?  Partly for reference, but also partly to allow a local
setup to view errors.

We could now resolve #119 by adding a cronjob:

        #!/bin/sh
        # show logs, and delete them, run once per day
        cat .../rss2email.log
        rm .../rss2email.log

(Of course this might not work 100% as the current approach assumes
the file is held open forever in the case of the `daemon` sub-command.)

This isn't a complete solution, but without getting into the whole
template-customization and more complex considerations it's not a
terrible thing to do.
@skx mentioned this pull request Dec 30, 2023
skx added a commit that referenced this pull request Jan 4, 2024