-
I second this; we should have some sort of time-rotating log file.
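For what it's worth, the standard library already covers this; a minimal sketch using `logging.handlers.TimedRotatingFileHandler` (the file name and retention settings are invented for illustration):

```python
import logging
from logging.handlers import TimedRotatingFileHandler

logger = logging.getLogger("superduperdb")
logger.setLevel(logging.INFO)

# Rotate the file at midnight and keep the last seven days of logs.
handler = TimedRotatingFileHandler("superduperdb.log", when="midnight", backupCount=7)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)

logger.info("this line lands in a time-rotated file")
```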
-
I'm ashamed to admit that logging has always been something of an afterthought for me, even though I often rely on it heavily afterwards. My knowledge of what's useful/what's not so useful has also been cobbled together from a bunch of (possibly inconsistent) sources. Thank you for proposing this; please do track the knowledge that you have/learn here or in an issue, so that we (me!) can finally have a more 'structured' approach to logging. (See what I did there 😉) In terms of things that seem useful to me at the moment:
That's all I can think of for now. Looking forward to tracking this!
-
Another key issue, and one that will be critical for the end-user, is redirecting logs from the dask workers, which basically run blind, to somewhere the logs can be viewed and monitored in real time. Currently we're using MongoDB to do this, redirecting the logs in real time to a collection. See here.
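The linked code is the authoritative version; as a rough sketch of the idea (connection details and collection names invented here), a `logging.Handler` that forwards each record to a MongoDB collection via `pymongo` might look like this:

```python
import datetime
import logging

from pymongo import MongoClient


class MongoLogHandler(logging.Handler):
    """Forward each log record to a MongoDB collection (illustrative sketch)."""

    def __init__(self, uri="mongodb://localhost:27017", db="logs", coll="worker_logs"):
        super().__init__()
        self.collection = MongoClient(uri)[db][coll]

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self.collection.insert_one({
                "time": datetime.datetime.utcnow(),
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            })
        except Exception:
            self.handleError(record)  # logging must never crash the worker


# Attached to the root logger on each dask worker, this makes the
# otherwise run-blind workers queryable in real time.
logging.getLogger().addHandler(MongoLogHandler())
```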
-
@nen, your four points are sufficiently strong that I did a naughty and edited your comment to give them numbers! I totally missed 2, particularly, which is so important. What's the point of structured logging without consistent fields, huh? Clearly a document for developers is a key deliverable; it's part of 1, 2, 3, and 4. We get 4 almost for free if we just include file and line numbers in each log record. 1 and 3 are the same huge point - "don't spam the logs"!

I spent considerable time on Google's logging, which is amazing. Considerably later, I spent a few months doing logging in another project, where the main program generated 2 GB of logs every day, only one person could really read them, and yet no one wanted to commit to removing any log messages. /shaking my head

Google, and most companies at the time they started, had two sorts of logs: free-form human-readable logs of everything that gets printed to stderr or stdout, and very structured logs written using protocol buffers. The two were totally different in every way: functionality, API, and managerial. (You had to get serious permissions to add something to the structured logs, but none to write crap into the programmer logs.) Google considered their structured logging so critical that they got two famous figures from computer science to work on them, Peter J. Weinberger and Rob Pike. (I spent a lot of time with Peter; he's a bit of a curmudgeon in a good way, but ego-free and unpretentious. My only interaction with Pike was a disagreement about error handling in Go in the early phases; he can be a bit spiky, and I still think I was right. :-D)

Nearly all the value in Google comes from their targeted ads, and nearly all of that comes from structured log analysis. It is likely we can get away with only one type of log because we are starting fresh. In order to do this, we will have to hijack stdout and stderr from our dependencies, as well as their unstructured "classic logging" calls, and wrap them in structured logging calls. Redirecting "classic logging" is known technology; we can do it mechanically with only a little look at the source of our dependencies. Redirecting stdout and stderr for dependencies is always doable, but might take a bit of research into each dependency.
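A rough sketch of both redirections using only standard-library pieces; `structured_emit` is a hypothetical stand-in for whatever structured-logging call we end up designing:

```python
import io
import logging
import sys


def structured_emit(key: str, payload: dict) -> None:
    """Hypothetical stand-in for our structured-logging call."""
    print(f"{key}: {payload}", file=sys.__stderr__)  # the real sink goes here


# 1. Redirect "classic logging" from dependencies: a handler on the root
#    logger sees every record that propagates up, and wraps it.
class StructuredBridge(logging.Handler):
    def emit(self, record: logging.LogRecord) -> None:
        structured_emit(
            f"legacy.{record.name}",
            {"level": record.levelname, "message": record.getMessage()},
        )


logging.getLogger().addHandler(StructuredBridge())


# 2. Hijack stdout: anything a dependency print()s becomes a structured record.
class StructuredStream(io.TextIOBase):
    def write(self, text: str) -> int:
        if text.strip():
            structured_emit("legacy.stdout", {"message": text.rstrip()})
        return len(text)


sys.stdout = StructuredStream()  # the same trick works for sys.stderr
```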
-
I wanted to add some more important random points before I forget them:
-
For single-node deployments, the custom logging approach is OK. For large-scale and production-ready deployments, we need more advanced methods.
I would propose integration with Prometheus [1] for metrics and with Loki [2] for logs. Both tools are optimized for running queries, provide web interfaces, and handle nuances such as rotation, replication, etc.
[1] https://github.com/prometheus/client_python
[2] https://github.com/grafana/loki
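To make [1] concrete, here is a minimal `prometheus_client` sketch; the metric names are invented for illustration:

```python
import time

from prometheus_client import Counter, Histogram, start_http_server

# Hypothetical metric names, purely for illustration.
JOBS_TOTAL = Counter("superduperdb_jobs_total", "Jobs submitted")
JOB_SECONDS = Histogram("superduperdb_job_seconds", "Job duration in seconds")

start_http_server(8000)  # Prometheus then scrapes http://localhost:8000/metrics

JOBS_TOTAL.inc()
with JOB_SECONDS.time():
    time.sleep(0.1)  # placeholder for real work
```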
-
The term that I have heard used before is "monitoring variables". I haven't heard the term "metrics" before, and I'm moderately against it, as it has so many other meanings, and we already have
Well... Google became fabulously wealthy because of log analysis. Spell correction? From log analysis. Targeted ads? From log analysis.

There are actually two somewhat separate uses for logs, and Google had completely different "logs" in fact, stored and processed in radically different ways. There were "programmer's logs", which were basically print statements to a UTF-8 text file separated by carriage returns. At Google, there were three of these logs, depending on error level (info, warn, error). Some automatic monitoring of this was done - for example, new error messages appearing would trigger stuff. No personally identifiable information of customers could go into these logs. And then there were "the logs", basically event logs, which were highly structured, stored in protocol buffer format, encrypted, and held behind "bastion" machines where only a handful of people had keys. Analysis could only be done through a special-purpose language which did not actually allow you to retrieve personally identifiable information. Google took this seriously enough that they had two famous people, Peter Weinberger and Rob Pike, design the logs system.

The GDPR and personally-identifiable-information problem is so thorny that I think we shouldn't deal with it at all! :-) I think we should tell people somewhere that the logs are only intended to be collected from people working with your company, and not from the general public at all; if they do collect from the public, they won't be GDPR compliant, and we aren't responsible.

I don't think we should have separate programmers' and event logs, because it's too much work. I think we should write everything to structured logs, and keep the structured logging as simple and minimal as possible. I also think we should have unit testing of it, though: when we add it, we should at the same time add a way for a unit test to easily say, "Now test that the previous steps generated logs rather like this."

I would like to add that we are in a fairly easy and luxurious place, because our individual operations are very heavy, and there aren't very many of them. In some systems, the efficiency of the logs is of key importance, because generating them is quite heavy compared to the tininess of each transaction. Given that, we could simplify even more: we could have our monitoring variables simply be part of our logging, where we just emit all sorts of variables to our logging system, and then a separate monitoring system exposes some subset of those variables to the outside world to be monitored.

I am talking, of course, about our internal API. We should entirely be using other people's code as much as possible for log gathering, rotation, compression, etc., but we want to make our own internal API for logging/monitoring as bone-simple as possible, and allow other people to use it when writing superduperdb programs.
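For the unit-testing point, pytest's `caplog` fixture already gives a taste of what "test that the previous steps generated logs rather like this" could look like (the logger and function names here are invented):

```python
import logging

log = logging.getLogger("superduperdb.jobs")


def submit_job(name: str) -> None:
    log.info("job submitted: %s", name)


def test_submit_job_logs(caplog):
    with caplog.at_level(logging.INFO, logger="superduperdb.jobs"):
        submit_job("train-model")
    # "Now test that the previous steps generated logs rather like this."
    assert any("job submitted" in r.getMessage() for r in caplog.records)
```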
-
So let's look into what such an API might be like. I see a logging or monitoring event as emitting a JSON record to a key, so you get a "timeseries" of JSON records for each key. Sounds easy, but it opens a couple of cans of worms!

If I write a record to a key today, I would like at any point in the future to be able to find that key, and then understand that record. This means keys have to be stable. It's probably not a good idea to have user-contributed information in a key, therefore. This also means that it must be possible to identify the type of a record from a key. Nearly always, keys are segmented: divided into multiple segments with semantic meaning. One or more segments will have to determine the type. (In other APIs, the "type" might be at the start.)

Types of records need to be carefully controlled in order to maintain backward and forward compatibility. The meanings of fields cannot change. If you got the design of a field wrong, you simply have to create a new field with a new name, and still accept the old one in your analysis code. That means you may also need to commit some code to port existing records to the new format, if it comes to that - but you never rewrite old logs.

Finally, you need at least some logic associated with the log type. For example, you want "counter" types where just hitting them increments a thread-safe counter, but you want other types where you set a new value each time. So there will have to be a class associated with each log key, from a small number of types: "set", "counter", "mean" perhaps...

So let's actually fix all keys at startup, as static variables - then we could write unit tests about them. Let's sketch the design of just a counter that increments by one when it is hit, something that takes successive numerical samples, and a string variable. But we can't actually instantiate a log until we are running; we don't want a huge number of global static variables everywhere that we need to patch to make our tests run! So it goes like this:
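The original snippet didn't survive the page extraction, so what follows is a hedged reconstruction of the shape described: keys are declared statically (so tests can enumerate them), but nothing live is created until the running program binds them to an emitter. All names are illustrative.

```python
import threading
from typing import Callable

Emitter = Callable[[str, dict], None]


class LogKey:
    """Static declaration of a key: no live state until bind() is called."""

    registry: dict[str, "LogKey"] = {}  # enumerable by unit tests

    def __init__(self, key: str):
        assert key not in LogKey.registry, f"duplicate key {key!r}"
        self.key = key
        LogKey.registry[key] = self


class CounterKey(LogKey):
    def bind(self, emit: Emitter) -> "BoundCounter":
        return BoundCounter(self.key, emit)


class BoundCounter:
    def __init__(self, key: str, emit: Emitter):
        self._key, self._emit = key, emit
        self._lock, self._n = threading.Lock(), 0

    def hit(self) -> None:  # thread-safe increment; emits the new value
        with self._lock:
            self._n += 1
            self._emit(self._key, {"count": self._n})


class SamplesKey(LogKey):
    def bind(self, emit: Emitter) -> "BoundSamples":
        return BoundSamples(self.key, emit)


class BoundSamples:
    def __init__(self, key: str, emit: Emitter):
        self._key, self._emit = key, emit

    def sample(self, value: float) -> None:
        self._emit(self._key, {"value": value})


class StringKey(LogKey):
    def bind(self, emit: Emitter) -> "BoundString":
        return BoundString(self.key, emit)


class BoundString:
    def __init__(self, key: str, emit: Emitter):
        self._key, self._emit = key, emit

    def set(self, value: str) -> None:
        self._emit(self._key, {"value": value})


# Fixed at import time: tests can inspect LogKey.registry without running anything.
JOBS_STARTED = CounterKey("jobs.started")
JOB_SECONDS = SamplesKey("jobs.duration_seconds")
LAST_ERROR = StringKey("jobs.last_error")

# Only at runtime do we bind a key to a real emitter (here, just print).
jobs_started = JOBS_STARTED.bind(lambda key, record: print(key, record))
jobs_started.hit()  # -> jobs.started {'count': 1}
```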
-
Using this strategy allows us to ensure some of the important properties above with unit tests. We create a text "registry" of every single log field and type, and we store it with the source (not even with the tests - it's part of our spec). A test then makes sure that the existing fields do not change incorrectly, as follows: each class inheriting from the log-key base class is checked against the registry. One of three outcomes might happen:
In case 1, the test passes. In case 2, the test fails but with a message like this:
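(The example message was lost in extraction; presumably something along these lines, purely illustrative:)

```
FAILED test_log_registry: key 'jobs.duration_seconds' gained field 'worker_id'.
If this change is intentional and backward compatible, append the new field
to the registry file; never edit or remove existing entries.
```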
In case 3, it fails with no automatic fix. For simplicity, I omitted one important case in the log types above, which is the structured type, but I know you're itching to see it, so let me list it now.
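That snippet was also lost in extraction; here's a hedged guess at its shape, reusing the illustrative `LogKey`/`Emitter` names from the earlier sketch:

```python
class StructuredKey(LogKey):
    """A key whose records carry a fixed, named set of typed fields."""

    def __init__(self, key: str, fields: dict[str, type]):
        super().__init__(key)
        self.fields = fields  # part of the registry, checked by tests

    def bind(self, emit: Emitter) -> "BoundStructured":
        return BoundStructured(self.key, self.fields, emit)


class BoundStructured:
    def __init__(self, key: str, fields: dict[str, type], emit: Emitter):
        self._key, self._fields, self._emit = key, fields, emit

    def log(self, **values) -> None:
        # Unknown or missing fields are errors: the structure is the contract.
        assert set(values) == set(self._fields), f"bad fields for {self._key}"
        for name, value in values.items():
            assert isinstance(value, self._fields[name]), f"bad type for {name!r}"
        self._emit(self._key, values)


JOB_FINISHED = StructuredKey(
    "jobs.finished", {"job_name": str, "seconds": float, "ok": bool}
)
```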
-
Frequently imagined questions!

1. How do we incorporate third-party and legacy text logs into this picture?
Simple: each third-party log gets its own key. If we can't parse a record at all, we can just store it as a string of text, or we might be able to extract some structure from it.

2. Who's actually writing the logs and where?
Behind the scenes we will use some other library, probably one of the existing structured-logging packages.

3. Hasn't someone already done this?
Only the ideas in the above code snippets are new and will need coding; most of the rest is other people's production packages. But you would think someone had done this little idea before - it seems both rigorous and convenient. I'm going to look again after I press return, but I did a search for "Python structured logging" before, and it showed me a lot of conventional packages, none of which used the modern technique of using class members to indicate intent, like SQLAlchemy, dataclasses or pydantic do. The code will be advanced, but straightforward to write, and it won't require many keystrokes. There won't be a lot of special cases or complex logic, either.

4. But I really really really want print statements/unstructured logs!
No problem! Just for you, we could create two dedicated keys for exactly that - see the sketch after this list.

5. What about monitoring variables/metrics?
In this proposal, there is no difference in the API. Behind the scenes, we can either route certain keys to monitoring systems as well (active monitoring) or answer a variable request from a monitor (passive monitoring), but conceptually these are all the same sort of thing: a key attached to a type that receives a series of values!
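For question 4, the two key names were lost in extraction, so the ones below are invented; the point is just that free-form text becomes one more typed key in the same system (reusing the illustrative `StringKey` from the sketches above):

```python
# Hypothetical escape-hatch keys for people who insist on print-style logs.
stdout_log = StringKey("unstructured.stdout").bind(lambda k, r: print(k, r))
stderr_log = StringKey("unstructured.stderr").bind(lambda k, r: print(k, r))


def log_print(*args, error: bool = False) -> None:
    """print()-alike whose output lands in the structured store as text."""
    (stderr_log if error else stdout_log).set(" ".join(str(a) for a in args))
```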
-
These guys seem to be the most prominent in purely structured Python logging: https://github.com/hynek/structlog
They integrate with a lot of things we would want to integrate with. They don't have the idea of stable keys validated by tests like the above, which I think is essential if you really intend to read last year's logs next year - a goal which is part of the "repeatable calculation" effort. They don't have integrated monitoring.
So I tentatively think we should write to structlog behind the scenes, and provide the simple framework designed above on top of it. I could write implementations of what's above in three days with little risk, as it has few dependencies. Behind the scenes it's just a bunch of keys, each receiving a time series of JSON-able values with backward-compatible structures. We send the keys and data to structlog.
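A sketch of that wiring; `structlog.configure`, the processors, and `get_logger` are real structlog API, while the `emitter` glue is an assumption about our side:

```python
import structlog

structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ]
)

_log = structlog.get_logger()


def emitter(key: str, record: dict) -> None:
    """Hypothetical back end for the key/record API sketched earlier."""
    _log.info(key, **record)


emitter("jobs.started", {"count": 1})
# -> {"count": 1, "event": "jobs.started", "timestamp": "2024-..."}
```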
-
One final part since I'm here! Let's estimate the software parts needed to implement it! I should always do this; it takes a few minutes.
-
Can we evaluate SigNoz as our logging platform? It's open source with 15K stars on GitHub, and it has a community edition as well. Big companies like Comcast are using it, and it's cheaper than most other alternatives. Most importantly, it's open source. Here's the pricing details, and here's the Python example. On the other hand, we could also utilize Loki, which has 21K stars on GitHub.
-
Why?
Our current logging is hand-written and very low on features, as in "none".
I believe that end-users, internal developers, and external developers will all want more features.
Sources of logging
Possible features
It is easy to suggest features, harder to estimate their value to the user, and perhaps even harder to estimate how much work they are. Here's a list, very roughly ordered by value from most to least.
Likely non-goals and non-features
Notes on features
The last feature, prettiness, has non-zero value, and not just to impress others: we will be seeing a lot of these logfiles! But we should decide which logging system to use based on the other features, and then use whatever prettiness we get. :-)
As for "ease of programmer use", there seem to be only two styles:
Import/create:
Import-only
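The original snippets were lost, but the two styles presumably looked like this: the standard library is import/create, loguru is import-only (shown together here; in real code you would pick one):

```python
# Import/create: the standard-library style.
import logging

logger = logging.getLogger(__name__)  # create a logger per module
logger.warning("disk nearly full")

# Import-only: the loguru style.
from loguru import logger  # a single ready-made logger, zero setup

logger.warning("disk nearly full")
```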
I should add that the second format still manages to get `__name__` and even the line in the file that's being called, using `inspect` magic and some clever caching, in seemingly every library that offers it.
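A toy version of that trick; the real libraries cache this and handle edge cases:

```python
import inspect


def log(message: str) -> None:
    """Print the caller's module name and line number, loguru-style."""
    frame = inspect.currentframe().f_back  # one frame up: whoever called log()
    name = frame.f_globals.get("__name__", "?")
    print(f"{name}:{frame.f_lineno} | {message}")


log("hello")  # -> __main__:11 | hello  (the line number of this call)
```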
A good reference
This article is actually pretty good when it comes to the two libraries I know well, Python's logging and loguru.
My next steps will be to go through the remaining three libraries in the article and any others I find and compare them with my checklist above.
IIRC, I loved `loguru`, except we had a bad time with its integration with the Python standard logger, which would be a dealbreaker; but this was several years ago.