Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Change the default log config to reduce disk I/O and storage #8040

Merged
merged 11 commits into from
Aug 11, 2020

Conversation

erikjohnston
Copy link
Member

@erikjohnston erikjohnston commented Aug 6, 2020

See the comments in the config.

Fixes #8018

Note: I'm happy to bikeshed some of the values here.

This batches up writes to the filesystem, which is more efficient for
disk I/O. This means that it can take some time for logs to get written
to disk. Note that ERROR logs (and above) immediately flush the buffer.

This only effects new installs, as we only write the log config if
started with `--generate-config` (in the same way we do for generating
signing keys).
This hopefully reduces the amount of logs kept for new servers. Keeping
the last 1GB of logs is likely overkill for new servers, but equally may
not be enough for busy ones.

Instead, we keep the last four days worth of logs, enough so that admins
can investigate any problems that happened over e.g. a long weekend.
@erikjohnston erikjohnston requested a review from a team August 6, 2020 16:05
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's another footgun here, which is #3471.

Basically if there is an error writing to the logfile, then that error gets written back at logger.error, which makes the MemoryLogger try to flush again, so you stackoverflow and everything is awful.

I've long argued that redirecting stderr to the logging system is a stupid idea, but it doesn't seem to want to die, so in its absence the options seem to be (a) reimplement MemoryLogger to avoid the infinite loop (b) special-case the twisted logger so that it goes straight to file.

we went with (b) on matrix.org.

docs/sample_log_config.yaml Show resolved Hide resolved
docs/sample_log_config.yaml Outdated Show resolved Hide resolved
docs/sample_log_config.yaml Outdated Show resolved Hide resolved
docs/sample_log_config.yaml Outdated Show resolved Hide resolved
We do this to prevent foot guns. The default config uses a MemoryFilter,
but users are free to change to logging to files directly. If they do
then they have to ensure to set the `filters: [context]` on the right
handler, otherwise records get written with the wrong context.

Instead we move the logic to happen when we generate a record, which is
when we *log* rather than *handle*.

(It's possible to add filters to loggers in the config, however they
don't apply to descendant loggers and so they have to be manually set on
*every* logger used in the code base)
@erikjohnston erikjohnston force-pushed the erikj/default_logging_config branch from 1320840 to c2fc81e Compare August 7, 2020 10:45
buffer:
class: logging.handlers.MemoryHandler
target: file
capacity: 100
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of wonder if 100 is a bit extreme here for a new server?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as in, too high? maybe a bit.

probably worth adding a comment here saying what the number means, so people can tweak it if they want.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, even jki.re logs at less than 30 lines a second so a brand new server would probably not actually flush very often at all, which I can imagine will cause confusion. I'm somewhat tempted to lower it to something like ten, which would still be a huge improvement.

We don't WARN too frequently, so this is fine and ensures that we log
things like "STARTING" immediately.
@erikjohnston
Copy link
Member Author

I've also changed the default to flush on WARNING too, so that we actually log things like STARTING immediately

@erikjohnston erikjohnston requested a review from richvdh August 7, 2020 10:52
docs/sample_log_config.yaml Show resolved Hide resolved
@@ -55,34 +55,43 @@
format: '%(asctime)s - %(name)s - %(lineno)d - %(levelname)s - \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw: remember that the docker image has its own default log config; please update it to keep it in sync.

docs/sample_log_config.yaml Show resolved Hide resolved
buffer:
class: logging.handlers.MemoryHandler
target: file
capacity: 100
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as in, too high? maybe a bit.

probably worth adding a comment here saying what the number means, so people can tweak it if they want.

Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, there's more

docs/sample_log_config.yaml Show resolved Hide resolved
@erikjohnston erikjohnston requested a review from richvdh August 10, 2020 12:57
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm modulo more doc nitpicking

Comment on lines 65 to 66
# By default buffer logs in memory before logging to file. Replace "buffer"
# with "console" to log to stderr instead.
Copy link
Member

@richvdh richvdh Aug 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# By default buffer logs in memory before logging to file. Replace "buffer"
# with "console" to log to stderr instead.
# Write logs to the `buffer` handler, which will buffer them together in memory,
# then write them to a file.
#
# Replace "buffer" with "console" to log to stderr instead. (Note that you'll
# also need to update the configuation for the `twisted` logger above, in
# this case.)
#

@richvdh
Copy link
Member

richvdh commented Aug 11, 2020

oh also: you still haven't updated the log conf for the docker image.

@erikjohnston
Copy link
Member Author

oh also: you still haven't updated the log conf for the docker image.

As per #synapse-dev:matrix.org the docker config logs directly to stderr, so there's nothing here to port across

@erikjohnston erikjohnston merged commit db131b6 into develop Aug 11, 2020
@erikjohnston erikjohnston deleted the erikj/default_logging_config branch August 11, 2020 17:09
reivilibre added a commit that referenced this pull request Aug 13, 2020
Synapse 1.19.0rc1 (2020-08-13)
==============================

Removal warning
---------------

As outlined in the [previous release](https://github.com/matrix-org/synapse/releases/tag/v1.18.0), we are no longer publishing Docker images with the `-py3` tag suffix. On top of that, we have also removed the `latest-py3` tag. Please see [the announcement in the upgrade notes for 1.18.0](https://github.com/matrix-org/synapse/blob/develop/UPGRADE.rst#upgrading-to-v1180).

Features
--------

- Add option to allow server admins to join rooms which fail complexity checks. Contributed by @lugino-emeritus. ([\#7902](#7902))
- Add an option to purge room or not with delete room admin endpoint (`POST /_synapse/admin/v1/rooms/<room_id>/delete`). Contributed by @dklimpel. ([\#7964](#7964))
- Add rate limiting to users joining rooms. ([\#8008](#8008))
- Add a `/health` endpoint to every configured HTTP listener that can be used as a health check endpoint by load balancers. ([\#8048](#8048))
- Allow login to be blocked based on the values of SAML attributes. ([\#8052](#8052))
- Allow guest access to the `GET /_matrix/client/r0/rooms/{room_id}/members` endpoint, according to MSC2689. Contributed by Awesome Technologies Innovationslabor GmbH. ([\#7314](#7314))

Bugfixes
--------

- Fix a bug introduced in Synapse v1.7.2 which caused inaccurate membership counts in the room directory. ([\#7977](#7977))
- Fix a long standing bug: 'Duplicate key value violates unique constraint "event_relations_id"' when message retention is configured. ([\#7978](#7978))
- Fix "no create event in auth events" when trying to reject invitation after inviter leaves. Bug introduced in Synapse v1.10.0. ([\#7980](#7980))
- Fix various comments and minor discrepencies in server notices code. ([\#7996](#7996))
- Fix a long standing bug where HTTP HEAD requests resulted in a 400 error. ([\#7999](#7999))
- Fix a long-standing bug which caused two copies of some log lines to be written when synctl was used along with a MemoryHandler logger. ([\#8011](#8011), [\#8012](#8012))

Updates to the Docker image
---------------------------

- We no longer publish Docker images with the `-py3` tag suffix, as [announced in the upgrade notes](https://github.com/matrix-org/synapse/blob/develop/UPGRADE.rst#upgrading-to-v1180). ([\#8056](#8056))

Improved Documentation
----------------------

- Document how to set up a client .well-known file and fix several pieces of outdated documentation. ([\#7899](#7899))
- Improve workers docs. ([\#7990](#7990), [\#8000](#8000))
- Fix typo in `docs/workers.md`. ([\#7992](#7992))
- Add documentation for how to undo a room shutdown. ([\#7998](#7998), [\#8010](#8010))

Internal Changes
----------------

- Reduce the amount of whitespace in JSON stored and sent in responses. Contributed by David Vo. ([\#7372](#7372))
- Switch to the JSON implementation from the standard library and bump the minimum version of the canonicaljson library to 1.2.0. ([\#7936](#7936), [\#7979](#7979))
- Convert various parts of the codebase to async/await. ([\#7947](#7947), [\#7948](#7948), [\#7949](#7949), [\#7951](#7951), [\#7963](#7963), [\#7973](#7973), [\#7975](#7975), [\#7976](#7976), [\#7981](#7981), [\#7987](#7987), [\#7989](#7989), [\#8003](#8003), [\#8014](#8014), [\#8016](#8016), [\#8027](#8027), [\#8031](#8031), [\#8032](#8032), [\#8035](#8035), [\#8042](#8042), [\#8044](#8044), [\#8045](#8045), [\#8061](#8061), [\#8062](#8062), [\#8063](#8063), [\#8066](#8066), [\#8069](#8069), [\#8070](#8070))
- Move some database-related log lines from the default logger to the database/transaction loggers. ([\#7952](#7952))
- Add a script to detect source code files using non-unix line terminators. ([\#7965](#7965), [\#7970](#7970))
- Log the SAML session ID during creation. ([\#7971](#7971))
- Implement new experimental push rules for some users. ([\#7997](#7997))
- Remove redundant and unreliable signature check for v1 Identity Service lookup responses. ([\#8001](#8001))
- Improve the performance of the register endpoint. ([\#8009](#8009))
- Reduce less useful output in the newsfragment CI step. Add a link to the changelog section of the contributing guide on error. ([\#8024](#8024))
- Rename storage layer objects to be more sensible. ([\#8033](#8033))
- Change the default log config to reduce disk I/O and storage for new servers. ([\#8040](#8040))
- Add an assertion on `prev_events` in `create_new_client_event`. ([\#8041](#8041))
- Add a comment to `ServerContextFactory` about the use of `SSLv23_METHOD`. ([\#8043](#8043))
- Log `OPTIONS` requests at `DEBUG` rather than `INFO` level to reduce amount logged at `INFO`. ([\#8049](#8049))
- Reduce amount of outbound request logging at `INFO` level. ([\#8050](#8050))
- It is no longer necessary to explicitly define `filters` in the logging configuration. (Continuing to do so is redundant but harmless.) ([\#8051](#8051))
- Add and improve type hints. ([\#8058](#8058), [\#8064](#8064), [\#8060](#8060), [\#8067](#8067))
babolivier pushed a commit that referenced this pull request Sep 1, 2021
* commit 'db131b6b2':
  Change the default log config to reduce disk I/O and storage (#8040)
  Implement login blocking based on SAML attributes (#8052)
  Add an assertion on prev_events in create_new_client_event (#8041)
  Typo
  Lint
  why mypy why
  Lint
  Incorporate review
  Incorporate review
  Fix PUT /pushrules to use the right rule IDs
  Back out the database hack and replace it with a temporary config setting
  Fix cache name
  Fix cache invalidation calls
  Lint
  Changelog
  Implement new experimental push rules with a database hack to enable them
richvdh added a commit that referenced this pull request Apr 29, 2022
This was originally added when we first added a `MemoryHandler` to the default
log config back in #8040, to ensure
that we didn't explode with an infinite loop if there was an error formatting
the logs.

Since then, we made additional improvements to logging which make this
workaround redundant. In particular:

 * we no longer attempt to log un-UTF8-decodable byte sequences, which were the
   most likely cause of an error in the first place.

 * #8268 ensures that in the unlikely
   case that there *is* an error, it won't cause an infinite loop.
richvdh added a commit that referenced this pull request Apr 29, 2022
This was originally added when we first added a `MemoryHandler` to the default
log config back in #8040, to ensure
that we didn't explode with an infinite loop if there was an error formatting
the logs.

Since then, we made additional improvements to logging which make this
workaround redundant. In particular:

 * we no longer attempt to log un-UTF8-decodable byte sequences, which were the
   most likely cause of an error in the first place.

 * #8268 ensures that in the unlikely
   case that there *is* an error, it won't cause an infinite loop.
anoadragon453 added a commit that referenced this pull request May 22, 2023
The now-removed comment was introduced in
#8040, and referenced
a "twisted" logging handler. That handler has since been removed
from the file, so the associated comment is no longer necessary.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Change default logging to use buffered logging
2 participants