Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rationalize logging in the rpc server #1704

Closed
drewbanin opened this issue Aug 27, 2019 · 0 comments · Fixed by #1715
Closed

Rationalize logging in the rpc server #1704

drewbanin opened this issue Aug 27, 2019 · 0 comments · Fixed by #1715
Labels
enhancement New feature or request rpc Issues related to dbt's RPC server

Comments

@drewbanin
Copy link
Contributor

See also #1237

Presently, logging in the rpc server is a mixed bag:

  • some logs make it into stdout on the server, or the logs/dbt.log file, or the server response, or some combination thereof
  • some logs are structured in nature while others are not
  • If the /run endpoint is queried, the table response key is logged to stdout, the logs/dbt.log file, and the server response

I made some recommendations below, but there's really a whole lot to account for here! The guiding principles are:

  • Ensure that the stdout logs from the rpc server contain enough information to inform the
    a user with access to stdout what's happening in real-ish time
    • We don't want to wait until the request is complete to show in-progress log information
  • Make the stdout logs machine-readable so that they can be parsed and processed by an external process
  • Make it possible to correlate log lines with an original request
  • This will help with debugging from the logs
  • This will help filter the logs relevant to a given request
  • Limit the amount of request/response data we log to stdout. We want to record logs, not data

Recommendations:

  • All logging to stdout for the rpc server should be "structured" in nature
  • The logs/dbt.log file and the stdout logs for the rpc server should be equivalent (this may already be the case)
  • Compilation warnings should be surfaced in the /status endpoint if they occur during recompilation or server startup
  • dbt should return all of the logs that are recorded during startup/recompilation in the /status endpoint
  • The full request and response body should not be printed to stdout or logs/dbt.log - we can instead log a subset of this information
  • Request: I think just the path and method are appropriate, but we can be less restrictive if it would be helpful
  • Response: dbt should log the logs key in the Response object to stdout and logs/dbt.log as those log lines happen. We'll need to add some sort of identifier (the request id?) to the log handler to make sure that we can correlate log lines back to an original request. The status code should be logged in the Response log line.

Scoping
There is a lot of work to do to make dbt's internal stdout/ui printing more conducive to structured logging. That's going to be its own project, and we should do it, but we should do it in a separate PR.

A vignette of how I think this could look is provided below.

Before:

WARNING: Test 'test.my_new_package.unique_nope_id' (models/schema.yml) depends on model 'nope' which was not found
WARNING: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.debug

Found 1 model, 6 tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 2 seed files, 0 sources

13:30:13 | Concurrency: 8 threads (target='dev')
13:30:13 |
13:30:14 | Done.
Received request (<Request 'http://localhost:8580/jsonrpc/' [POST]>) from 127.0.0.1, data=b'{"jsonrpc": "2.0", "method": "run", "id": 1, "params": {"timeout": null, "sql": "c2VsZWN0IDEgYXMgaWQ=", "name": 'foo'}}'
Found 1 model, 6 tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 2 seed files, 0 sources, 1 None
sending response (<Response 2476 bytes [200 OK]>) to 127.0.0.1, data={'result': {'raw_sql': 'select 1 as id', 'compiled_sql': 'select 1 as id', 'node': {'raw_sql': 'select 1 as id', 'database': 'analytics', 'schema': 'dev', 'fqn': ['my_new_package', 'rpc', 'foo'], 'unique_id': 'rpc.my_new_package.foo', 'package_name': 'my_new_package', 'root_path': '/Users/drew/fishtown/clients/debug', 'path': 'rpc/foo', 'original_file_path': 'from remote system', 'name': 'foo', 'resource_type': 'rpc', 'alias': 'drew_demo_schema_foo', 'config': {'enabled': True, 'materialized': 'view', 'persist_docs': {}, 'post-hook': [], 'pre-hook': [], 'vars': {}, 'quoting': {}, 'column_types': {}, 'tags': []}, 'tags': [], 'refs': [], 'sources': [], 'depends_on': {'nodes': [], 'macros': []}, 'docrefs': [], 'description': '', 'columns': {}, 'compiled': True, 'compiled_sql': 'select 1 as id', 'extra_ctes_injected': True, 'extra_ctes': [], 'injected_sql': 'select 1 as id', 'wrapped_sql': 'None'}, 'timing': [{'name': 'compile', 'started_at': '2019-08-27T17:33:12.276822Z', 'completed_at': '2019-08-27T17:33:12.295340Z'}, {'name': 'execute', 'started_at': '2019-08-27T17:33:12.295359Z', 'completed_at': '2019-08-27T17:33:12.299549Z'}], 'table': {'column_names': ['id'], 'rows': [[1.0]]}, 'logs': [{'message': 'Acquiring new postgres connection 'foo'.', 'timestamp': '2019-08-27 13:33:12,083', 'levelname': 'DEBUG', 'level': 10}, {'message': 'Opening a new connection, currently in state init', 'timestamp': '2019-08-27 13:33:12,084', 'levelname': 'DEBUG', 'level': 10}, {'message': 'Found 1 model, 6 tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 2 seed files, 0 sources, 1 None', 'timestamp': '2019-08-27 13:33:12,269', 'levelname': 'NOTICE', 'level': 15}, {'message': 'Acquiring new postgres connection 'foo'.', 'timestamp': '2019-08-27 13:33:12,270', 'levelname': 'DEBUG', 'level': 10}, {'message': 'Opening a new connection, currently in state init', 'timestamp': '2019-08-27 13:33:12,271', 'levelname': 'DEBUG', 'level': 10}, {'message': 'Compiling rpc.my_new_package.foo', 'timestamp': '2019-08-27 13:33:12,276', 'levelname': 'DEBUG', 'level': 10}, {'message': 'Using postgres connection 'foo'.', 'timestamp': '2019-08-27 13:33:12,295', 'levelname': 'DEBUG', 'level': 10}, {'message': 'On foo: select 1 as id', 'timestamp': '2019-08-27 13:33:12,296', 'levelname': 'DEBUG', 'level': 10}, {'message': 'SQL status: SELECT 1 in 0.00 seconds', 'timestamp': '2019-08-27 13:33:12,297', 'levelname': 'DEBUG', 'level': 10}]}, 'id': 1, 'jsonrpc': '2.0'}

After:

{"context": "server", "levelname": "xxx", "timestamp": "yyy", "level": "xxx", "message": "WARNING: Test 'test.my_new_package.unique_nope_id' (models/schema.yml) depends on model 'nope' which was not found"}
{"context": "server", "levelname": "xxx", "timestamp": "yyy", "level": "xxx", "message": "WARNING: Configuration paths exist in your dbt_project.yml file which do not apply to any resources."}
{"context": "server", "levelname": "xxx", "timestamp": "yyy", "level": "xxx", "message": "There are 1 unused configuration paths:"}
{"context": "server", "levelname": "xxx", "timestamp": "yyy", "level": "xxx", "message": " - models.debug"}
{"context": "server", "levelname": "xxx", "timestamp": "yyy", "level": "xxx", "message": ""}
{"context": "server", "levelname": "xxx", "timestamp": "yyy", "level": "xxx", "message": "Found 1 model, 6 tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 2 seed files, 0 sources"}
{"context": "server", "levelname": "xxx", "timestamp": "yyy", "level": "xxx", "message": "13:30:13 | Concurrency: 8 threads (target='dev')"}
{"context": "server", "levelname": "xxx", "timestamp": "yyy", "level": "xxx", "message": "13:30:13 |"}
{"context": "server", "levelname": "xxx", "timestamp": "yyy", "level": "xxx", "message": "13:30:14 | Done."}
{"context": "server", "levelname": "xxx", "timestamp": "yyy", "level": "xxx", "message": "POST http://localhost:8580/jsonrpc/ from 127.0.0.1", "method": "run", "request_id": 1}
{"context": "server", "levelname": "xxx", "timestamp": "yyy", "level": "xxx", "message": "Found 1 model, 6 tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 2 seed files, 0 sources, 1 None"}
{"context": "server", "levelname": "xxx", "timestamp": "yyy", "level": "xxx", "message": ""}
{"context": "request", "request_id": 1, "message": "Acquiring new postgres connection 'foo'.", "timestamp": "2019-08-27 13:33:12,083", "levelname": "DEBUG", "level": 10}
{"context": "request", "request_id": 1, "message": "Opening a new connection, currently in state init", "timestamp": "2019-08-27 13:33:12,084", "levelname": "DEBUG", "level": 10}
{"context": "request", "request_id": 1, "message": "Found 1 model, 6 tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 2 seed files, 0 sources, 1 None", "timestamp": "2019-08-27 13:33:12,269", "levelname": "NOTICE", "level": 15}
{"context": "request", "request_id": 1, "message": "Acquiring new postgres connection 'foo'.", "timestamp": "2019-08-27 13:33:12,270", "levelname": "DEBUG", "level": 10}
{"context": "request", "request_id": 1, "message": "Opening a new connection, currently in state init", "timestamp": "2019-08-27 13:33:12,271", "levelname": "DEBUG", "level": 10}
{"context": "request", "request_id": 1, "message": "Compiling rpc.my_new_package.foo", "timestamp": "2019-08-27 13:33:12,276", "levelname": "DEBUG", "level": 10}
{"context": "request", "request_id": 1, "message": "Using postgres connection 'foo'.", "timestamp": "2019-08-27 13:33:12,295", "levelname": "DEBUG", "level": 10}
{"context": "request", "request_id": 1, "message": "On foo: select 1 as id", "timestamp": "2019-08-27 13:33:12,296", "levelname": "DEBUG", "level": 10}
{"context": "request", "request_id": 1, "message": "SQL status: SELECT 1 in 0.00 seconds", "timestamp": "2019-08-27 13:33:12,297", "levelname": "DEBUG", "level": 10}]
{"context": "server", "levelname": "xxx", "timestamp": "yyy", "level": "xxx", "message": "Response 2476 bytes to 127.0.0.1", "request_id": 1, "response_code": 200}
@drewbanin drewbanin added enhancement New feature or request rpc Issues related to dbt's RPC server labels Aug 27, 2019
@drewbanin drewbanin added this to the Louisa May Alcott milestone Aug 27, 2019
beckjake added a commit that referenced this issue Aug 30, 2019
…ctured-rpc-logging

Feature/logbook structured rpc logging (#1703, #1704)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request rpc Issues related to dbt's RPC server
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant