Skip to content

Commit c6f9773

Browse files
committed
Improve logging and diagnostic information
* Within worker, add logs when a message is received and immediatly after a response has been sent. * Include HOSTNAME in the test metadata that is stored in Redis. For a K8s deployment, HOSTNAME will correspond to the name of the pod running the test. This will make it possible to retreive worker logs for a misbehaving test. * When a test is complete, the worker and the associated logs, may no longer exist, however the logs will still be available in the log file contained within the test payload that is pushed to long term storage. * These changes pertain to worker, however there is an existing log message within the web implementation that logs when a message is sent to the worker, but no response is received. Additionally, each message between web and worker is given a unique ID, therefore with all of this togethor there will be breadcrumbs if a message is dropped.
1 parent 3089ae4 commit c6f9773

File tree

3 files changed

+36
-16
lines changed

3 files changed

+36
-16
lines changed

docs/redis.md

+10-9
Original file line numberDiff line numberDiff line change
@@ -3,23 +3,24 @@
33
BOPTEST-Service currently uses Redis for two purposes.
44

55
1. Storing test metadata.
6-
2. A pub/sub message bus to communicate with workers.
6+
2. A pub/sub message bus to communicate with workers.
77

88
## Test Metadata:
99

1010
A hash under the key `tests:${testid}` is used to store test metadata.
1111

12-
| Key | Field | Value
13-
| ---------------------------------------- | ----------- | -----------------------------
14-
| `tests:${testid}` | `status` | `Queued \| Running`
15-
| `tests:${testid}` | `timestamp` | Epoch seconds when test is queued
16-
| `tests:${testid}` | `user` | The OAuth user "sub", or undefined for anonymous tests
12+
| Key | Field | Value |
13+
| ----------------- | ----------- | ---------------------------------------------------------------------------------------- |
14+
| `tests:${testid}` | `status` | `Queued \| Running` |
15+
| `tests:${testid}` | `timestamp` | Epoch seconds when test is queued. |
16+
| `tests:${testid}` | `user` | The OAuth user "sub", or undefined for anonymous tests. |
17+
| `tests:${testid}` | `host` | The value of HOSTNAME for the worker running the test. On K8s this will be the pod name. |
1718

1819
Tests that are started by an authenticated user are stored in a set associated with the user.
1920

20-
| Key | Value
21-
| ---------------------------------------- | -------------------------------------------
22-
| `users:${userSub}:tests` | `[testid1, testid2, ...]`
21+
| Key | Value |
22+
| ------------------------ | ------------------------- |
23+
| `users:${userSub}:tests` | `[testid1, testid2, ...]` |
2324

2425
## Pub/Sub Messages
2526

worker/jobs/boptest_run_test/job.py

+19-2
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@
77
import redis
88
import numpy as np
99
import msgpack
10+
import logging
1011
from boptest.lib.testcase import TestCase
1112

1213
class Job:
@@ -19,6 +20,13 @@ def __init__(self, parameters):
1920
self.abort = False
2021
self.last_message_time = datetime.now()
2122

23+
log_level = os.environ.get("BOPTEST_LOGLEVEL", "INFO")
24+
logging.getLogger().setLevel(log_level)
25+
logging.getLogger('botocore').setLevel(log_level)
26+
logging.getLogger('s3transfer').setLevel(log_level)
27+
logging.getLogger('urllib3').setLevel(log_level)
28+
self.logger = logging.getLogger('worker')
29+
2230
self.redis = redis.Redis(host=os.environ["BOPTEST_REDIS_HOST"])
2331
self.redis_pubsub = self.redis.pubsub()
2432
if self.redis.hexists(self.testKey, "user"):
@@ -91,7 +99,7 @@ def run(self):
9199
def check_idle_time(self):
92100
idle_time = datetime.now() - self.last_message_time
93101
if idle_time.total_seconds() > self.timeout:
94-
print("Testid '%s' is terminating due to inactivity." % self.testid)
102+
self.logger.info("Testid '%s' is terminating due to inactivity." % self.testid)
95103
self.keep_running = False
96104

97105
# Begin methods for message passing between web and worker ###
@@ -149,10 +157,14 @@ def process_messages(self):
149157
method = message_data.get("method")
150158
params = message_data.get("params")
151159

160+
self.logger.info("Request ID: '%s', with method '%s', was received" % (request_id, method))
161+
152162
callback_result = self.call_message_handler(method, params)
153163
packed_result = self.pack({"requestID": request_id, "payload": callback_result})
154164
self.redis.publish(response_channel, packed_result)
155165

166+
self.logger.info("Response for, '%s', was sent" % request_id)
167+
156168
self.last_message_time = datetime.now()
157169
except Job.InvalidRequestError as e:
158170
payload = {"status": 400, "message": "Bad Request", "payload": str(e)}
@@ -161,7 +173,7 @@ def process_messages(self):
161173
except Exception:
162174
# Generic exceptions are an internal error
163175
error_message = str(sys.exc_info()[1])
164-
print(error_message)
176+
self.logger.error(error_message)
165177

166178
# End methods for message passing
167179

@@ -258,6 +270,8 @@ def reset(self, tarinfo):
258270

259271
# cleanup after the simulation is stopped
260272
def cleanup(self):
273+
self.logger.info("Test '%s' is complete" % self.testid)
274+
261275
self.redis.delete(self.testKey)
262276
if self.userTestsKey:
263277
self.redis.srem(self.userTestsKey, self.testid)
@@ -294,3 +308,6 @@ def keys_to_camel_case(self, a_dict):
294308

295309
def init_sim_status(self):
296310
self.redis.hset(self.testKey, "status", "Running")
311+
host = os.getenv("HOSTNAME")
312+
if host:
313+
self.redis.hset(self.testKey, "host", host)

worker/logger.py

+7-5
Original file line numberDiff line numberDiff line change
@@ -11,10 +11,12 @@ class Logger:
1111
"""A logger specific for the tasks of the Worker"""
1212

1313
def __init__(self):
14-
logging.basicConfig(level=os.environ.get("BOPTEST_LOGLEVEL", "INFO"))
1514
self.logger = logging.getLogger('worker')
16-
self.formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
15+
self.logger.setLevel(os.environ.get("BOPTEST_LOGLEVEL", "INFO"))
16+
fmt = '%(asctime)s UTC\t%(name)-20s%(levelname)s\t%(message)s'
17+
datefmt = '%m/%d/%Y %I:%M:%S %p'
18+
formatter = logging.Formatter(fmt,datefmt)
1719

18-
self.fh = logging.FileHandler('worker.log')
19-
self.fh.setFormatter(self.formatter)
20-
self.logger.addHandler(self.fh)
20+
stream_handler = logging.StreamHandler()
21+
stream_handler.setFormatter(formatter)
22+
self.logger.addHandler(stream_handler)

0 commit comments

Comments
 (0)