This repository has been archived by the owner on Jan 31, 2022. It is now read-only.

Use StackDriver + BigQuery to log predictions #79

Closed
jlewi opened this issue Dec 26, 2019 · 6 comments

Comments


jlewi commented Dec 26, 2019

We should use BigQuery and Stackdriver to log predictions.

This should work as follows:

  1. Emit JSON log entries containing the predictions.
  2. Store the logs in Stackdriver.
  3. Set up a BigQuery sink for the Stackdriver logs.
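To illustrate step 1, here is a minimal sketch of what one JSON log entry for a prediction might look like. The field names other than `repo_owner` (which appears in the BigQuery query later in this thread) are illustrative assumptions, not the actual schema:

```python
import json

# Hypothetical prediction result; field names other than repo_owner
# are illustrative assumptions, not the repo's actual schema.
entry = {
    "repo_owner": "kubeflow",
    "repo_name": "code-intelligence",
    "issue_number": 79,
    "label": "kind/feature",
    "probability": 0.98,
}

# One JSON object per line is what the Stackdriver agent parses
# into the structured jsonPayload field.
print(json.dumps(entry))
```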
@issue-label-bot

Issue-Label Bot is automatically applying the label kind/feature to this issue, with a confidence of 0.98. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: app homepage, dashboard and code for this bot.


jlewi commented Dec 29, 2019

I set up a BigQuery sink in the project issue-label-bot-dev to begin experimenting with.


jlewi commented Dec 31, 2019

There's some information here about doing structured logging with the logging module:
https://docs.python.org/3/howto/logging-cookbook.html#implementing-structured-logging

It looks like this relies on the caller of logging.info() to pass a string representing the JSON dictionary.

I think we instead want something like a JSON formatter that automatically formats all entries as JSON:
https://pypi.org/project/JSON-log-formatter/
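A stdlib-only sketch of the idea behind such a formatter (the class and field names here are assumptions for illustration, not JSON-log-formatter's actual API): every record is rendered as one JSON object per line, and extra fields passed via `extra=` become top-level keys in the payload.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        entry = {
            "message": record.getMessage(),
            "severity": record.levelname,
            "logger": record.name,
        }
        # Copy any non-standard fields (passed via extra=) into the payload.
        standard = logging.makeLogRecord({}).__dict__
        for key, value in record.__dict__.items():
            if key not in standard:
                entry[key] = value
        return json.dumps(entry)


logger = logging.getLogger("worker")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Structured fields pass through `extra` and become JSON keys.
logger.info("prediction issued", extra={"repo_owner": "kubeflow"})
```

With this in place, callers keep using plain logging.info() calls and the formatter takes care of producing JSON, instead of every call site having to build its own JSON string.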


jlewi commented Jan 3, 2020

I created a new sink. It looks like the filter expression I used for the previous sink wouldn't include the new prod deployment.

I created the new sink with the filter

resource.type="k8s_container" resource.labels.cluster_name="issue-label-bot" resource.labels.container_name="app"
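For reference, a sink with that filter can be created from the CLI along these lines. This is a sketch only: the sink name is an assumption, and the project and dataset names are taken from elsewhere in this thread rather than from the actual sink configuration.

```shell
# Sketch: sink name is hypothetical; project/dataset come from other
# comments in this issue, not from the real sink's config.
gcloud logging sinks create issue-label-bot-logs \
  bigquery.googleapis.com/projects/issue-label-bot-dev/datasets/issue_label_bot_logs_dev \
  --log-filter='resource.type="k8s_container" resource.labels.cluster_name="issue-label-bot" resource.labels.container_name="app"'
```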

jlewi pushed a commit to jlewi/code-intelligence that referenced this issue Jan 4, 2020
* worker.py should format logs as json entries. This will make it easier
  to query the data in BigQuery and stackdriver to measure performance.

  * Related to kubeflow#79

* To deal with workload identity flakiness (kubeflow#88) test that we can get
  application default credentials on startup and if not exit.

* As a hack to deal with multi-threading issues with Keras models (kubeflow#89)
  have the predict function load a new model on each call

  * It looks like the way pubsub works there is actually a thread pool
    so predict calls won't be handled in the same thread even though
    we throttle it to handle one item at a time.
k8s-ci-robot pushed a commit that referenced this issue Jan 4, 2020
(same commit message as above)

jlewi commented Jan 19, 2020

Logs are now flowing into BigQuery. Here's a sample query:

SELECT jsonPayload  FROM `issue-label-bot-dev.issue_label_bot_logs_dev.stderr_20200117` where jsonPayload.repo_owner="kubeflow" LIMIT 1000
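The sink writes date-sharded tables (stderr_20200117 above), so querying a given day means substituting the date into the table name. A small hypothetical helper for building the per-day query string (the function name and parameters are illustrative, not part of the repo):

```python
import datetime


def stderr_table_query(project, dataset, day, repo_owner, limit=1000):
    """Build a BigQuery query against one day's date-sharded stderr table.

    Hypothetical helper for illustration; not part of the repo.
    """
    table = f"`{project}.{dataset}.stderr_{day:%Y%m%d}`"
    return (
        f"SELECT jsonPayload FROM {table} "
        f'WHERE jsonPayload.repo_owner="{repo_owner}" LIMIT {limit}'
    )


query = stderr_table_query(
    "issue-label-bot-dev",
    "issue_label_bot_logs_dev",
    datetime.date(2020, 1, 17),
    "kubeflow",
)
print(query)
```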

jlewi closed this as completed Jan 19, 2020

jlewi commented Jan 19, 2020

It looks like logs are streamed to BigQuery in near real time. I observed log entries showing up almost immediately, so the sink appears to sync much more frequently than once a day.
