This repository has been archived by the owner on Jan 31, 2022. It is now read-only.

Use json logging to log information about the issue being processed when exceptions occur #77

Merged: 5 commits into kubeflow:master on Jan 19, 2020

Conversation

@jlewi (Contributor) commented Dec 25, 2019

Use json logging to log information about the issue being processed when exceptions occur

related to: #70 Combine repo specific and universal model


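For reference, a minimal sketch of the kind of structured logging the title describes, using only the Python standard library; the field names (repo_owner, repo_name, issue_num) and values are illustrative, not the PR's actual schema:

import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record, including selected extra fields, as one JSON line."""

    def format(self, record):
        payload = {
            "severity": record.levelname,
            "message": record.getMessage(),
        }
        # Copy the illustrative issue fields if the caller attached them via extra=.
        for key in ("repo_owner", "repo_name", "issue_num"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("label_microservice")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

try:
    raise RuntimeError("prediction failed")
except RuntimeError:
    # logger.exception records the traceback; extra= carries the issue context.
    logger.exception(
        "Exception while processing issue",
        extra={"repo_owner": "kubeflow", "repo_name": "kubeflow", "issue_num": 4601},
    )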

@jlewi (Contributor, Author) commented Dec 25, 2019

Generating inference requests with the universal model is currently failing with the stack trace shown below.

My suspicion is that there is a bug in the code for the universal model and that I'm not properly managing the TF graph.

I wonder if the graph is getting overwritten when we load the repo specific models?

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2463, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2449, in wsgi_app
    response = self.handle_exception(e)
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1866, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/dist-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/dist-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/py/label_microservice/app.py", line 83, in predict
    predictions = model.predict_issue_labels(title, text)
  File "/py/label_microservice/universal_kind_label_model.py", line 76, in predict_issue_labels
    probs = self.model.predict(x=[vec_body, vec_title]).tolist()[0]
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 908, in predict
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 723, in predict
    callbacks=callbacks)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 394, in model_iteration
    batch_outs = f(ins_batch)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/backend.py", line 3476, in __call__
    run_metadata=self.run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable dense_5/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/dense_5/kernel/N10tensorflow3VarE does not exist.
	 [[{{node dense_5/MatMul/ReadVariableOp}}]]


@hamelsmu (Member) commented:

@jlewi regarding the stack trace, I vaguely remember encountering this the first time around with Issue Label Bot. Some things that helped me:

  1. Storing the graph in the application state and initializing the default graph (example)

  2. Doing inference within the default graph context (example)

I am not 100% sure this will resolve the problem, but it's worth a try.
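For reference, a minimal sketch of that pattern under TF 1.x-style graph handling (which the stack trace above suggests); the model path and function signature here are illustrative, not the actual app code:

import tensorflow as tf
from tensorflow import keras

# Load the model once at startup and remember the graph it was loaded into.
# The path here is a placeholder, not the actual model location.
model = keras.models.load_model("universal_kind_model.h5")
graph = tf.compat.v1.get_default_graph()


def predict_issue_labels(vec_title, vec_body):
    # Run inference inside the graph captured at load time, so a request
    # handled on a different thread doesn't hit an uninitialized default graph.
    with graph.as_default():
        return model.predict(x=[vec_body, vec_title]).tolist()[0]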

@@ -0,0 +1,24 @@
"""The models packages defines wrappers around different models."""

Reviewer comment (Member):

Consider using abstract classes to define python interfaces. Not required, but might help catch errors later
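For illustration, a small sketch of the abstract-class idea, assuming a predict_issue_labels(title, text) interface like the one in the stack trace above; the class names are hypothetical:

import abc


class IssueLabelModel(abc.ABC):
    """Hypothetical interface for the model wrappers in this package."""

    @abc.abstractmethod
    def predict_issue_labels(self, title: str, text: str) -> dict:
        """Return a mapping of label name to predicted probability."""


class UniversalKindLabelModel(IssueLabelModel):
    def predict_issue_labels(self, title: str, text: str) -> dict:
        # A real implementation would embed the text and call the Keras model.
        return {"kind/bug": 0.9}


# A subclass that forgets to implement predict_issue_labels cannot be
# instantiated; abc raises a TypeError at construction time.
model = UniversalKindLabelModel()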

results = {}
results.update(left)

for label, probability in right.items():
@hamelsmu (Member) commented Dec 25, 2019:

Consider using python groupby - example:

from itertools import groupby

dict1 = {'bug': .9,
         'feature': .74,
         'comment': .12}

dict2 = {'jupyter': .2,
         'feature': .8,
         'docs': .45}

combined_data = list(dict1.items()) + list(dict2.items())

combined_predictions = dict([max(v) for k,v in groupby(combined_data, key=lambda x: x[0])])

assert combined_predictions == {'bug': 0.9, 
                               'feature': 0.8,
                               'comment': 0.12, 
                               'docs': 0.45,  
                               'jupyter': 0.2}

Reviewer comment (Member):

[image]

@jlewi (Contributor, Author) replied:

Thanks. I initially tried it out, but then I realized that we need to sort the items before calling groupby. So it seems like it would be less efficient, because now we have to do a sort as opposed to just going through both lists, which is linear in time.
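For reference, a sketch of the linear merge described here, assuming two {label: probability} dicts; it keeps the higher probability per label in a single pass, without sorting:

def combine_predictions(left, right):
    """Merge two {label: probability} dicts, keeping the higher probability per label."""
    combined = dict(left)
    for label, probability in right.items():
        combined[label] = max(probability, combined.get(label, 0.0))
    return combined


assert combine_predictions(
    {"bug": 0.9, "feature": 0.74, "comment": 0.12},
    {"jupyter": 0.2, "feature": 0.8, "docs": 0.45},
) == {"bug": 0.9, "feature": 0.8, "comment": 0.12, "jupyter": 0.2, "docs": 0.45}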

@jlewi (Contributor, Author) commented Dec 25, 2019

It looks like the problem was that the flask server was using threads, and that causes problems; setting threaded=False appears to solve it.
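If the fix is the built-in Flask server's threading option, it would look roughly like this (host and port are placeholders):

from flask import Flask

app = Flask(__name__)

if __name__ == "__main__":
    # Handle requests on a single thread so every call into the TF model
    # runs in the same thread where the graph and session were created.
    app.run(host="0.0.0.0", port=8080, threaded=False)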

@jlewi (Contributor, Author) commented Dec 26, 2019

Now with the embedding service I'm getting an exception when parsing the data.

Exception is:
[issue-embedding-server-854c887f4c-9r42j app] ERROR|2019-12-26T17:52:40|Exception occurred in process_dict signal only works in main thread|/flask_app/inference.py|121|

Looks like this might be due to enabling debugging in the flask server.
https://stackoverflow.com/questions/53522052/flask-app-valueerror-signal-only-works-in-main-thread

The existing code had disabled debug mode and had a comment about the code not working in debug mode.

I had enabled debug mode to try to get the advantage of file sync with skaffold. To do that we need to retrigger app reloading.
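A common workaround along the lines of the linked Stack Overflow thread is to keep debug mode but skip the reloader, which is what installs the signal handler; note this also gives up the automatic reloading mentioned above:

from flask import Flask

app = Flask(__name__)

if __name__ == "__main__":
    # use_reloader=False avoids "signal only works in main thread" when the
    # server is started off the main thread, at the cost of auto-reload.
    app.run(debug=True, use_reloader=False)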

@jlewi (Contributor, Author) commented Dec 26, 2019

I think we need to use the older version of textacy (0.7.1); it looks like that one defines RE_EMAIL, but the newer one (0.9.1) does not.
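If this ends up as a runtime guard rather than just a requirements pin, a minimal sketch might look like the following; pkg_resources is used only to read the installed version, and the version bound is taken from the comment above:

import pkg_resources

# The preprocessing code relies on textacy's RE_EMAIL, which (per the comment
# above) exists in 0.7.1 but was dropped by 0.9.1.
installed = pkg_resources.get_distribution("textacy").version
if not installed.startswith("0.7."):
    raise RuntimeError(
        "textacy==%s is installed; pin textacy==0.7.1, which still defines RE_EMAIL"
        % installed
    )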

@hamelsmu (Member) commented Dec 26, 2019

@jlewi question about this PR - I understand that you are combining the predictions of both models. I thought that the global model was obsolete b/c of Issue Templates? Curious as to where you are thinking of taking this .... (or perhaps I'll wait until this PR is out of draft). 🙇

On a side note, in case you are interested: if you are trying to compare the probabilities from different models, you usually want to calibrate the probabilities first so they can be compared (even if they come from the same class of model). Resources: (1) an explanation of calibration and (2) isotonic regression, which can be used to calibrate. Just mentioning this; it's not absolutely critical.
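For reference, a toy sketch of the isotonic-regression calibration being suggested, using scikit-learn; the held-out scores and correctness labels are made up:

import numpy as np
from sklearn.isotonic import IsotonicRegression

# Toy held-out data: raw model scores and whether the predicted label was correct.
raw_scores = np.array([0.10, 0.30, 0.45, 0.60, 0.80, 0.95])
was_correct = np.array([0, 0, 1, 0, 1, 1])

# Fit a monotonic mapping from raw scores to calibrated probabilities.
calibrator = IsotonicRegression(out_of_bounds="clip")
calibrator.fit(raw_scores, was_correct)

# Scores from different models calibrated this way live on a comparable scale.
print(calibrator.predict(np.array([0.74, 0.80])))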

@jlewi (Contributor, Author) commented Dec 27, 2019

@hamelsmu even with the issue templates we still get a lot of issues which aren't being labeled e.g.

kubeflow/kubeflow#4601
kubeflow/kubeflow#4602

Looks like it might be because the person filing the bug is modifying the template.

So the primary goal of this PR is to re-enable the universal model for issue kind in repositories where we have repo-specific models enabled.

The larger goal is that we'd like to start using ML to auto-predict:

  • Issue kind
  • Issue area

This PR will hopefully make it easier for us to do things like tweak the thresholds for area labels to do a better job predicting them.

I'd expect we should be able to auto-assign area labels for a large number of issues.

So a next step after this PR (#79) is to fix the logging so we can easily track how often we add area and kind labels.

@jlewi (Contributor, Author) commented Dec 27, 2019

@hamelsmu btw I'll split this up into smaller PRs that are easier to review.

@jlewi (Contributor, Author) commented Dec 27, 2019

Fired off 3 initial PRs

#80 embedding_service changes
#81 autorestarter for use with skaffold
#82 the various classes to aid in combining models

@jlewi jlewi force-pushed the multi_model branch 2 times, most recently from f2621a8 to d3ce0b0 Compare December 31, 2019 02:18
@jlewi jlewi force-pushed the multi_model branch 2 times, most recently from bccdd9c to d6a9212 Compare January 18, 2020 00:49
@k8s-ci-robot commented:

The following users are mentioned in OWNERS file(s) but are not members of the kubeflow org.

Once all users have been added as members of the org, you can trigger verification by writing /verify-owners in a comment.

  • rileyjbauer
    • kubeflow_clusters/issue-label-bot/kustomize/metadata/OWNERS

@jlewi (Contributor, Author) commented Jan 18, 2020

The bulk of the changes in this PR have been split off and merged in separate PRs.

I've rebased the PR, and there are just a couple of cosmetic changes left.

* Attach extra metadata information in the log messages.

* Related to kubeflow#70 - use ensemble models
@jlewi jlewi marked this pull request as ready for review January 18, 2020 01:09
@jlewi (Contributor, Author) commented Jan 18, 2020

@hamelsmu this is ready for review. Most of the changes were already submitted in other PRs; the only thing that remains is some logging.

@jlewi jlewi changed the title Combine repo specific and general model Use json logging to log information about the issue being processed when exceptions occur Jan 18, 2020
@hamelsmu (Member) commented:

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot removed the lgtm label Jan 18, 2020
@hamelsmu (Member) commented:

/lgtm
/approve

@hamelsmu (Member) commented:

@jlewi looks like you have to fix the OWNERS file

@hamelsmu (Member) commented:

/verify-owners

@k8s-ci-robot k8s-ci-robot removed the lgtm label Jan 18, 2020
@hamelsmu (Member) commented:

/lgtm

@jlewi (Contributor, Author) commented Jan 18, 2020

Thanks
/approve

@k8s-ci-robot commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hamelsmu, jlewi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jlewi (Contributor, Author) commented Jan 19, 2020

/hold cancel

@k8s-ci-robot k8s-ci-robot merged commit c31d84e into kubeflow:master Jan 19, 2020