Use json logging to log information about the issue being processed when exceptions occur #77
Generating inference requests with the universal model is currently failing with the stack trace shown below. My suspicion is that there is a bug in the code for the universal graph model and I'm not properly managing the TF graph. I wonder if the graph is getting overwritten when we load the repo specific models?
|
@jlewi regarding the stack trace, I vaguely remember encountering this the first time around with Issue Label Bot. Some things that helped me:
I am not 100% sure this will resolve the problem, but it's worth a try.
py/label_microservice/models.py
Outdated
@@ -0,0 +1,24 @@ | |||
"""The models packages defines wrappers around different models.""" | |||
|
Consider using abstract classes to define python interfaces. Not required, but might help catch errors later
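A minimal sketch of what the abstract-class suggestion could look like, using the standard-library `abc` module. The names `IssueLabelModel`, `predict_issue_labels`, `ConstantModel`, and `IncompleteModel` are hypothetical and are not taken from this PR.

```python
import abc
from typing import Dict


class IssueLabelModel(abc.ABC):
    """Hypothetical interface that every model wrapper would implement."""

    @abc.abstractmethod
    def predict_issue_labels(self, title: str, text: str) -> Dict[str, float]:
        """Return a mapping from label name to predicted probability."""


class ConstantModel(IssueLabelModel):
    """Toy implementation used only to show the interface in action."""

    def predict_issue_labels(self, title: str, text: str) -> Dict[str, float]:
        return {"bug": 0.9}


class IncompleteModel(IssueLabelModel):
    """Forgets to implement predict_issue_labels."""
```

With this shape, instantiating `IncompleteModel()` raises `TypeError` right away, so a missing method is caught when the model is constructed instead of at the first prediction request.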
results = {} | ||
results.update(left) | ||
|
||
for label, probability in right.items(): |
Consider using python groupby
- example:
from itertools import groupby

dict1 = {'bug': .9,
         'feature': .74,
         'comment': .12}
dict2 = {'jupyter': .2,
         'feature': .8,
         'docs': .45}
# groupby only groups consecutive items, so sort by label first.
combined_data = sorted(list(dict1.items()) + list(dict2.items()))
# max() over the (label, probability) tuples of one label keeps the highest probability.
combined_predictions = dict(max(v) for k, v in groupby(combined_data, key=lambda x: x[0]))
assert combined_predictions == {'bug': 0.9,
                                'feature': 0.8,
                                'comment': 0.12,
                                'docs': 0.45,
                                'jupyter': 0.2}
Thanks. I initially tried it out, but then I realized that we need to sort the items before calling groupby. That seems less efficient, because we now have to do a sort, as opposed to just going through both dicts, which is linear in time.
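The linear-time alternative can be sketched as follows. The function name `combine_predictions` is hypothetical, and the rule of keeping the larger probability for a shared label is an assumption based on the groupby example in this thread, not a copy of the code in the PR.

```python
from typing import Dict


def combine_predictions(left: Dict[str, float],
                        right: Dict[str, float]) -> Dict[str, float]:
    """Merge two label -> probability dicts, keeping the larger probability."""
    results = dict(left)  # copy so the inputs are not modified
    for label, probability in right.items():
        # Single pass over `right`; dict lookups are O(1) on average.
        if label not in results or probability > results[label]:
            results[label] = probability
    return results


repo_model = {"bug": 0.9, "feature": 0.74, "comment": 0.12}
universal_model = {"jupyter": 0.2, "feature": 0.8, "docs": 0.45}
combined = combine_predictions(repo_model, universal_model)
```

This touches each entry once and needs no sort, which is the trade-off described in the reply above.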
It looks like the problem was that the Flask server was using threads, which causes problems; setting threaded=False appears to solve it.
Now with the embedding service I'm getting an exception parsing the data
Looks like this might be due to enabling debugging in the Flask server. The existing code had disabled debug mode and had a comment about the code not working in debug mode. I had enabled debug mode to try to take advantage of file sync with Skaffold; to do that we need to retrigger app reloading.
I think we need to use the older version of textacy, 0.7.1. It looks like that one defines RE_EMAIL but the newer one, 0.9.1, does not.
@jlewi question about this PR: I understand that you are combining the predictions of both models. I thought that the global model was obsolete because of Issue Templates? Curious where you are thinking of taking this (or perhaps I'll wait until this PR is out of draft). 🙇 On a side note, in case you are interested: if you are trying to compare the probabilities from different models, you usually want to calibrate the probabilities first so they can be compared (even if they come from the same class of model). Resources: (1) an explanation of calibration and (2) isotonic regression, which can be used to calibrate. Just mentioning this; it's not absolutely critical.
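To make the calibration remark concrete, here is a small pure-Python sketch of isotonic regression using the pool-adjacent-violators approach. It is a simplified step-function variant written for illustration, not the implementation of any particular library; `fit_isotonic` and `calibrate` are hypothetical names, and the scores and labels are made-up data.

```python
from typing import List, Sequence, Tuple


def fit_isotonic(scores: Sequence[float],
                 labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Fit a non-decreasing step function from raw scores to probabilities.

    Returns (highest_score_in_block, calibrated_probability) pairs,
    ordered by score.
    """
    blocks = []  # each block is [label_sum, count, highest_score_in_block]
    for score, label in sorted(zip(scores, labels)):
        blocks.append([float(label), 1, score])
        # Pool adjacent blocks while the earlier mean exceeds the later mean.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            label_sum, count, upper = blocks.pop()
            blocks[-1][0] += label_sum
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, label_sum / count) for label_sum, count, upper in blocks]


def calibrate(score: float, steps: List[Tuple[float, float]]) -> float:
    """Map a raw model score to the probability of the block it falls in."""
    for upper, probability in steps:
        if score <= upper:
            return probability
    return steps[-1][1]  # scores above every block use the last block


# Made-up held-out data: raw scores and whether the label was correct.
steps = fit_isotonic([0.2, 0.3, 0.6, 0.7], [0, 1, 0, 1])
```

Fitting one such mapping per model on held-out data would put the two models' outputs on a comparable scale before they are combined.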
@hamelsmu even with the issue templates we still get a lot of issues that aren't being labeled, e.g. kubeflow/kubeflow#4601. It looks like that might be because the person filing the bug is modifying the template. So the primary goal of this PR is to re-enable the universal model for issue kind in repositories where we have repo-specific models enabled. The larger goal is we'd like to start using ML to auto predict
This PR will hopefully make it easier for us to do things like tweak the threshold labels for the area to do a better job predicting area labels. I'd expect we should be able to auto-assign area labels for a large number of issues. So a next step after this PR (#79) is to fix the logging so we can easily track how often we add area and kind labels, |
@hamelsmu btw I'll split this up into smaller PRs that are easier to review. |
f2621a8 to d3ce0b0
bccdd9c to d6a9212
The following users are mentioned in OWNERS file(s) but are not members of the kubeflow org. Once all users have been added as members of the org, you can trigger verification by writing
|
The bulk of the changes in this PR have been split off and merged in separate PRs. I've rebased the PR and there are just a couple of cosmetic changes left.
* Attach extra metadata information in the log messages.
* Related to kubeflow#70 - use ensemble models
@hamelsmu this is ready for review. Most of the changes were already submitted in other PRs; only thing that remains is some logging. |
/lgtm |
/lgtm |
@jlewi looks like you have to fix owners file |
/verify-owners |
/lgtm |
Thanks |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: hamelsmu, jlewi The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold cancel |
Use json logging to log information about the issue being processed when exceptions occur
related to: #70 Combine repo specific and universal model
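As an illustration of the idea, the sketch below emits one JSON object per log record and attaches issue metadata through the standard `extra` argument of the `logging` module. The `JsonFormatter` class, the field names `repo_owner`, `repo_name`, and `issue_num`, the logger name, and the example values are all assumptions made for this sketch; the PR may use a different library or different fields.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def __init__(self, extra_fields=("repo_owner", "repo_name", "issue_num")):
        super().__init__()
        self._extra_fields = extra_fields  # hypothetical metadata keys

    def format(self, record):
        payload = {"level": record.levelname, "message": record.getMessage()}
        for field in self._extra_fields:
            # Values passed via `extra=` become attributes on the record.
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


stream = io.StringIO()  # stand-in for stdout or a log file
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("label_bot_example")
logger.addHandler(handler)
logger.propagate = False

# Hypothetical issue being processed when the failure happens.
context = {"repo_owner": "example-org", "repo_name": "example-repo", "issue_num": 1234}
try:
    raise ValueError("simulated prediction failure")
except ValueError:
    # logger.exception logs at ERROR level and records the traceback.
    logger.exception("Failed to process issue", extra=context)

entry = json.loads(stream.getvalue())
```

Because every entry is structured, a log backend can filter on a field such as `issue_num` to find all failures for one issue.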