Skip to content
This repository has been archived by the owner on Jan 31, 2022. It is now read-only.

Label_Microservice worker should use the new Model classes #85

Merged
merged 1 commit into from
Dec 31, 2019

Conversation

jlewi
Copy link
Contributor

@jlewi jlewi commented Dec 31, 2019

  • Most of the inference logic in the Worker class can be deleted and
    we can instead use the newly created Model classes.

  • Use an in cluster deployment service

  • Log the model and thresholds.

  • cli.py is a simple CLI to send pubsub requests

    • It is ended for development and debugging.
  • Create a utility to parse issue specs of the form {org}/{repo}#number

  • We need to initialize IssueLabelPredictor() inside the pubsub thread;
    otherwise we have threading issues when trying to access the model.

  • The universal model needs to apply appropriate thresholds to its predictions.

  • Worker no longer needs to apply thresholding of predictions because that is
    now handled by the individual model classes.

  • Add a method to worker apply_repo_config that handles repo specific behavior
    specified in the YAML file in the repo

    • Apply aliases to the labels
    • If the config specifies which labels to predict then filter out all other
      labels.
  • When commenting on issues about the predictions, the bot should reate a markdown table with the probabilities for the labels in the comment.

  • Remove test_worker.py; the test is no longer relevant after the refactor.
    There is a TODO in worker.py to create an appropriate unittest for the new
    code.

  • Change how the GitHub App Private Keys are handled.

    • The private key should be mounted as a secret and an environment variable
      should specify the path of the PEM key.

    • Stop storing the PEM key as an environment variable and then in python code writing it to a file.

    • Change the name of the environment variables to add the prefix "GITHUB_"
      to make it more explanatory.

    • Move some of the helper methods for creating GitHubApp out of github_util
      and into the GitHubApp class

Related to: #70 combine universal and repo specific models.


This change is Reviewable

* Most of the inference logic in the Worker class can be deleted and
we can instead use the newly created Model classes.

* Use an in cluster deployment service
* Log the model and thresholds.

* cli.py is a simple CLI to send pubsub requests
  * It is ended for development and debugging.

* Create a utility to parse issue specs of the form {org}/{repo}#number

* We need to initialize IssueLabelPredictor() inside the pubsub thread;
otherwise we have threading issues when trying to access the model.

* The universal model needs to apply appropriate thresholds to its predictions.

* Worker no longer needs to apply thresholding of predictions because that is
  now handled by the individual model classes.

* Add a method to worker apply_repo_config that handles repo specific behavior
specified in the YAML file in the repo

  * Apply aliases to the labels
  * If the config specifies which labels to predict then filter out all other
    labels.

* When commenting on issues about the predictions, the bot should reate a markdown table with the probabilities for the labels in the comment.

* Remove test_worker.py; the test is no longer relevant after the refactor.
  There is a TODO in worker.py to create an appropriate unittest for the new
  code.

* Change how the GitHub App Private Keys are handled.

  * The private key should be mounted as a secret and an environment variable
    should specify the path of the PEM key.

  * Stop storing the PEM key as an environment variable and then in python code writing it to a file.

  * Change the name of the environment variables to add the prefix "GITHUB_"
to make it more explanatory.

  * Move some of the helper methods for creating GitHubApp out of github_util
    and into the GitHubApp class

Related to: kubeflow#70 combine universal and repo specific models.
@jlewi
Copy link
Contributor Author

jlewi commented Dec 31, 2019

/assign @hamelsmu

if not m:
return None, None, None
return m.group(1), m.group(2), int(m.group(3))
Copy link
Member

@hamelsmu hamelsmu Dec 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe return a named tuple? Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry looks like this got merged before I saw this comment.

Copy link
Member

@hamelsmu hamelsmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor question otherwise,

/approve

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hamelsmu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hamelsmu
Copy link
Member

/lgtm

Going ahead and stamping with LGTM b/c only had a minor question which is optional

@k8s-ci-robot k8s-ci-robot merged commit 32432bd into kubeflow:master Dec 31, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants