Add Metrics to base class #21437
This probably comes after #22970.
Actually, I think there could be value in adding
and having that update the metrics count. @BjornPrime, as you free up, would you mind working on this? We probably can't totally close out the issue, but we can make a dent in it. cc/ @yeandy
.take-issue
Several questions around the error handling:
I think we should borrow from best-practice patterns in Beam I/Os and return errors along with the error message in the Prediction object. Agreed, a flag would be a nice user experience: Config 1 - fail if any error; if Config 1 is false, return the error in the prediction output instead of failing.
+1, I think this is a good experience. It's worth noting that we don't force ModelHandlers to always return a PredictionResult (maybe we should?), but we can at least do this for our handlers. This logic will probably need to live in the ModelHandler anyway.
I'm probably -1 on this; I'd prefer to avoid a proliferation of options and to be a little more opinionated here. Failing is generally a bad experience IMO: in streaming pipelines it will potentially cause the pipeline to get stuck, and in batch we fail the whole job. Filtering failures out is also a pretty trivial map operation in the next step if that's really what you want to do (see the sketch after this comment).
Retries are more interesting to me. My question there is: how often do models actually fail inference in a retryable way? I would anticipate most failures to be non-retryable, and am inclined to not do this flag either.
IMO handling this at the batch level is fine; anything more complex is probably more trouble than it's worth.
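Regarding the "trivial map operation" point, here is a minimal, hypothetical sketch of routing failures out of band downstream of RunInference. It assumes each prediction carries an optional error field, which is not part of the current PredictionResult API; plain dicts stand in for prediction results so the snippet runs on its own.

```python
import apache_beam as beam


def split_by_error(result, num_partitions):
    # Partition 0 = successes, partition 1 = failures.
    return 1 if result.get('error') else 0


with beam.Pipeline() as p:
    # Stand-ins for the output of RunInference, with a hypothetical `error` field.
    results = p | beam.Create([
        {'example': 1, 'inference': 0.9, 'error': None},
        {'example': 2, 'inference': None, 'error': 'shape mismatch'},
    ])
    successes, failures = results | beam.Partition(split_by_error, 2)
    successes | 'LogOk' >> beam.Map(print)
    failures | 'LogFailed' >> beam.Map(lambda r: print('failed:', r))
```

The failure branch could feed a dead-letter sink or simply be counted, without any flag on RunInference itself.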
I'd say most inference failures are due to a mismatch in data shapes or data types. Retrying wouldn't help anyway, since the user would need to go back and fix their pre-processing logic.
+1
Right, that's what I was thinking as well. If we get into remote inference we may want retries for network failures, but at a local level I don't think it makes much sense. ModelHandlers can also make that determination for themselves.
Yes, future remote use cases are why I think retry is useful, as there can be many transient errors, from network issues to overloaded endpoints. As we are adding config now, perhaps we can have retry but set the default to 1.
My vote would be to defer it. I don't think it gets us anything right now, and we may decide to just bake in retries on network/server errors no matter what in the remote inference case (that's what I'd vote we do). If we do it now, we're stuck with that option regardless of whether it ends up being useful.
SG, let's defer.
Do we want the _inference_counter to increment even if the inference fails?
My take is to not update it in that case.
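A rough sketch of that behavior follows; it is not the actual RunInference internals. The counter is only bumped when a batch succeeds, and failed batches are tracked separately. The metric names follow the ones proposed in this issue, and `run_inference_fn` is a stand-in for whatever the ModelHandler calls.

```python
from apache_beam.metrics import Metrics

_inference_counter = Metrics.counter('RunInference', 'num_inferences')
_failed_counter = Metrics.counter('RunInference', 'num_failed_inferences')


def run_batch(batch, run_inference_fn):
    """Run inference on one batch, counting successes and failures separately."""
    try:
        predictions = run_inference_fn(batch)
    except Exception:
        # Failed batch: count the failures, leave num_inferences untouched.
        _failed_counter.inc(len(batch))
        raise
    _inference_counter.inc(len(batch))
    return predictions
```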
This may be beyond the scope of my assignment, but looking at the issue holistically, I think we're going to want a more generic update method.
It's not necessarily out of scope to propose a modification to the current implementation. Can you clarify with an example of what a more generic update would look like?
I'm just imagining a situation where we want to update one metric and not all of the others. For example, we could have a method per metric rather than one update for everything (see the sketch a few comments below).
I like this proposal, as it makes things more flexible.
To be clear, there are two types of metrics here, so we should also have something like a separate update for each.
That's the right interpretation. See my comment #21437 (comment). I think the only thing we would want to update in the failure case is the failure count.
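One possible shape, purely illustrative, for the "more generic update" discussed above: a collector with one update method per metric instead of a single update() that bumps everything at once. The class and method names here are hypothetical, not the existing implementation.

```python
from apache_beam.metrics import Metrics


class _MetricsCollector:
    """Hypothetical per-metric update API for RunInference metrics."""

    def __init__(self, namespace='RunInference'):
        self._num_inferences = Metrics.counter(namespace, 'num_inferences')
        self._num_failed_inferences = Metrics.counter(
            namespace, 'num_failed_inferences')
        self._inference_latency_ms = Metrics.distribution(
            namespace, 'inference_latency_ms')

    def update_num_inferences(self, n):
        self._num_inferences.inc(n)

    def update_num_failed_inferences(self, n):
        self._num_failed_inferences.inc(n)

    def update_inference_latency(self, millis):
        self._inference_latency_ms.update(millis)
```

With this shape, the failure path can bump only num_failed_inferences without touching the success metrics.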
So, just to make sure I'm understanding: one failed inference doesn't prevent the rest of the batch from completing?
To my knowledge, particularly for scikit-learn and PyTorch, if the NumPy array or torch tensor that we pass to the model contains even one invalid/malformed instance, then predicting for the entire batch will fail. But we may need to do some testing and more digging to confirm, especially for TensorFlow.
Add a metric that will track prediction failures: num_failed_inferences

Add metrics that track data on side-loaded models: num_model_loads, num_model_versions, and curr_model_version
curr_model_version is less a metric and more of a static value. Part of this FR will be to figure out how that fits.
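A sketch of how the side-loaded-model metrics named above could be declared with the Beam metrics API. curr_model_version is shown as a gauge holding an integer version id purely for illustration; as noted above, it is really a static value and may need a different mechanism. The `on_model_loaded` hook and its parameters are hypothetical.

```python
from apache_beam.metrics import Metrics

NAMESPACE = 'RunInference'

num_model_loads = Metrics.counter(NAMESPACE, 'num_model_loads')
num_model_versions = Metrics.counter(NAMESPACE, 'num_model_versions')
curr_model_version = Metrics.gauge(NAMESPACE, 'curr_model_version')


def on_model_loaded(version_id, is_new_version):
    # Hypothetical hook invoked whenever a side-loaded model is swapped in.
    num_model_loads.inc()
    if is_new_version:
        num_model_versions.inc()
    curr_model_version.set(version_id)
```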
Imported from Jira BEAM-14043. Original Jira may contain additional context.
Reported by: Ryan.Thompson.
Subtask of issue #21435