Right now, the triton_wrapper seems to work fine as long as the server hosting the model scales correctly with the number of active jobs. If the server doesn't keep up, a random chunk will raise a tritonclient.utils.InferenceServerException, and the whole evaluation is terminated (which is very difficult to track down).

We can have this handled either upstream in the triton_wrapper instance, or, if we think this server-configuration handling should be done on the analyst side, it can be patched into any subclass of triton_wrapper with something like:
```python
from coffea.ml_tools.triton_wrapper import triton_wrapper
from tritonclient.utils import InferenceServerException


class my_model_wrapper(triton_wrapper):
    def prepare_awkward(self, *args, **kwargs):
        pass  # Or whatever is needed

    def numpy_call(self, output_list, input_dict, ncalls=0):
        """
        Overloading the upstream numpy_call to allow for up to 3 failures from
        transient issues with the server. (Adding a default ncalls allows this
        to work seamlessly with the existing upstream call signature.)
        """
        try:
            return super().numpy_call(output_list, input_dict)
        except InferenceServerException as err:
            print("Caught inference server exception:", err)
            if ncalls >= 3:
                print("Not resolved after 3 retries... this is an actual error")
                raise err
            return self.numpy_call(output_list, input_dict, ncalls + 1)
```