
Triton inference automatic resubmit #1190

Open
yimuchen opened this issue Oct 8, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@yimuchen
Contributor

yimuchen commented Oct 8, 2024

Right now, the triton_wrapper seems to work fine as long as the server hosting the model scales correctly with the number of active jobs. If the server doesn't keep up, a random chunk will raise a tritonclient.utils.InferenceServerException and the whole evaluation is terminated (which is very difficult to track down).

We can have this handled either upstream in the triton_wrapper instance, or, if we think this server-configuration handling should be done on the analyst side, it can be patched into any subclass of triton_wrapper with something like:

from coffea.ml_tools.triton_wrapper import triton_wrapper
from tritonclient.utils import InferenceServerException

class my_model_wrapper(triton_wrapper):
    def prepare_awkward(self, *args, **kwargs):
        pass # Or whatever is needed
        
    def numpy_call(self, output_list, input_dict, ncalls=0):
        """
        Overload the upstream numpy_call to tolerate up to 3 failures
        from transient issues with the server. (The default value for
        ncalls lets this work seamlessly with the existing upstream
        call signature.)
        """
        try:
            return super().numpy_call(output_list, input_dict)
        except InferenceServerException as err:
            print("Caught inference server exception:", err)
            if ncalls >= 3:
                print("Not resolved after 3 retries... this is an actual error")
                raise
            return self.numpy_call(output_list, input_dict, ncalls + 1)
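An alternative to the recursive override above would be a small generic retry helper, which could also add a backoff delay between attempts to give a struggling server time to recover. This is only a sketch of that idea: `retry` and `flaky` are hypothetical names, and `ConnectionError` stands in for `tritonclient.utils.InferenceServerException` so the example is self-contained.

```python
import time

def retry(func, exceptions, max_retries=3, base_delay=0.0):
    """Call func(), retrying up to max_retries times on the given exceptions.

    An exponential backoff between attempts (base_delay, 2*base_delay, ...)
    gives a transiently overloaded server a chance to catch up.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except exceptions:
            if attempt == max_retries:
                raise  # exhausted retries: surface the real error
            time.sleep(base_delay * (2 ** attempt))

# Example: a flaky call that fails twice before succeeding.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient server hiccup")
    return "ok"

result = retry(flaky, (ConnectionError,), max_retries=3)
print(result)  # → ok
```

Inside a triton_wrapper subclass, the `super().numpy_call(...)` invocation could be wrapped the same way, catching InferenceServerException instead of ConnectionError.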
@yimuchen yimuchen added the enhancement New feature or request label Oct 8, 2024