TensorRT export/inference works, but there needs to be close coordination with the versions of both TensorRT and CUDA that the corresponding Triton container is running.

I have the TRT version currently hard-pinned to the version of TRT used by the container on LDG. You have to run the export with a version of CUDA <11.7 to be compatible with the version used in the container (I was able to get 11.2 to work, but I think 11.6 should as well). This is a bit trickier, since this is essentially controlled by `LD_LIBRARY_PATH`. One solution here would be to build this in at the ….

(FWIW, the way this works right now on LDG is that they keep multiple versions of CUDA available, but link the latest one to ….)
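Since this coordination is easy to get wrong, one lightweight option would be to validate the export environment up front. Below is a minimal sketch of such a check, not code from this PR; the function name and the pinned version strings (`CONTAINER_TRT_VERSION`, the CUDA ceiling) are placeholders for illustration.

```python
import tensorrt
import torch

CONTAINER_TRT_VERSION = "8.0"  # hypothetical: TRT version pinned in the container
MAX_CUDA = (11, 7)             # export must run with CUDA < 11.7


def check_export_environment() -> None:
    # compare the CUDA runtime torch was built against to the ceiling
    cuda = tuple(int(v) for v in torch.version.cuda.split(".")[:2])
    if cuda >= MAX_CUDA:
        raise RuntimeError(
            f"CUDA {torch.version.cuda} is too new for the Triton "
            "container; use a version below 11.7"
        )
    # make sure the local TRT matches the container's pinned version
    if not tensorrt.__version__.startswith(CONTAINER_TRT_VERSION):
        raise RuntimeError(
            f"TensorRT {tensorrt.__version__} doesn't match the "
            f"container's version {CONTAINER_TRT_VERSION}"
        )
```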
force-pushed from `52dc7f7` to `ff15493`
Had to fix an issue with pinto that was causing torch imports to throw off lower library search paths. This is merged as of ML4GW/pinto#21, so once that container has been pushed from the workflow, this will be good to retest.
Marking as ready for review now that tests are passing. Notable changes:

- Implementing batched inference to increase throughput
- `inference_rate` kwarg to control sleep between requests (see the sketch below)
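For context, here is a rough sketch of what an `inference_rate`-throttled request loop could look like; the client interface and names are assumptions for illustration, not the PR's actual implementation.

```python
import time


def infer_all(client, batches, inference_rate: float) -> None:
    """Send batched requests at roughly `inference_rate` requests/second,
    sleeping off whatever is left of each request's time slot."""
    interval = 1 / inference_rate
    for batch in batches:
        start = time.time()
        client.infer(batch)  # hypothetical client method
        time.sleep(max(0.0, interval - (time.time() - start)))
```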
```python
# TODO: hardcoding these kwargs for now, but worth
# thinking about a more robust way to handle this
kwargs = {}
if platform == qv.Platform.ONNX:
    kwargs["opset_version"] = 13

    # turn off graph optimization because of this error
    # https://github.com/triton-inference-server/server/issues/3418
    bbhnet.config.optimization.graph.level = -1
elif platform == qv.Platform.TENSORRT:
    kwargs["use_fp16"] = True
```
Maybe the `**kwargs` of the main function are assumed to be platform-specific variables? I'm not sure if `typeo` has the ability to parse variables that aren't directly in the function signature into `**kwargs`.
Unfortunately it doesn't right now, since it relies on annotations to infer the type the parser needs to map to. A few markup parsers (yaml, toml) have functionality for applying heuristics that we could maybe lift, but no idea how complicated that might turn out to be. Off the top of my head, the real headaches would be:

- reimplementing argparse's logic in a less structured format for iterating through the unused arguments and grouping them into `(arg_name, value)` pairs, accounting for booleans which would just be formatted as `--flag-name` with no associated values (see the sketch below)
- handling arguments which are misspelled, silently giving you their default behavior (if they don't have a default, this will luckily raise a missing argument error)
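For concreteness, here's a rough sketch of grouping the leftovers from `parse_known_args` into keyword arguments, including the bare boolean-flag case. This is an assumption about how it might work, not typeo's actual internals, and it also demonstrates the second headache: a misspelled flag lands silently in the result.

```python
import argparse


def group_unknown_args(unknown: list[str]) -> dict:
    """Group leftovers like ["--opset-version", "13", "--use-fp16"]
    into {"opset_version": "13", "use_fp16": True}."""
    kwargs, key = {}, None
    for token in unknown:
        if token.startswith("--"):
            if key is not None:
                kwargs[key] = True  # previous flag had no value: boolean
            key = token[2:].replace("-", "_")
        elif key is None:
            raise ValueError(f"value {token!r} has no associated flag")
        else:
            kwargs[key] = token
            key = None
    if key is not None:
        kwargs[key] = True  # trailing boolean flag
    return kwargs


parser = argparse.ArgumentParser()
parser.add_argument("--platform")
args, unknown = parser.parse_known_args(
    ["--platform", "onnx", "--opset-version", "13", "--use-fp16"]
)
print(group_unknown_args(unknown))
# {'opset_version': '13', 'use_fp16': True}
```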
That `LD_LIBRARY_PATH` issue was giving me headaches. Ended up adding my base conda `/libs` directory to it in my `bash_profile`. Glad that it's solved. Looks cool, excited to see it in action.