[FEA]: Create a NeMo Service and NeMo Stage #1130
Labels: `feature request` (New feature or request), `sherlock` (Issues/PRs related to Sherlock workflows and components)
Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request
High
Please provide a clear description of the problem this feature solves
This feature would allow Morpheus pipelines to integrate with NVIDIA's LLM service, NeMo, by sending inference requests to the service from a stage in the pipeline.
The ability to use LLM models from a Morpheus pipeline will allow pipelines to execute complex NLP tasks with very large models. These models are often too large to run inside the pipeline itself, so sending requests to an external service fits well with the existing pattern used for other inference services, such as Triton.
Describe your ideal solution
This new feature should be built from 2 components:
- A NeMo Service that communicates with NVIDIA's NeMo LLM endpoint, via the `nemo_llm` library (Python) or CURL (C++).
- A NeMo Inference Stage that sends inference requests to that service from within a pipeline.

Configurable Options

The NeMo Inference stage should include (but not be limited to) the configurable parameters that `nemo_llm` supports: `tokens_to_generate`, `stop`, `temperature`, etc.
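As a rough illustration of how those two pieces could fit together, the sketch below shows a minimal service wrapper and an options container exposing the parameters listed above. The class names, the request payload layout, and the endpoint handling are assumptions for illustration only; they are not the actual `nemo_llm` API or the final Morpheus stage interface.

```python
# Hypothetical sketch of the proposed NeMo Service component.
# All names and the payload shape are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class NeMoGenerateOptions:
    """Configurable parameters the inference stage would expose."""
    tokens_to_generate: int = 64
    stop: list = field(default_factory=list)
    temperature: float = 1.0


class NeMoService:
    """Wraps an external NeMo LLM endpoint (hypothetical REST shape)."""

    def __init__(self, api_host: str, api_key: str):
        self._api_host = api_host
        self._api_key = api_key

    def build_request(self, model: str, prompt: str,
                      opts: NeMoGenerateOptions) -> dict:
        # Assemble the JSON body a completion request might carry; the
        # stage would batch prompts from incoming messages and POST this
        # payload to the service.
        return {
            "model": model,
            "prompt": prompt,
            "tokens_to_generate": opts.tokens_to_generate,
            "stop": opts.stop,
            "temperature": opts.temperature,
        }
```

A stage could then construct one `NeMoService` at pipeline build time and call `build_request(...)` per batch, keeping the network round-trip out of the GPU-bound stages, similar to how the Triton inference stage delegates to an external server.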
A test prototype has been created here: https://github.com/mdemoret-nv/Morpheus/tree/mdd_nemo-stage/examples/nemo
Additional context
No response