
Error configuring NeMo Guardrails with a TensorRT-LLM model served on a TRT-LLM server #657

Open
w8jie opened this issue Jul 31, 2024 · 0 comments

w8jie commented Jul 31, 2024

Has anyone successfully integrated TensorRT-LLM with NeMo Guardrails? There is little documentation on using TensorRT-LLM with NeMo Guardrails, so I am hoping for some guidance here.

I get the following error when the self check input flow executes:

```
Error while execution self_check_input: [StatusCode.UNAVAILABLE] DNS resolution failed for http://triton-models:8000/v2/models/ensemble/generate: C-ares status is not ARES_SUCCESS qtype=AAAA name=http://triton-models:8000/v2/models/ensemble/generate is_balancer=0: Could not contact DNS servers
```
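For reference, a minimal way to test whether the Triton endpoint is reachable at all, independently of NeMo Guardrails (a sketch only; the gRPC client expects a bare host:port with no scheme or path, and 8001 is just Triton's default gRPC port, so both values here are assumptions):

```python
# Hedged connectivity check: ask the Triton server for its readiness status
# over gRPC. The client takes host:port, not an HTTP URL.
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="triton-models:8001")  # assumed gRPC port
print(client.is_server_ready())
```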

My config.yml:

```yaml
models:
  - type: main
    engine: trt_llm
    parameters:
      server_url: http://triton-models:8000/v2/models/ensemble/generate

# These instructions configure the bot to answer questions about the employee handbook and the company's policies.
instructions:
  - type: general
    content: |
      Below is a conversation between a user and a bot called X Virtual Assistant.
      The bot is talkative and provides lots of specific details from its context.
      If the bot does not know the answer to a question, it truthfully says it does not know.

rails:
  # Input rails are invoked when a new message from the user is received.
  input:
    flows:
      - self check input

  # Output rails are triggered after a bot message has been generated.
  output:
    flows:
      - self check output
```
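
For reference, the same config can be exercised directly through LLMRails, which triggers the same self check input flow without any LangChain wrapping (a minimal sketch; the message content is illustrative):

```python
from nemoguardrails import LLMRails, RailsConfig

# Load the same config directory and run one message through the rails;
# the "self check input" flow fires before the request reaches TRT-LLM.
config = RailsConfig.from_path("./guardrails_config")
rails = LLMRails(config)

response = rails.generate(messages=[{"role": "user", "content": "Hello!"}])
print(response["content"])
```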

My LangChain integration:

```python
# Imports for the snippet below; chat_template and get_message_history
# are helpers defined elsewhere in my application.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables.history import RunnableWithMessageHistory
from nemoguardrails import RailsConfig
from nemoguardrails.integrations.langchain.runnable_rails import RunnableRails

# Load NeMo Guardrails from the directory containing config.yml.
config = RailsConfig.from_path("./guardrails_config")
guardrails = RunnableRails(config, passthrough=False)


class ChatChain:
    """Builds an LCEL chain with message history, wrapped by guardrails."""

    def __init__(self, llm, examples):
        prompt = chat_template.format_chat_prompt(examples)
        custom_chain = (
            prompt
            | llm
            | StrOutputParser()
        )

        # Add per-session message history around the base chain.
        mem_chain = RunnableWithMessageHistory(
            custom_chain,
            get_message_history,
            input_messages_key="input",
            history_messages_key="history",
            output_messages_key="output",
        )

        # Apply the guardrails in front of the history-aware chain.
        self.mem_chain_with_guardrails = guardrails | mem_chain

    def get_chain(self):
        return self.mem_chain_with_guardrails
```
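
For context, the chain returned by get_chain() is invoked roughly like this (a sketch; llm, examples, and the session id are placeholders, and the session id is whatever key get_message_history expects):

```python
chain = ChatChain(llm, examples).get_chain()
result = chain.invoke(
    {"input": "What does the handbook say about leave?"},
    config={"configurable": {"session_id": "demo-session"}},
)
```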

Pip versions:

```
nemoguardrails==0.9.0
tritonclient[all]
```
