How to chain RemoteRunnable clients to local llm server (hosted using langserve)? #28647

jianlins opened this issue Dec 10, 2024 · 1 comment
jianlins commented Dec 10, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

On the server side, I used HuggingFacePipeline to load a local model:

from fastapi import FastAPI
# from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_huggingface.llms import HuggingFacePipeline
from langchain_huggingface import ChatHuggingFace
from langserve import add_routes
import torch
import os
cache_dir = "./transforms_cache"
os.environ['TRANSFORMERS_CACHE'] = cache_dir
os.environ['HF_HOME']=cache_dir

transformers.utils.move_cache(new_cache_dir=cache_dir)

app = FastAPI(
    title="LangChain Server",
    version="1.0",
    description="Spin up a simple api server using Langchain's Runnable interfaces",
)


model_name = "allenai/Llama-3.1-Tulu-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name,cache_dir=cache_dir)
tulu_model = AutoModelForCausalLM.from_pretrained(model_name,cache_dir=cache_dir,
                                                  torch_dtype=torch.float16, device_map="auto",)

hf_pipeline = pipeline("text-generation", model=tulu_model, tokenizer=tokenizer, max_new_tokens=6400)
hf = HuggingFacePipeline(pipeline=hf_pipeline)
chat = ChatHuggingFace(llm=hf)

# Add the LLM to the server
add_routes(app, chat, path="/llm")


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8099)

On the client side, I use RemoteRunnable to connect. A simple invoke with an input string succeeds, but the same runnable fails when used inside an LLMChain:

from langserve import RemoteRunnable
from langchain import LLMChain, PromptTemplate
from langchain.chains import SimpleSequentialChain

# Create a RemoteRunnable that points to your deployed model endpoint
remote_llm = RemoteRunnable(url="http://0.0.0.0:8099/llm")

# Define prompt templates for each step of your chain
capital_prompt = PromptTemplate.from_template("What is the capital city of {country}?")
population_prompt = PromptTemplate.from_template("What is the population of {city}?")

# Create two LLMChains:
# 1. The first chain takes a country and returns the capital city.
chain1 = LLMChain(llm=remote_llm, prompt=capital_prompt)

# 2. The second chain takes the city name (returned by chain1) and returns the population.
chain2 = LLMChain(llm=remote_llm, prompt=population_prompt)

# Combine them into a SimpleSequentialChain:
# SimpleSequentialChain by default passes the output of the first chain
# as the input to the second chain.
overall_chain = SimpleSequentialChain(chains=[chain1, chain2], verbose=True)

# Run the combined chain:
result = overall_chain.run("France")

Error Message and Stack Trace (if applicable)

site-packages/langserve/client.py:448, in RemoteRunnable.batch(self, inputs, config, return_exceptions, **kwargs)
    439 def batch(
    440     self,
    441     inputs: List[Input],
        (...)
    445     **kwargs: Any,
    446 ) -> List[Output]:
    447     if kwargs:
--> 448         raise NotImplementedError(f"kwargs not implemented yet. Got {kwargs}")
    449     return self._batch_with_config(
    450         self._batch, inputs, config, return_exceptions=return_exceptions
    451     )

NotImplementedError: kwargs not implemented yet. Got {'stop': None}

Description

I am trying to use langserve to start a server and RemoteRunnable clients to communicate with it. This is helpful for retrying multiple times without worrying about client failures, because restarting a client is much faster than reloading an LLM. A simple llm.invoke through the RemoteRunnable works, but I cannot use it with any Chain classes, e.g. LLMChain, SimpleSequentialChain, or SequentialChain.
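
If I read the legacy chain code correctly, the stray kwarg comes from the chain itself rather than from my code: LLMChain appears to bind stop onto the runnable and then call .batch(), and RemoteRunnable.batch rejects any kwargs. A minimal sketch of what I think is happening (assuming that internal behaviour; the prompt text is just an example):

from langserve import RemoteRunnable

remote_llm = RemoteRunnable(url="http://0.0.0.0:8099/llm")

# A plain invoke works.
remote_llm.invoke("What is the capital city of France?")

# Binding stop=None (which LLMChain seems to do internally) and then batching
# forwards stop as a kwarg to RemoteRunnable.batch, which rejects all kwargs.
remote_llm.bind(stop=None).batch(["What is the capital city of France?"])
# -> NotImplementedError: kwargs not implemented yet. Got {'stop': None}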

System Info

System Information

OS: Linux
OS Version: #1 SMP Thu Jun 6 09:41:19 UTC 2024
Python Version: 3.10.16 | packaged by conda-forge | (main, Dec 5 2024, 14:16:10) [GCC 13.3.0]

Package Information

langchain_core: 0.3.22
langchain: 0.3.10
langchain_community: 0.3.10
langsmith: 0.1.147
langchain_huggingface: 0.1.2
langchain_openai: 0.2.11
langchain_text_splitters: 0.3.2
langgraph_sdk: 0.1.43
langserve: 0.3.0

Other Dependencies

aiohttp: 3.11.10
async-timeout: 4.0.3
dataclasses-json: 0.6.7
fastapi: 0.115.6
httpx: 0.28.1
httpx-sse: 0.4.0
huggingface-hub: 0.26.5
jsonpatch: 1.33
langsmith-pyo3: Installed. No version info available.
numpy: 1.26.4
openai: 1.57.0
orjson: 3.10.12
packaging: 24.2
pydantic: 2.10.3
pydantic-settings: 2.6.1
PyYAML: 6.0.2
requests: 2.32.3
requests-toolbelt: 1.0.0
sentence-transformers: 3.3.1
SQLAlchemy: 2.0.36
sse-starlette: 1.8.2
tenacity: 9.0.0
tiktoken: 0.8.0
tokenizers: 0.21.0
transformers: 4.47.0
typing-extensions: 4.12.2

dosubot bot added the Ɑ: core (Related to langchain-core) label on Dec 10, 2024
keenborder786 (Contributor) commented

@jianlins You are using the deprecated LLMChain interface; the recommended way is to use LCEL. You can achieve the same result as follows:

from langserve import RemoteRunnable
from langchain_core.prompts import PromptTemplate


# Create a RemoteRunnable that points to your deployed model endpoint
remote_llm = RemoteRunnable(url="http://0.0.0.0:8099/llm")

# Define prompt templates for each step of your chain
capital_prompt = PromptTemplate.from_template("What is the capital city of {country}?")
population_prompt = PromptTemplate.from_template("What is the population of {city}?")
overall_chain = capital_prompt | remote_llm

# Run the combined chain:
result = overall_chain.invoke("France")
