Support for LCEL Runnables #1586

Merged (10 commits, Oct 29, 2023)

Conversation

joshuasundance-swca (Contributor)

Summary

This PR addresses #1564 by enhancing the LangChain representation class to support more flexible LangChain pipelines, including LCEL Runnables. I am certainly open to more changes as needed before merging.

Changes

  • Generalize chain parameter to take any Runnable object instead of just QA chains
  • Add chain_config parameter to support RunnableConfig
  • Use .batch() instead of .run() to invoke chains
  • Fix return type in extract_topics
  • Update docstring for LangChain.__init__
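The switch from `.run()` to `.batch()` means one call can generate representations for a list of inputs instead of invoking the chain once per topic. A minimal sketch of the idea, using a toy stand-in for a Runnable (not the actual LangChain API; `ToyRunnable` is hypothetical):

```python
# Toy stand-in for a LangChain Runnable, illustrating why .batch()
# replaces per-topic .run() calls: one call handles a list of inputs.
class ToyRunnable:
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, inp, config=None):
        # Process a single input (analogous to the old .run() path).
        return self.fn(inp)

    def batch(self, inputs, config=None):
        # Process many inputs in one call; a real Runnable may also
        # parallelize these according to the supplied RunnableConfig.
        return [self.invoke(inp, config=config) for inp in inputs]


chain = ToyRunnable(lambda d: d["question"].upper())

# One call per topic (old style):
single = chain.invoke({"question": "what is this topic about?"})

# One call for all topics (new style):
many = chain.batch([
    {"question": "what is this topic about?"},
    {"question": "summarize these documents"},
])
print(single)  # WHAT IS THIS TOPIC ABOUT?
print(many)
```

The `config` argument mirrors how a `RunnableConfig` would flow through a real chain, which is what the new `chain_config` parameter passes along.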

Example Usage

Here is an example of using a custom LCEL pipeline with the updated LangChain class (from the updated docstring):

from bertopic.representation import LangChain
from langchain.chains.question_answering import load_qa_chain
from langchain.chat_models import ChatAnthropic
from langchain.schema.document import Document
from langchain.schema.runnable import RunnablePassthrough
from langchain_experimental.data_anonymizer.presidio import PresidioReversibleAnonymizer

prompt = ...
llm = ...

# We will construct a special privacy-preserving chain using Microsoft Presidio

pii_handler = PresidioReversibleAnonymizer(analyzed_fields=["PERSON"])

chain = (
        {
            "input_documents": (
                lambda inp: [
                    Document(
                        page_content=pii_handler.anonymize(
                            d.page_content,
                            language="en",
                        ),
                    )
                    for d in inp["input_documents"]
                ]
            ),
            "question": RunnablePassthrough(),
        }
        | load_qa_chain(llm, chain_type="stuff")
        | (lambda output: {"output_text": pii_handler.deanonymize(output["output_text"])})
)

representation_model = LangChain(chain, prompt=prompt)

@MaartenGr (Owner) left a comment:

Incredible PR! Thanks to the extensive documentation, description, and changes, this was a pleasure to collaborate on. I only have two very minor suggestions; other than that it looks great.

Review threads on bertopic/representation/_langchain.py (resolved)
@MaartenGr (Owner):

Awesome, thanks for the great collaboration and the extensive work! I highly appreciate such thorough PRs/Issues and I think users will love to have this feature. Also, the example you gave is just great!

@MaartenGr MaartenGr merged commit b57a8db into MaartenGr:master Oct 29, 2023
2 checks passed
@joshuasundance-swca joshuasundance-swca deleted the batch branch October 30, 2023 14:06