
[trace][semantic] attributes for re-ranking #1153

Closed · 4 tasks · Tracked by #1000 ... · mikeldking opened this issue Aug 18, 2023 · 7 comments · Fixed by #1588

mikeldking (Contributor) commented Aug 18, 2023

Use-cases: sometimes the retrieved documents need to be re-ranked before being passed on, for example via:

  • a Cohere re-ranker
  • a GPT-based prompt re-ranker
  • a general cross-encoder-based re-ranker

Attributes to capture (sketched below):

  • model or strategy used to re-rank
  • final re-ordering of documents
  • scores from the model
  • re-ranked documents

This will allow us to compute NDCG for retrieval.
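For concreteness, a rough sketch of what those attributes could look like on a span. The attribute names, values, and document payload shape below are placeholders for discussion, not a finalized convention:

# Hypothetical attribute names, for discussion only -- not a finalized
# OpenInference semantic convention.
reranker_span_attributes = {
    # model or strategy used to re-rank
    "reranker.model_name": "rerank-english-v2.0",
    "reranker.top_k": 2,
    # documents going into the re-ranker, in their original retrieval order
    "reranker.input_documents": [
        {"document.id": "doc-3", "document.score": 0.71},
        {"document.id": "doc-9", "document.score": 0.64},
    ],
    # final re-ordering plus the scores assigned by the re-ranking model
    "reranker.output_documents": [
        {"document.id": "doc-9", "document.score": 0.98},
        {"document.id": "doc-3", "document.score": 0.42},
    ],
}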

axiomofjoy (Contributor) commented:

LlamaIndex supports re-ranking via:

  • generative LLMs
  • Cohere API
  • sbert

These re-rankers are implemented as node post-processors and do not have specialized callback hooks. All post-processors currently run inside the retrieval step, so the payload passed to the on_event_end hook for the RETRIEVE callback event type contains the retrieved documents post-re-ranking. The callback system does not currently have access to the retrieved documents pre-re-ranking, nor to any of the re-ranking data or metadata (e.g., model name or scores).
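A minimal sketch of a custom handler that shows what the callback system exposes at that point, assuming the llama_index 0.8-era callback interface (BaseCallbackHandler, CBEventType, EventPayload); the handler name is mine, and the nodes it prints for RETRIEVE events are already post-re-ranking:

from typing import Any, Dict, List, Optional

from llama_index.callbacks.base import BaseCallbackHandler
from llama_index.callbacks.schema import CBEventType, EventPayload


class RetrieveInspector(BaseCallbackHandler):
    """Prints the nodes attached to RETRIEVE events.

    Node post-processors (including re-rankers) run inside the retrieval
    step, so these nodes are already re-ranked and truncated to top_n.
    """

    def __init__(self) -> None:
        super().__init__(event_starts_to_ignore=[], event_ends_to_ignore=[])

    def on_event_start(
        self,
        event_type: CBEventType,
        payload: Optional[Dict[str, Any]] = None,
        event_id: str = "",
        **kwargs: Any,
    ) -> str:
        return event_id

    def on_event_end(
        self,
        event_type: CBEventType,
        payload: Optional[Dict[str, Any]] = None,
        event_id: str = "",
        **kwargs: Any,
    ) -> None:
        if event_type is CBEventType.RETRIEVE and payload is not None:
            for node_with_score in payload.get(EventPayload.NODES, []):
                print(node_with_score.score, node_with_score.node.node_id)

    def start_trace(self, trace_id: Optional[str] = None) -> None:
        pass

    def end_trace(
        self,
        trace_id: Optional[str] = None,
        trace_map: Optional[Dict[str, List[str]]] = None,
    ) -> None:
        pass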

axiomofjoy (Contributor) commented:

It's unclear to me whether re-ranking deserves its own span (and if so, what its span kind would be), or whether the re-ranking data should instead be attached to the retrieval span via semantic conventions.
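To make the two options concrete, purely illustrative sketches of the resulting trace shapes (the span kind and attribute names are hypothetical):

# Option A: re-ranking gets a dedicated child span under the retrieval span.
option_a = {
    "span_kind": "RETRIEVER",
    "children": [
        {
            "span_kind": "RERANKER",
            "attributes": {"reranker.model_name": "...", "reranker.top_k": 2},
        }
    ],
}

# Option B: re-ranking data is folded into the retrieval span's attributes
# via semantic conventions.
option_b = {
    "span_kind": "RETRIEVER",
    "attributes": {"reranker.model_name": "...", "reranker.top_k": 2},
}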

axiomofjoy (Contributor) commented:

Script to run the LlamaIndex Cohere re-ranker with our callback handler:

import os

from phoenix.experimental.callbacks.llama_index_trace_callback_handler import (
    OpenInferenceTraceCallbackHandler,
)

from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.callbacks import CallbackManager
from llama_index.indices.postprocessor.cohere_rerank import CohereRerank
from llama_index.response.pprint_utils import pprint_response

documents = SimpleDirectoryReader(
    "/Users/xandersong/llama_index/docs/examples/data/paul_graham"
).load_data()

# Build a vector index over the loaded documents.
index = VectorStoreIndex.from_documents(documents=documents)

# The Cohere API key is read from the environment and passed explicitly.
api_key = os.environ["COHERE_API_KEY"]

# Register the OpenInference callback handler so retrieval events are traced.
callback_handler = OpenInferenceTraceCallbackHandler()
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager(handlers=[callback_handler])
)

# Re-rank the retrieved nodes down to the top 2 with Cohere.
cohere_rerank = CohereRerank(api_key=api_key, top_n=2)
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[cohere_rerank],
    service_context=service_context,
)
response = query_engine.query(
    "What did Sam Altman do in this essay?",
)
pprint_response(response)

# Inspect the spans collected by the registered handler.
print(callback_handler._tracer.span_buffer)

axiomofjoy (Contributor) commented:

LangChain implements Cohere re-ranking. Here's a script:

from langchain.llms import OpenAI
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CohereRerank
from langchain.chains import RetrievalQA

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.document_loaders import TextLoader
from langchain.vectorstores import Chroma
from phoenix.experimental.callbacks.langchain_tracer import OpenInferenceTracer


# Load the source text, split it into chunks, and build a Chroma retriever over them.
documents = TextLoader(
    "/Users/xandersong/langchain/docs/extras/modules/state_of_the_union.txt"
).load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
texts = text_splitter.split_documents(documents)
base_retriever = Chroma.from_documents(texts, OpenAIEmbeddings()).as_retriever(
    search_kwargs={"k": 20}
)
llm = OpenAI(temperature=0)
# Wrap the base retriever so Cohere re-ranks the 20 retrieved documents.
compressor = CohereRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=base_retriever
)

# Trace the chain with the OpenInference tracer.
tracer = OpenInferenceTracer()
chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)
query = "What did the president say about Ketanji Brown Jackson"
output = chain({"query": query}, callbacks=[tracer])
print(output)

axiomofjoy (Contributor) commented:

Instead of being implemented as a node post-processor, re-ranking happens by wrapping the base retriever (e.g., one using a simple cosine-similarity search) in a ContextualCompressionRetriever that has access to the Cohere re-ranker. Both the pre- and post-re-ranking documents are available in the callback system. As far as I can tell, the re-ranking relevance scores from the Cohere endpoint are not available.

axiomofjoy (Contributor) commented Aug 23, 2023

LangChain takes the opinionated stance here that re-rankers are just retrievers: there is a parent retriever span (the re-ranker span) with a child retriever span (the cosine-similarity span).
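A minimal sketch that makes the nesting visible, assuming a LangChain version that already has retriever callbacks (on_retriever_start / on_retriever_end); the handler name is mine. The outer ContextualCompressionRetriever run is the parent of the base retriever run, so the handler fires once with the pre-re-ranking documents (child run) and once with the post-re-ranking documents (parent run):

from typing import Any, Dict, Optional, Sequence
from uuid import UUID

from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import Document


class RetrieverRunLogger(BaseCallbackHandler):
    """Prints every retriever run and its parent run id to expose the nesting."""

    def on_retriever_start(
        self,
        serialized: Dict[str, Any],
        query: str,
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        **kwargs: Any,
    ) -> Any:
        print(f"retriever start run_id={run_id} parent={parent_run_id} query={query!r}")

    def on_retriever_end(
        self,
        documents: Sequence[Document],
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        **kwargs: Any,
    ) -> Any:
        print(f"retriever end run_id={run_id} parent={parent_run_id} num_docs={len(documents)}")


# Hypothetical usage with the script above:
# output = chain({"query": query}, callbacks=[tracer, RetrieverRunLogger()])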

mikeldking (Contributor, Author) commented:

Let's scope this one to llama_index and figure out whether we can surface information that is ultimately useful for calculating things like NDCG.
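For reference, a minimal NDCG@k sketch over graded relevance labels listed in ranked order; this is just the standard formula (function names are mine), independent of whatever span attributes we end up emitting:

import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    # relevances are graded relevance labels in the order the documents were ranked
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# e.g. the re-ranked order put the most relevant document third:
print(ndcg_at_k([0.0, 1.0, 2.0, 0.0], k=4))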
