feat(traces): add reranking span kind for document reranking in llama index #1588
There is a `titleExtra` prop on the card where you can place a `Counter` component: https://5f9739a76e154c00220dd4b9-zeknbennzf.chromatic.com/?path=/story/counter--gallery
I might re-color these just so that there's a visual hierarchy of color (e.g. that re-ranked documents take on a different tint) - this way as you are clicking around you can clearly see the difference.
Re-using the `DocumentItem` component is good, but I think just showing the new `score` label might be a tad confusing? Or is using `score` as an abstraction intended here?
In one case it's often spatial distance, whereas after running through a reranker it is a relevance rank. Just thinking that, from a user's perspective, by displaying `score: XX` alongside both we lose a bit of an opportunity to explain the `score` in this context a bit better, `score` being pretty generic.
I think even though `score` is generic, it is still accurate. On the input side of the reranker, `score` may or may not exist, and even if it does exist, it's not considered by the reranker. But if the "input" `score` does exist, it was generated by a preprocessor for a separate purpose. The general mental picture here is that there could be millions of documents in a corpus, and only a relatively small set are chosen to be reranked, and that selection process can have a `score` of its own based on the query in question. Even though that `score` is not meaningful to the reranker, it is still an informative attribute of the input document, because it relays the reason the document became a candidate in the first place (especially when the preprocessor is missing in the trace). On the other hand, we can't really get more specific than the `score` verbiage because we don't have more information. On balance, although it may seem confusing at first, a user should have enough context to reason their way through it.
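To make the two meanings of `score` concrete, here is a minimal sketch in Python. It is not the code from this PR: `Document`, `toy_relevance`, and `rerank` are hypothetical names, and the word-overlap scoring is only a stand-in for a real reranker model. It shows an input `score` that may be missing and is ignored, and an output `score` that the reranker computes itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Document:
    content: str
    # Score from whichever stage produced this document; may be absent.
    score: Optional[float] = None


def toy_relevance(query: str, doc: Document) -> float:
    """Hypothetical stand-in for a reranker model: fraction of query words found in the text."""
    words = query.lower().split()
    if not words:
        return 0.0
    text = doc.content.lower()
    return sum(w in text for w in words) / len(words)


def rerank(query: str, docs: List[Document]) -> List[Document]:
    # The incoming `score` (if any) is deliberately ignored; every output
    # document gets a fresh score computed from the query.
    scored = [Document(d.content, toy_relevance(query, d)) for d in docs]
    return sorted(scored, key=lambda d: d.score, reverse=True)


candidates = [
    Document("bananas are yellow", score=0.91),  # e.g. a similarity from a retriever
    Document("how to rerank documents"),         # no upstream score at all
]
output = rerank("rerank documents", candidates)
```

In this sketch the input documents are left untouched, so a trace could show both the upstream `score` (where one exists) and the reranker's own `score` side by side.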
I think I wasn't disputing the way we capture the score - was just thinking of ways to avoid the mental "reason their way through it" a bit. But I don't have an immediate good prefix for the reranker score so let's keep it for now.
Same as above - rely on titleExtra and counter.
I added it and it looks like this.
quick question on this parameter: https://docs.cohere.com/docs/reranking
I'm guessing this is the same as TOP_N? If you feed say 5 documents but pass top_k of 3, does it only rank 3? Just trying to understand why this is a parameter to the rerank model.
Yes, this is the same as TOP_N. (The letter is `K` in the literature because `N` is usually the total number of docs.) The caller of the reranker usually just wants a relatively small number of docs out of an initial set of tens or hundreds. It's certainly optional because the reranker can just rank each document, but in general, a reduction in number is expected at each stage of the retrieval process.
Still confused though - if I pass say 5 documents with `top_k` of 3 - does it rank 5 and trim the last two?
Yes, it retains up to `K` in the output, so in the case of `top 3`, the two docs with the lowest scores have to be dropped. `Top K` is applied after ranking all 5 docs.
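The rank-then-trim behavior described above can be sketched as follows. This is an illustration only, not the Cohere or LlamaIndex implementation: `rerank_top_k` and `score_fn` are hypothetical names, and `len` is used as a toy scoring function.

```python
from typing import Callable, List, Optional, Tuple


def rerank_top_k(
    docs: List[str],
    score_fn: Callable[[str], float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    # Score and sort every input document first ...
    ranked = sorted(((d, score_fn(d)) for d in docs), key=lambda p: p[1], reverse=True)
    # ... then keep at most top_k of them (all of them if top_k is None).
    return ranked if top_k is None else ranked[:top_k]


docs = ["a", "bbbb", "cc", "ddddd", "eee"]
result = rerank_top_k(docs, score_fn=len, top_k=3)
```

With 5 inputs and `top_k=3`, all 5 are scored and the two lowest-scoring docs are dropped from the output; leaving `top_k` unset returns the full ranking.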