
[Question] Understanding terminology reference, embedding #600

@DanielRoeder1

Description


I am getting a bit mixed up with the terminology (reference, embedding, query) and want to make sure I pass the correct parameters.

I have a bi-encoder architecture trained with CrossBatchMemory and NTXentLoss. Each training sample contains one anchor document and three subsamples (positives). The intended inference task is for the model to retrieve an anchor document given a subsample-like (positive) input.

The documentation states that, for the inputs of the loss functions, embeddings -> anchors and ref_emb -> positives.

indices_tuple = self.create_indices_tuple(
    embeddings,
    labels,
    E_mem,
    L_mem,
    indices_tuple,
    do_remove_self_comparisons,
)
loss = self.loss(embeddings, labels, indices_tuple, E_mem, L_mem)
return loss
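To make the roles concrete, here is a simplified, library-free sketch (my own illustration, not the library's actual implementation) of how anchor-positive index pairs are formed between the current batch (embeddings/labels) and the memory bank (E_mem/L_mem):

```python
# Simplified illustration: pair each in-batch anchor with every memory
# entry that shares its label. Names mirror the snippet above; the real
# create_indices_tuple also handles negatives and self-comparison removal.
def pair_batch_with_memory(labels, L_mem):
    anchor_idx, positive_idx = [], []
    for a, a_label in enumerate(labels):      # anchors: current batch
        for p, p_label in enumerate(L_mem):   # positives: memory bank
            if a_label == p_label:
                anchor_idx.append(a)
                positive_idx.append(p)
    return anchor_idx, positive_idx

# batch labels [0, 1] paired against memory labels [1, 0, 1]
print(pair_batch_with_memory([0, 1], [1, 0, 1]))
# -> ([0, 1, 1], [1, 0, 2])
```

The point is the asymmetry: indices on the left always index the batch, indices on the right always index the memory.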

Now, when using cross-batch memory, the samples in the buffer (E_mem) are passed as ref_emb, and thus as positives/negatives, to the loss function.

This indicates that I should make sure the enqueue mask feeds only the positive samples into the buffer (i.e. it should not buffer my anchors).
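Assuming the batch is laid out as repeating groups of [anchor, pos, pos, pos] (my assumption about the data layout, not something the library dictates), a mask that enqueues only the positives could be built like this; with the actual library this would be a boolean torch tensor passed as the enqueue_mask argument:

```python
def build_enqueue_mask(batch_size, group_size=4):
    """True = enqueue this row into the memory bank.
    Assumes each group of `group_size` rows is [anchor, pos, pos, pos],
    so position 0 of every group (the anchor) is excluded."""
    return [i % group_size != 0 for i in range(batch_size)]

print(build_enqueue_mask(8))
# -> [False, True, True, True, False, True, True, True]
```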

And finally, given my inference task, should I pass the subsamples (i.e. the positives during training) as queries to the AccuracyCalculator and the anchor documents as the reference set when evaluating accuracy?
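To check my own understanding of the query/reference split, here is a hand-rolled precision@1 stand-in (not the AccuracyCalculator API, just an illustration of which side plays which role): subsample embeddings act as queries, document embeddings as the reference set being retrieved from.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def precision_at_1(queries, query_labels, references, reference_labels):
    """For each query (subsample embedding), retrieve the most similar
    reference (document embedding) and check that the labels match --
    a simplified stand-in for AccuracyCalculator's query/reference roles."""
    hits = 0
    for q, q_label in zip(queries, query_labels):
        best = max(range(len(references)),
                   key=lambda i: cosine(q, references[i]))
        hits += reference_labels[best] == q_label
    return hits / len(queries)

# two subsample queries retrieving from two document references
queries = [[1.0, 0.1], [0.1, 1.0]]
references = [[1.0, 0.0], [0.0, 1.0]]
print(precision_at_1(queries, [0, 1], references, [0, 1]))
# -> 1.0
```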

Let me know if my rationale is correct, and thanks for any feedback!

Labels: question (A general question about the library)