
[Question] Understanding terminology reference, embedding #600

@DanielRoeder1

Description


I am getting a bit mixed up with the terminology (reference, embedding, query) and want to make sure I pass the correct parameters.

I have a bi-encoder architecture trained with CrossBatchMemory and NTXentLoss. Each training sample contains one anchor document and three subsamples (positives). The intended inference task is for the model to retrieve an anchor document given a subsample-like (positive) input.

The documentation states that, for the inputs of the loss functions, embeddings -> anchors and ref_emb -> positives.

indices_tuple = self.create_indices_tuple(
    embeddings,
    labels,
    E_mem,
    L_mem,
    indices_tuple,
    do_remove_self_comparisons,
)
loss = self.loss(embeddings, labels, indices_tuple, E_mem, L_mem)
return loss
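To make the roles concrete, here is a simplified, library-free sketch (my own illustration, not the library's actual implementation) of how anchor-positive index pairs are formed between the current batch (embeddings/labels) and the memory bank (E_mem/L_mem):

```python
# Simplified illustration: pair each in-batch anchor with every memory
# entry that shares its label. Names mirror the snippet above; the real
# create_indices_tuple also handles negatives and self-comparison removal.
def pair_batch_with_memory(labels, L_mem):
    anchor_idx, positive_idx = [], []
    for a, a_label in enumerate(labels):      # anchors: current batch
        for p, p_label in enumerate(L_mem):   # positives: memory bank
            if a_label == p_label:
                anchor_idx.append(a)
                positive_idx.append(p)
    return anchor_idx, positive_idx

# batch labels [0, 1] paired against memory labels [1, 0, 1]
print(pair_batch_with_memory([0, 1], [1, 0, 1]))
# -> ([0, 1, 1], [1, 0, 2])
```

The point is the asymmetry: indices on the left always index the batch, indices on the right always index the memory.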

Now, when using cross-batch memory, the samples in the buffer (E_mem) are passed as ref_emb, and thus as positives/negatives, to the loss function.

This indicates that I should make sure the enqueue mask feeds only the positive samples into the buffer (i.e. it should not buffer my anchors).
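Assuming the batch is laid out as repeating groups of [anchor, pos, pos, pos] (my assumption about the data layout, not something the library dictates), a mask that enqueues only the positives could be built like this; with the actual library this would be a boolean torch tensor passed as the enqueue_mask argument:

```python
def build_enqueue_mask(batch_size, group_size=4):
    """True = enqueue this row into the memory bank.
    Assumes each group of `group_size` rows is [anchor, pos, pos, pos],
    so position 0 of every group (the anchor) is excluded."""
    return [i % group_size != 0 for i in range(batch_size)]

print(build_enqueue_mask(8))
# -> [False, True, True, True, False, True, True, True]
```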

And finally, given my inference task, should I pass the subsamples (i.e. the positives during training) as queries to the AccuracyCalculator and the anchor documents as the reference set when evaluating accuracy?
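To check my own understanding of the query/reference split, here is a hand-rolled precision@1 stand-in (not the AccuracyCalculator API, just an illustration of which side plays which role): subsample embeddings act as queries, document embeddings as the reference set being retrieved from.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def precision_at_1(queries, query_labels, references, reference_labels):
    """For each query (subsample embedding), retrieve the most similar
    reference (document embedding) and check that the labels match --
    a simplified stand-in for AccuracyCalculator's query/reference roles."""
    hits = 0
    for q, q_label in zip(queries, query_labels):
        best = max(range(len(references)),
                   key=lambda i: cosine(q, references[i]))
        hits += reference_labels[best] == q_label
    return hits / len(queries)

# two subsample queries retrieving from two document references
queries = [[1.0, 0.1], [0.1, 1.0]]
references = [[1.0, 0.0], [0.0, 1.0]]
print(precision_at_1(queries, [0, 1], references, [0, 1]))
# -> 1.0
```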

Let me know if my rationale is correct, and thanks for any feedback!

Labels: question (A general question about the library)