
Using Reranker in a multithreaded process raises "Already borrowed" RuntimeError #42

Closed
sam-bercovici opened this issue Oct 28, 2024 · 4 comments

@sam-bercovici
Contributor

I am using ColBERT.

See huggingface/tokenizers#537.

I suggest allowing `tokenizer_kwargs` and `model_kwargs` to be passed to the Reranker factory class, which would forward them to the underlying tokenizer and model loaders.

Below is an example of how to modify `ColBERTRanker.__init__`; I marked each modification with `## change`.
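For context, the linked issue comes from the Rust-backed "fast" tokenizers: a single tokenizer instance cannot be used from two threads at once, which surfaces as `RuntimeError: Already borrowed`. One common workaround, independent of the kwargs proposal below, is to give each thread its own tokenizer instance. Here is a minimal sketch of that pattern, with a stand-in loader in place of `AutoTokenizer.from_pretrained`:

```python
import threading

def load_tokenizer():
    # Stand-in for AutoTokenizer.from_pretrained(model_name); real code
    # would load the Hugging Face tokenizer here.
    return object()

_local = threading.local()

def get_tokenizer():
    # Lazily create one tokenizer per thread, so no two threads ever
    # share (and hence concurrently borrow) the same Rust-backed object.
    if not hasattr(_local, "tokenizer"):
        _local.tokenizer = load_tokenizer()
    return _local.tokenizer
```

Within a thread, repeated calls return the same cached instance; across threads, each gets its own copy, which sidesteps the borrow conflict at the cost of extra memory per thread.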

    def __init__(
        self,
        model_name: str,
        batch_size: int = 32,
        dtype: Optional[Union[str, torch.dtype]] = None,
        device: Optional[Union[str, torch.device]] = None,
        verbose: int = 1,
        query_token: str = "[unused0]",
        document_token: str = "[unused1]",
        **kwargs, ## change
    ):
        self.verbose = verbose
        self.device = get_device(device, self.verbose)
        self.dtype = get_dtype(dtype, self.device, self.verbose)
        self.batch_size = batch_size
        vprint(
            f"Loading model {model_name}, this might take a while...",
            self.verbose,
        )
        tokenizer_kwargs = kwargs.get("tokenizer_kwargs", {}) ## change
        self.tokenizer = AutoTokenizer.from_pretrained(model_name, **tokenizer_kwargs) ## change
        model_kwargs = kwargs.get("model_kwargs", {}) ## change
        self.model = (
            ColBERTModel.from_pretrained(model_name, **model_kwargs) ## change
            .to(self.device)
            .to(self.dtype)
        )
        self.model.eval()
        self.query_max_length = 32  # Lower bound
        self.doc_max_length = (
            self.model.config.max_position_embeddings - 2
        )  # Upper bound
        self.query_token_id: int = self.tokenizer.convert_tokens_to_ids(query_token)  # type: ignore
        self.document_token_id: int = self.tokenizer.convert_tokens_to_ids(
            document_token
        )  # type: ignore
        self.normalize = True
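The forwarding pattern above can be exercised in isolation. This sketch uses hypothetical dummy loader classes in place of `AutoTokenizer` and `ColBERTModel` to show how the factory-level kwargs would reach them:

```python
class DummyTokenizer:
    """Stand-in for AutoTokenizer; records the kwargs it was loaded with."""
    def __init__(self, model_name, **kwargs):
        self.model_name = model_name
        self.kwargs = kwargs

class DummyModel:
    """Stand-in for ColBERTModel; records the kwargs it was loaded with."""
    def __init__(self, model_name, **kwargs):
        self.model_name = model_name
        self.kwargs = kwargs

class MiniRanker:
    # Mirrors the modified __init__ above: pull the two kwarg dicts out
    # of **kwargs and forward each to its respective loader.
    def __init__(self, model_name, **kwargs):
        tokenizer_kwargs = kwargs.get("tokenizer_kwargs", {})
        self.tokenizer = DummyTokenizer(model_name, **tokenizer_kwargs)
        model_kwargs = kwargs.get("model_kwargs", {})
        self.model = DummyModel(model_name, **model_kwargs)

ranker = MiniRanker(
    "colbert-ir/colbertv2.0",
    tokenizer_kwargs={"use_fast": False},  # e.g. opt out of the Rust fast tokenizer
    model_kwargs={"low_cpu_mem_usage": True},
)
```

With the real classes, `use_fast=False` would make `AutoTokenizer.from_pretrained` return the pure-Python slow tokenizer, which avoids the Rust borrow error entirely.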
@bclavie
Collaborator

bclavie commented Nov 4, 2024

Thanks for flagging! Would you be willing to submit your proposed changes as a PR? I'm happy with this logic being added to handle various kwargs situations!

@sam-bercovici
Contributor Author

> Thanks for flagging! Would you be willing to submit your proposed changes as a PR? I'm happy with this logic being added to handle various kwargs situations!

Sure.
I will try to find a couple of hours to do so in the next week or so.

@sam-bercovici
Contributor Author

See #44.

@bclavie
Collaborator

bclavie commented Nov 12, 2024

Merged, thank you! Will ship with 0.0.6 in ~30mn

@bclavie bclavie closed this as completed Nov 12, 2024