
spaCy integration has no .pipe() method, hence will fall back to individual .__call__() #15

Closed
davidberenstein1957 opened this issue Jun 27, 2023 · 1 comment · Fixed by #16
Labels
enhancement New feature or request

Comments

@davidberenstein1957
Contributor

davidberenstein1957 commented Jun 27, 2023

Not sure what works better during inference (individual sentences, or longer segments in larger batches), but maybe something like this could work:

    # requires: import types; from spacy import util
    def pipe(self, stream, batch_size=128, include_sent=None):
        """
        Predict entities for a stream of spaCy Docs in minibatches.

        Args:
            stream (Iterable[Doc]): a stream of spaCy docs (or raw strings)
            batch_size (int): number of docs per model call

        Yields:
            Doc: spaCy doc annotated with SpanMarker entities
        """
        if isinstance(stream, str):
            stream = [stream]

        if not isinstance(stream, types.GeneratorType):
            stream = self.nlp.pipe(stream, batch_size=batch_size)

        for docs in util.minibatch(stream, size=batch_size):
            batch_results = self.model.predict(docs)

            for doc, prediction in zip(docs, batch_results):
                yield self.post_process_batch(doc, prediction)
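The batching pattern proposed above can be sketched in plain Python; `minibatch` and `predict` below are simplified stand-ins for `spacy.util.minibatch` and `SpanMarkerModel.predict`, not the real APIs:

```python
from itertools import islice

def minibatch(stream, size):
    # Yield successive lists of up to `size` items from an iterable,
    # mirroring the behavior of spacy.util.minibatch (simplified).
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def predict(batch):
    # Stand-in for a batched model call: one invocation per minibatch
    # instead of one per document. Here it just upper-cases each item.
    return [text.upper() for text in batch]

def pipe(stream, batch_size=2):
    # Group the stream into minibatches, run one model call per batch,
    # then yield the per-document results one by one.
    for batch in minibatch(stream, size=batch_size):
        for doc, prediction in zip(batch, predict(batch)):
            yield prediction

results = list(pipe(["a b", "c d", "e f"], batch_size=2))
# results == ["A B", "C D", "E F"]
```

The key efficiency win is that the model is invoked once per batch rather than once per document, which is what the missing `.pipe()` would otherwise force.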
@tomaarsen tomaarsen added the enhancement New feature or request label Jun 27, 2023
@tomaarsen
Owner

You're probably right, this would definitely be more efficient. I'll throw it on my todo list.
