Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SpanGroup(s)-related optimizations #11380

Merged
merged 10 commits into from
Aug 31, 2022

Conversation

shadeMe
Copy link
Contributor

@shadeMe shadeMe commented Aug 25, 2022

Description

We were needlessly performing a serialization-deserialization round-trip just to copy a string Dict of SpanGroups. This PR replaces that with a straight-forward copy of the dict's constituents. To facilitate this, a new keyword argument was added to SpanGroup.copy() that allows the user to pass a new reference document to which the copy is bound.

Improvements in performance can be seen in this PR.

Types of change

Refactor, enhancement

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@shadeMe shadeMe added feat / doc Feature: Doc, Span and Token objects perf / speed Performance: speed v3.5 Related to v3.5 labels Aug 25, 2022
@shadeMe
Copy link
Contributor Author

shadeMe commented Aug 25, 2022

@explosion-bot please test_slow

@explosion-bot
Copy link
Collaborator

explosion-bot commented Aug 25, 2022

🪁 Successfully triggered build on Buildkite

URL: https://buildkite.com/explosion-ai/spacy-slow-tests/builds/199

@adrianeboyd
Copy link
Contributor

Where is this kind of copying happening? I don't really understand the need for new_doc?

@shadeMe
Copy link
Contributor Author

shadeMe commented Aug 25, 2022

In SpanGroups.copy() > Doc.copy() > _copy_examples() > Language.evaluate().

Without the new param, the copied SpanGroup instances will point to their original Doc objects. This was not the case previously as the from_/to_bytes() code path initializes the former differently.

@adrianeboyd
Copy link
Contributor

adrianeboyd commented Aug 26, 2022

Ah, I understand. Can you switch this to have the same kwargs format as SpanGroups.copy()?

@adrianeboyd
Copy link
Contributor

I'm trying to think if there's something I've missed in terms of the refcounts, but nothing jumps out at me.

Can you also update span_groups.pyi, the docstrings, and the API docs?

@adrianeboyd adrianeboyd merged commit 604a7c3 into explosion:develop Aug 31, 2022
@shadeMe shadeMe deleted the refactor/span-group-copy branch September 1, 2022 17:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat / doc Feature: Doc, Span and Token objects perf / speed Performance: speed v3.5 Related to v3.5
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants