Sequence labeling refactoring #2361

Merged
merged 69 commits into from
Dec 16, 2021

whoisjones
Member

Closes #2360.

@helpmefindaname
Collaborator

@whoisjones is there any status update on this?
If not, would you mind if I created a PR based on this?

@whoisjones
Member Author

@helpmefindaname I'm currently moving the sequence labeler below the DefaultClassifier. We still need a parser for previous models, so it will take a few more days, but feel free to contribute on this branch.


from .sequence_tagger_utils.crf import CRF
from .sequence_tagger_utils.viterbi import ViterbiLoss, ViterbiDecoder
from ..datasets import DataLoader, SentenceDataset
Collaborator

Why these relative module paths? Why not flair.datasets?

pad_start_tags = torch.cat([start, tags], 1)
pad_stop_tags = torch.cat([tags, stop], 1)
# filter empty sentences
if isinstance(sentences[0], Sentence):
Collaborator

The if check shouldn't be required, as sentences is always of type List[Sentence] if typing isn't violated.

Collaborator

Ah yes, right! I'll push a correction, thanks!
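
To illustrate the point made in this thread, here is a minimal sketch of filtering empty sentences without the isinstance check. It is not the code from the PR: the Sentence class is a hypothetical stand-in for flair.data.Sentence, and filter_empty is a helper name assumed for illustration only.

```python
from typing import List


class Sentence:
    """Hypothetical stand-in for flair.data.Sentence; only len() matters here."""

    def __init__(self, text: str) -> None:
        self.tokens = text.split()

    def __len__(self) -> int:
        return len(self.tokens)


def filter_empty(sentences: List[Sentence]) -> List[Sentence]:
    # No isinstance check: the annotation already promises List[Sentence],
    # so every element supports len().
    return [sentence for sentence in sentences if len(sentence) > 0]


batch = [
    Sentence("George Washington went to Washington"),
    Sentence(""),
    Sentence("Hello"),
]
non_empty = filter_empty(batch)
```

If a caller does pass something other than a Sentence, a type checker flags the call site, which is arguably a better place for the error than a silent runtime branch.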

for i in range(len(lens_)):
    pad_stop_tags[i, lens_[i] :] = self.tag_dictionary.get_idx_for_item(STOP_TAG)
# order by length
reordered_sentences: List[Union[Sentence, str]] = sorted(sentences, key=lambda s: len(s), reverse=True)
Collaborator

The Union doesn't make sense: since sentences is of type List[Sentence], reordered_sentences will always be List[Sentence]. Also, I think mypy is able to infer the type of reordered_sentences, so the annotation might not be necessary.

Collaborator

Also a good point. I think this is a leftover from the time when str could also be passed to the predict function!
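
The suggestion in this thread can be sketched as follows. This is an illustration under assumptions, not the merged code: Sentence is again a hypothetical stand-in for flair.data.Sentence, and order_by_length is a helper name invented for the example. Because the built-in sorted returns a list of the same element type as its input, a type checker such as mypy should infer List[Sentence] for the result without an explicit annotation.

```python
from typing import List


class Sentence:
    """Hypothetical stand-in for flair.data.Sentence; only len() matters here."""

    def __init__(self, text: str) -> None:
        self.tokens = text.split()

    def __len__(self) -> int:
        return len(self.tokens)


def order_by_length(sentences: List[Sentence]) -> List[Sentence]:
    # No annotation on the local variable and no Union with str:
    # the input type already determines the type of the sorted list.
    reordered_sentences = sorted(sentences, key=len, reverse=True)
    return reordered_sentences


batch = [Sentence("a b"), Sentence("a b c d"), Sentence("a")]
ordered = order_by_length(batch)
lengths = [len(sentence) for sentence in ordered]
```

Using key=len is equivalent to the lambda in the reviewed line and keeps the call shorter.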

@alanakbik merged commit 1d65cf4 into master Dec 16, 2021
@alanakbik deleted the sequence_labeling_refactoring branch December 16, 2021 13:16
@alanakbik
Collaborator

@whoisjones thanks a lot for improving this!

@mauryaland mentioned this pull request Mar 16, 2023
Labels
enhancement Improving of an existing feature
Development

Successfully merging this pull request may close these issues.

Refactoring of Sequence Tagger Class
3 participants