Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

catch loading older TfidfModels without smartirs #2559

Merged
merged 4 commits into from
Aug 26, 2019

Conversation

bnomis
Copy link
Contributor

@bnomis bnomis commented Jul 23, 2019

Older pre-3.8.0 TfidfModels do not have the smartirs attribute. But the 3.8.0 code assumes the smartirs attribute exists. For exampe in getitem(). This causes a crash when using loaded models that were saved with older versions of gensim. This PR adds code to the load() method to set smartirs to None if it is not set via the loaded model.

@piskvorky
Copy link
Owner

piskvorky commented Jul 23, 2019

Thanks! CC @Witiko can you review please?

@Witiko
Copy link
Contributor

Witiko commented Aug 12, 2019

@piskvorky Sorry for the late response, I was on a vacation. The pull request LGTM.

@piskvorky piskvorky requested a review from mpenkov August 12, 2019 07:22
@Witiko
Copy link
Contributor

Witiko commented Aug 12, 2019

Perhaps we could add a unit test that tests the loading of a pre-3.8.0 TfidfModel, @bnomis?

@bnomis
Copy link
Contributor Author

bnomis commented Aug 13, 2019

@Witiko Sure. I can add a unit test. I'd like to confirm the correct way: I'd make a model with 3.7 save as a binary in gensim/test/test_data and write the test of loading that model. Correct?

Copy link
Collaborator

@mpenkov mpenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but we need tests to cover the new functionality (e.g. loading an old model without smartirs and showing that it still works correctly).

gensim/models/tfidfmodel.py Show resolved Hide resolved
@Witiko
Copy link
Contributor

Witiko commented Aug 14, 2019

I'd like to confirm the correct way: I'd make a model with 3.7 save as a binary in gensim/test/test_data and write the test of loading that model. Correct?

@bnomis There already exist such models (for example tfidf_model.tst). Your test would load the model and call __getitem__. Without your patch, this should raise an AttributeError.

@bnomis
Copy link
Contributor Author

bnomis commented Aug 18, 2019

@Witiko I've updated my pull request to include a test for backward compatibility. The tfidf_model.tst test model has the smartirs attribute. So does not show the issue. Therefore, I added a test tfidf model made with version 3.2.0 to have a file to load and test against.

@Witiko
Copy link
Contributor

Witiko commented Aug 19, 2019

Looks ok. The test on lines 444–447 can be simplified to self.assertEqual(len(model[corpus]), len(corpus)).

@bnomis
Copy link
Contributor Author

bnomis commented Aug 19, 2019

@Witiko Updated

@mpenkov
Copy link
Collaborator

mpenkov commented Aug 23, 2019

@bnomis @Witiko This looks like it's ready to merge. Can you please let me know if this is so?

@Witiko
Copy link
Contributor

Witiko commented Aug 23, 2019

Looks good to me.

@mpenkov mpenkov merged commit decd4a0 into piskvorky:develop Aug 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants