Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add exact param in remove doc by word method #41

Merged
merged 2 commits into from
Jul 22, 2022
Merged

Add exact param in remove doc by word method #41

merged 2 commits into from
Jul 22, 2022

Conversation

mateonunez
Copy link
Collaborator

This PR fixes #39 and #40.

I've implemented a new param in the removeDocByWord method used in remove method to prevent removing words that not exactly matches the document's word.

I also added 2 cases to the tests.

@mateonunez mateonunez requested a review from micheleriva July 22, 2022 09:35
@micheleriva micheleriva merged commit 6f665d3 into oramasearch:main Jul 22, 2022
@micheleriva
Copy link
Member

This is perfect, thank you @mateonunez 🙏

@mateonunez mateonunez deleted the fix/remove-doc-by-word branch July 22, 2022 12:14
const id = search.hits[0].id;
await db.delete(id);

const search2 = await db.search({ term: "abc" });
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line should also use exact: true, otherwise the tests passes even if #40 is not fixed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you search for abc as an exact term, the result will be 0, 'cause the only document available is the document that contains txt: "abcd".

image

This expected result could be more realistic, but in both cases, it works correctly

Copy link

@lucaong lucaong Jul 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it shouldn't be 0, that's the bug in #40. It should be 1, because there is a second document containing "abc" that should not be deleted.

The test adds two documents containing "abc" and then deletes one of them. The other one should still be there, and not be deleted.

At a glance (but I did not spend much time delving into the code, so I might be wrong), the issue is that the term should not be deleted unless all the documents associated to it were deleted. Instead, the code sets node.end = true, effectively removing the term, even if it still has documents associated to it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got the point. The issue should be opened again.

const root = this.root;
if (!word) return false;

function removeWord(node: TrieNode, _word: string, docID: string): boolean {
const [nodeWord /**_docs*/] = node.getWord();

if (node.end && nodeWord === word) {
if (node.end || (exact && node.end && nodeWord === word)) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this fixes #40 . At a glance, #40 is caused by setting end to false without checking if the term is associated to other documents.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, #40 was caused because the document was deleted only when the words were the same (implementing exact parameter we don't have that issue).

I think #38 is caused for the reason you said. I'm working on it.

Copy link

@lucaong lucaong Jul 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#38 cannot be caused by that, because it occurs even if no document is deleted, so this code path is never called in the reproduction.


const serach2 = await db.search({ term: "stelle" });

expect(serach2.count).toBe(1);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion should check that the search hits do not include the deleted ID, not that the count is 1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deleted documents still appear in searches
3 participants