🚸(backend) improve users similarity search and sort results #400

Open: 1 commit to merge into base: main

@sampaccoud (Contributor) commented Nov 2, 2024

Purpose

In some edge cases, the domain part of the email address is longer than the name part. User searches by email similarity then return a lot of unsorted results.

Proposal

  • We can improve this by being more demanding on similarity when the query looks like an email.
  • Sorting results by the similarity score is also an obvious improvement.
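
As a rough illustration of the two points above, here is a minimal pure-Python sketch. It assumes a simplified trigram similarity (loosely modeled on PostgreSQL's `pg_trgm`, so scores will not match the database exactly), a hypothetical "looks like an email" check based on the presence of `@`, and threshold values chosen only for the example. The names `similarity`, `search_emails`, `EMAIL_THRESHOLD` and `DEFAULT_THRESHOLD` are not taken from this PR, and `alice@example.gouv.fr` is a made-up address.

```python
def trigrams(text: str) -> set[str]:
    """Return the set of 3-character windows of the lowercased, padded text."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two trigram sets, between 0.0 and 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


# Hypothetical thresholds, chosen only for illustration.
EMAIL_THRESHOLD = 0.6
DEFAULT_THRESHOLD = 0.2


def search_emails(query: str, emails: list[str]) -> list[str]:
    """Keep emails similar enough to the query, best match first."""
    # Assumption: a query containing "@" is treated as an email address
    # and must clear the stricter threshold.
    threshold = EMAIL_THRESHOLD if "@" in query else DEFAULT_THRESHOLD
    scored = [(similarity(query, email), email) for email in emails]
    kept = [(score, email) for score, email in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [email for _, email in kept]


emails = ["alice@example.gouv.fr", "michael.johnson@example.gouv.fr"]
print(search_emails("michael.johnson@example.gouv.fr", emails))
```

In this sketch the two addresses share only their domain, yet that shared domain alone yields a similarity of roughly 0.38, which is the edge case described in the Purpose section. With the stricter threshold only the exact match is returned; with the looser one both would be, sorted by score.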

At the moment, we still think it is good to propose results with a weak similarity on the name part, because we want to avoid as much as possible creating duplicate users by inviting, through one of their many emails, a user who is already in our database.

Fixes #399

@sampaccoud self-assigned this Nov 2, 2024
@sampaccoud added the enhancement, python, backend and urgent labels Nov 2, 2024
@qbey (Collaborator) left a comment

Looks good to me. We may adapt it again later, because it may be confusing that we return other "non-related" email addresses even when the user requested a specific one (e.g. the michael.johnson@example.gouv.fr test).

It's a nice goal to let users find "similar" emails, but I'm not sure it will be that easy. For instance:

    response = client.get("/api/v1.0/users/?q=ajohnson@other.gouv.fr")

    assert response.status_code == 200
    user_ids = [user["id"] for user in response.json()["results"]]
    assert user_ids == []  # I would have Alice here ^^' but it's too complex

I'm just pointing out that we should weigh the tradeoff between "user confusion" and "don't duplicate users".

src/backend/core/api/viewsets.py (review thread resolved)
Labels: backend, enhancement, python, urgent
Development

Successfully merging this pull request may close these issues.

Improve suggested emails in sharing modal