Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

people auto complete: Send user ID for users having same full name. #4094

Merged
merged 1 commit into from
Jul 2, 2020

Conversation

agrawal-d
Copy link
Member

@agrawal-d agrawal-d commented May 7, 2020

Previously, our @-mention autocomplete had a bug, wherein it
failed to correctly mention the correct user if more than one user
had the same full name.

Now, if more than one users have the same full name, we now send
the user ID as well to ensure that the correct user is
@-mentioned.

@agrawal-d
Copy link
Member Author

@chrisbobbe, @gnprice, @ray-kraesig this PR is ready for a review.

@agrawal-d agrawal-d force-pushed the mentions branch 2 times, most recently from fcf4f0b to 5827769 Compare May 7, 2020 11:55
@agrawal-d agrawal-d added the a-compose/send Compose box, autocomplete, camera/upload, outbox, sending label May 7, 2020
@@ -32,6 +32,12 @@ class PeopleAutocomplete extends PureComponent<Props> {
const { users, onAutocomplete } = this.props;
const user = users.find(x => x.email === email);
if (user) {
// If another user with the same full name is found, we send the
// user ID as well, to ensure the mentioned user is uniquely identified.
if (users.find(x => x.full_name === user.full_name && x.user_id !== user.user_id)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect this will look different when #3339 is resolved, but it looks good to me as a fix before then.

@agrawal-d
Copy link
Member Author

agrawal-d commented May 8, 2020

@chrisbobbe I've pushed some changes.

This version uses binary search and is faster than the linear one.
On CZO, the linear scan takes ~18ms or more.
This version takes 1ms or less.

@chrisbobbe
Copy link
Contributor

chrisbobbe commented May 14, 2020

Hmm, interesting! I actually didn't have binary search in mind when I commented at #4094 (comment).

Instead, I was thinking that we wouldn't be working with arrays here anymore, and instead something like an Immutable.Map (now much more possible with #4047 (comment) merged).

At #3339, @gnprice said:

Instead, we should make these lookups through efficient, indexed data structures. A JS object works great. A Map works equally great, and can be a bit cleaner in the code.

However! The natural thing to do for #3339, to get users using an "efficient, indexed data structure" is, I think to do an Immutable.Map where keys are user IDs — not full names, which is what we care about here. So unless we have a separate index for full names, we're kind of worse off than we are with an array; we'd be awkwardly (and still linearly) traversing all entries in the user ID-keyed Map.

One solution, as you've shown here, is to use binary search on an array. (One step that I think is missing currently is to ensure users is sorted by full_name before passing it to binarySearch.) Better than linear, it's true. But a lookup in an Immutable.Map, as I understand it, is near-constant.

@gnprice, as someone who used to do research on algorithms, will know what to do here. But the one thing that hasn't come up in my discussions with him is the need to maintain indexes on things besides a canonical, unique ID.

@agrawal-d
Copy link
Member Author

agrawal-d commented May 15, 2020

Thanks for taking a look @chrisbobbe

Instead, I was thinking that we wouldn't be working with arrays here anymore, and instead something like an Immutable.Map (now much more possible with #4047 (comment) merged).

Yep, I did consider maps before. But JavaScript does not support multiMaps, so if we attempt to hash with full_name as keys, the last value will be overwritten. So we can't use a Map in this use case, as you explained above. That is why I implemented a binary search.

And if we hash with user_id as keys, the map iteration will be O(n). So I thought of using binary search, as users sorted by full_name is already available, thus saving the O(nlogn) overhead of sorting it.

One step that I think is missing currently is to ensure users is sorted by full_name before passing it to binarySearch

users is sorted:

users: getSortedUsers(state),

Copy link
Contributor

@rk-for-zulip rk-for-zulip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agrawal-d wrote:

Yep, I did consider maps before. But JavaScript does not support multiMaps, so if we attempt to hash with full_name as keys, the last value will be overwritten.

Disregarding possible variations in their definition of "insertion order" – which we're not concerned with – Multimap<K, V> is isomorphic to Map<K, Array<V>>.

But, as you note, we already have a sorted list.


The usual way to do this, I think, would be lower_bound and upper_bound functions, taking a candidate value k and returning the index to the first key not less than k (for lower_bound) or greater than k (for upper_bound). (These functions can be implemented almost identically to binary search, except that they should treat equality as the same as greater-than or less-than (respectively) when dispatching, and they should always return something.)

If you have those, const lo = foo.lower_bound(...), hi = foo.upper_bound(...) gives exactly the range [lo, hi) of values with keys equivalent to k.

However, we don't happen to have those handy, and they're not really necessary here. You can instead just do a binary search to find any matching user, and then check if either the preceding or following users' name also matches.

Comment on lines 8 to 15
const comparator = (element: number) => {
if (element < toFind) {
return -1;
} else if (element > toFind) {
return 1;
}
return 0;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contextually this is equivalent to element - toFind.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

*/
const binarySearch = <T>(
array: Array<T>,
comparator: (elem: T) => number,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer this to be identical to what you'd pass to sort, accepting the value to be found as another parameter.

This is technically less flexible (for example, as written, you could pass a comparator that would find and accept any value starting with "M"), so I could understand supplying an auxiliary function to build the search-comparator from a sort-comparator and a value instead.

(EDIT: This was written before I saw the next commit.)

Comment on lines 36 to 41
const idx = binarySearch(users, elem => {
if (elem.full_name === user.full_name && elem.user_id === user.user_id) {
return 1;
}
return elem.full_name.toLowerCase().localeCompare(user.full_name.toLowerCase());
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work: the given comparator doesn't describe a consistent ordering.

Consider user = { full_name: "John Doe", user_id: 4, ... }, and a users array matching the following:

users =~ [
  { full_name: "A", user_id: 0 }, // -1
  { full_name: "B", user_id: 1 }, // -1
  { full_name: "C", user_id: 2 }, // -1
  { full_name: "D", user_id: 3 }, // -1
  { full_name: "John Doe", user_id: 4 }, // 1
  { full_name: "John Doe", user_id: 5 }, // 0
  { full_name: "X", user_id: 6 }, // 1
  { full_name: "Y", user_id: 7 }, // 1
  { full_name: "Z", user_id: 8 }, // 1
];

(Trailing comments show the expected return value of the comparator for each element.)

binary_search will check users[4] first, which is the user in question; but the comparator will return 1, so it will limit further searches to the range [0, 3], and therefore never find { full_name: "John Doe", user_id: 5 }.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed now.

@agrawal-d agrawal-d force-pushed the mentions branch 2 times, most recently from 0873981 to d9c6ddb Compare May 26, 2020 06:14
@agrawal-d
Copy link
Member Author

@rk-for-zulip thanks for the review. I've made some changes to fix the issues you pointed out. Can you take a look again?

@gnprice
Copy link
Member

gnprice commented May 26, 2020

Seeing this add an implementaion of binary search, my first reaction is that binary search is notoriously tricky to implement completely correctly. (Perhaps I'll go make a thread in #learning about that story -- it might be fun to read about. edit: here)

So I'm inclined to look for other solutions for that reason.

From an algorithmic perspective, one thing I notice here is that this is one of several places that we do a linear pass through this same array. If you look at getAutocompleteSuggestion, which is called in the render method of this component:

  const allAutocompleteOptions = getUsersAndWildcards(users);
  const startWith = filterUserStartWith(allAutocompleteOptions, filter, ownEmail);
  const initials = filterUserByInitials(allAutocompleteOptions, filter, ownEmail);
  const contains = filterUserThatContains(allAutocompleteOptions, filter, ownEmail);
  const matchesEmail = filterUserMatchesEmail(users, filter, ownEmail);

Each of those five lines represents a linear pass through all this data. The first one, in fact, actually allocates a copy:

export const getUsersAndWildcards = (users: User[]) => [
  { ...NULL_USER, full_name: 'all', email: '(Notify everyone)' },
  { ...NULL_USER, full_name: 'everyone', email: '(Notify everyone)' },
  ...users,
];

which tends to be especially expensive.

So all of those together are going to do much more work than we'd do by a linear search here. In algorithmic terms, that means the most we could gain by doing something fancier is a constant factor -- O(n) + O(log n) is still O(n), just like O(n) + O(n) is.

And actually the situation is sharper than that -- we're doing all that work every time the user types another letter, to search again with the new query. This code, on the other hand, only runs once at the end when the user chooses an item from the list.

So, let's just not do anything fancy here for now. 🙂

I do find, empirically on czo, that the people autocomplete can be pretty slow to respond with results. That would be good to improve, and it might involve using some more-clever algorithms and/or data structures in getAutocompleteSuggestion and its helpers. Happy to talk more about that on #mobile-team, if you're interested.

@agrawal-d
Copy link
Member Author

@gnprice Thanks for the review, I've reverted it back to the simple version. Can you take a look again?

CC @chrisbobbe

@chrisbobbe
Copy link
Contributor

chrisbobbe commented May 29, 2020

Great, thanks! This looks good to me, though it just occurred to me that it would be good to refer to an authoritative source on the mention syntax, in the code comment you've added. I believe the get_mention_syntax function in static/js/people.js is what we want here.

@agrawal-d
Copy link
Member Author

@chrisbobbe I've updated the commit message.

Previously, our @-mention autocomplete had a bug, wherein it
failed to correctly mention the specific user if more than one
users had the same full name.

Now, if more than one users have the same full name, we send the
user ID as well to ensure that the correct user is @-mentioned.

The user ID is not sent for every @-mention to ensure the
behaviour is consistent with the web app, i.e., we send it
only in case of ambiguity.

See the `get_mention_syntax` function in `static/js/people.js`
in the webapp source for the origin of this syntax.
@chrisbobbe
Copy link
Contributor

Merged, thanks, @agrawal-d! I also made a small tweak: adding the reference to the web app in a code comment, in addition to the commit message.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a-compose/send Compose box, autocomplete, camera/upload, outbox, sending
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants