Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HashMap performance fix. #394

Merged
merged 5 commits into from
Jun 23, 2022
Merged

HashMap performance fix. #394

merged 5 commits into from
Jun 23, 2022

Conversation

matthewhammer
Copy link
Contributor

@matthewhammer matthewhammer commented Jun 21, 2022

Avoids rehashing when the hash table grows. Fixes one known performance issue.

Background

There are several performance issues with the HashMap that Motoko devs experience when they insert many keys and the table's underlying array re-grows.

Like a usual hash table, each growth step adds an O(n) overhead to what would otherwise be an O(1) insertion operation. But what's worse here, the current HashMap (before this PR) actually re-runs the hash function on every key. For Text keys with even moderate sizes, this has been shown to be prohibitive even at moderate map sizes.

The issue is critical, as it manifests as a failure to insert more keys, as regrowth goes beyond the current message cycle limit. Even with this limit relaxed or removed in the future, it would ideal to avoid this rehashing step. This PR does that.

@matthewhammer matthewhammer requested a review from crusso June 21, 2022 14:02
@matthewhammer matthewhammer marked this pull request as ready for review June 21, 2022 14:04
@matthewhammer matthewhammer enabled auto-merge (squash) June 21, 2022 14:10
Copy link
Contributor

@ggreif ggreif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this backward-compatible with the stable vars that are already externalised? @matthewhammer

@matthewhammer
Copy link
Contributor Author

matthewhammer commented Jun 21, 2022

Is this backward-compatible with the stable vars that are already externalised? @matthewhammer

No stable vars will use this type, since HashMap is a class, and not available to be mentioned there. So trivially, yes, this will preserve the types of all stable vars.

Now that you mention it, I think this PR could be considered "improvements only" and should not break anything running today. (Since the class API is not changing, any code using the class remains fine too.)

Copy link
Contributor

@crusso crusso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just use a more efficient type for the internal key type definition.

src/HashMap.mo Outdated Show resolved Hide resolved
src/HashMap.mo Show resolved Hide resolved
@matthewhammer
Copy link
Contributor Author

@crusso PTAL.

@matthewhammer matthewhammer merged commit 25831e6 into master Jun 23, 2022
Copy link
Contributor

@crusso crusso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@matthewhammer matthewhammer deleted the hashmap-improvements branch June 23, 2022 12:58
@matthewhammer matthewhammer mentioned this pull request Jun 24, 2022
@ByronBecker
Copy link
Contributor

ByronBecker commented Jul 28, 2022

Is this backward-compatible with the stable vars that are already externalised? @matthewhammer

No stable vars will use this type, since HashMap is a class, and not available to be mentioned there. So trivially, yes, this will preserve the types of all stable vars.

Now that you mention it, I think this PR could be considered "improvements only" and should not break anything running today. (Since the class API is not changing, any code using the class remains fine too.)

@matthewhammer @crusso The interface is compatible, but what about the data running underneath the change? I'm thinking specifically for users that are using the system pre/post upgrade functions to serialize/deserialize their HashMap to and from stable memory during upgrades.

Without a specific postUpgrade() that includes the extra key hash step, I'm just trying to understand how this would work.

My naive understanding.

  1. Application is running using old version of HashMap (k is just the key)
  2. Developer changes code locally to use package set or release of motoko-base (unknowingly picks up this new version). Let's say they were using a preUpgrade/postUpgrade to hold the internals of the HashTable class, like:
  table: Hashtable<K, V>;
  _count: Nat;
  1. Developer assumes that nothing happened with how the data is stored in the HashMap data structure, and pushes out a dfx canister install <canister_name> -m upgrade.
  2. At this point, the upgrade should throw some sort of warning stating that the underlying types have changed
// Old
type KVs<K, V> = AssocList.AssocList<K, V>;

// New
type KVs<K, V> = AssocList.AssocList<Key<K>, V>;
  1. Developer in their postUpgrade() now has to reconstruct the hashes for each individual key to match the new type signature.

  2. After making this postUpgrade() addition, the developer again tries the upgrade, but this time continues through the warning (The upgrade either succeeds, or fails due to hitting an upgrade limit b/c of re-hashing every single key - what happens if this is the case?).

  3. The developer then removes the rehashing portion of postUpgrade() method since this is no longer necessary and pushes out an additional upgrade (no warnings).

Usually for breaking changes, libraries provide migration notes - so It might be nice in the documentation to include how one can upgrade from the old version of HashMap to the new version (i.e. include the recommended code for the postUpgrade() code for adapting the underlying table in HashTable).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants