Add Documentation.md #370

Open
wants to merge 4 commits into
base: main

Conversation

tomjridge
Contributor

High-level doc of Index functioning


Further, keys must be hashable (with the hash represented as an int), and the implementation even requires that the user specify the number of bits in the hash that are relevant.

FIXME There is a worrying comment regarding the key hash: "underestimation [of the number of relevant bits] will result in undefined behavior"; most code that uses hashes should still work (albeit very slowly) even if all hashes have the same value. So, this comment is a bit unusual. What happens if hashes collide? What do we do about this?

You must be referring to that part of the doc:

index/src/data.ml

Lines 17 to 22 in fe5e962

val hash_size : int
(** The number of bits necessary to encode the maximum output value of
{!hash}. `Hashtbl.hash` uses 30 bits.
Overestimating the [hash_size] will result in performance drops;
underestimation will result in undefined behavior. *)

hash_size is used by the fan to select bits from the short hash of a key, depending on the size of the fan.

Overestimating implies that the fan will refer to bits that don't carry any information (typically zeroes).

Underestimating should be fine in my opinion.

Contributor

Underestimating causes undefined behaviour because the bit selection then drops the MSBs, which breaks the order. That the bit selection preserves the hash order is an important invariant of the fan-out.
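
To make the point about ordering concrete, here is a small sketch. It is not the code from the library: `select`, `preserves_order`, and the example hashes are hypothetical names and values, and the sketch assumes the fan keeps the `fan_bits` bits located just below bit position `hash_size`. Under that assumption, an exact `hash_size` yields buckets that follow the hash order, an overestimate sends every key to bucket 0, and an underestimate masks out the MSBs so the bucket sequence is no longer monotonic.

```ocaml
(* Hypothetical model of fan-out bucket selection; not the library's code.
   [select ~hash_size ~fan_bits h] keeps the [fan_bits] bits located just
   below bit position [hash_size] and returns them as a bucket number. *)
let select ~hash_size ~fan_bits h =
  let shift = hash_size - fan_bits in
  let mask = ((1 lsl fan_bits) - 1) lsl shift in
  (h land mask) lsr shift

(* True when bucket numbers never decrease as the hashes increase. *)
let preserves_order ~hash_size ~fan_bits hashes =
  let buckets =
    List.map (select ~hash_size ~fan_bits) (List.sort compare hashes)
  in
  let rec non_decreasing = function
    | a :: (b :: _ as rest) -> a <= b && non_decreasing rest
    | _ -> true
  in
  non_decreasing buckets

(* Example hashes that all fit in 8 bits: 16, 127, 128, 240. *)
let hashes = [ 0b0001_0000; 0b0111_1111; 0b1000_0000; 0b1111_0000 ]

(* Exact hash_size: the two MSBs are selected, buckets are [0; 1; 2; 3]. *)
let exact = List.map (select ~hash_size:8 ~fan_bits:2) hashes

(* Overestimated hash_size: the selected bits are always zero, so every
   key lands in bucket 0 and the fan-out no longer discriminates. *)
let over = List.map (select ~hash_size:12 ~fan_bits:2) hashes

(* Underestimated hash_size: bits 7 and 6 are masked out, buckets become
   [1; 3; 0; 3], which does not follow the hash order. *)
let under_ok = preserves_order ~hash_size:6 ~fan_bits:2 hashes
```

In this model the overestimate only costs performance (all keys share one bucket), while the underestimate violates the ordering invariant, which matches the wording of the doc comment.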


3 participants