Skip to content

Commit

Permalink
Fix invariant violation in Pony runtime hashes
Browse files Browse the repository at this point in the history
If you were being facetious, you could describe the Pony runtime as
a  series of hashmaps that are held together by some code. Hash
performance  and correctness can have a great impact on everything
else in the runtime because they are at the basis of most everything
else in the runtime.

This change fixes a number of issues that appears to be garbage
collection bugs but were in fact, problems with invariant violation
in the underlying hash implementation. It should be noted that while
the rest of this comment discuss invariant violations that exist in
our Robin Hood hash implementation, some of the bugs that this closes
predate the Robin Hood implementation. This leads me to believe that
the previous implementation had some subtle problem that could occur
under some rare interleaving of operations. How this occurred is
unknown at this time and probably always will be unless someone wants
to go back to the previous version and use what we learned here to
diagnose the state of the code at that time.

This patch closes issues #1781, #1872, and #1483. It's the result of
teamwork amongst myself, Sylvan Clebch and Dipin Hora. History should
show that we were all involved in this resolution.

The skinny:

When garbage collecting items from our hash, that is, removing
deleted  items to free up space, we can end up violating hash
invariants. Previously, one of these invariants was correctly fixed,
however, it incorrectly assumed another invariant held but that is
not the case.

Post garbage collection, if any items have been deleted from our
hash, we do an "optimize" operation on each hash item. We check to
see if the location the item would hash to is now an empty bucket.
If it is, we move the item to that location thereby restoring the
expected chaining. There is, however, a problem with doing this.
It's possible over time to violate another invariant when fixing the
first violation.

For a given item at a given location in the hash, each item has a
probe value. An invariant of our data structure is that items at
earlier locations in the  hash will always have an equal or lower
probe value for that location than items that come later.

For example, items: "foo" and "bar". Given a hashmap whose size is
8, where  "foo" would made to index 1 and "bar" would map to index
"2". When looking at  the probe values for "foo" and "bar" at index
1, "foo" would have a probe  value of "0" as it is at the location
it hashes to whereas "bar" would have a  probe value of "7". The
value is the number of indexes away from our "natural" hash index
that the item is.

When search the hash, we can use this probe value to not do a linear
search of all indexes for the a given key. Once we find an item
whose probe value for a given index is higher than ours, we know
that the key can't be in the map past that index.

Except our course for when we are restoring invariants after a
delete. It's possible, due to the sequential nature of our
"optimize" repair step, to  violate this "always lower probe
value" invariant.

The previous implementation of "optimize_item" assumed that in
invariant held true. By not detecting the invariant violation and
fixing it, we could end up with maps where a key existed in it but
it wouldn't be found. When the map in question was an object map
used to hold gc'able items, this would result in an error that
appears to be a gc error. See #1781, #1872, and #1483.

Closes #1781
Closes #1872
Closes #1483
  • Loading branch information
SeanTAllen committed May 4, 2017
1 parent 9faa938 commit 832d599
Show file tree
Hide file tree
Showing 2 changed files with 34 additions and 7 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ All notable changes to the Pony compiler and standard library will be documented

- Compiler error instead of crash for invalid this-dot reference in a trait. ([PR #1879](https://github.com/ponylang/ponyc/pull/1879))
- Compiler error instead of crash for too few args to constructor in case pattern. ([PR #1880](https://github.com/ponylang/ponyc/pull/1880))

- Pony runtime hashmap bug that resulted in issues [#1483](https://github.com/ponylang/ponyc/issues/1483), [#1781](https://github.com/ponylang/ponyc/issues/1781), and [#1872](https://github.com/ponylang/ponyc/issues/1872). ([PR #1886](https://github.com/ponylang/ponyc/pull/1886))

### Added

Expand Down
39 changes: 33 additions & 6 deletions src/libponyrt/ds/hash.c
Original file line number Diff line number Diff line change
Expand Up @@ -114,7 +114,6 @@ static void resize(hashmap_t* map, cmp_fn cmp, alloc_fn alloc,
static size_t optimize_item(hashmap_t* map, alloc_fn alloc,
free_size_fn fr, cmp_fn cmp, size_t old_index)
{

size_t mask = map->size - 1;

size_t h = map->buckets[old_index].hash;
Expand All @@ -133,17 +132,45 @@ static size_t optimize_item(hashmap_t* map, alloc_fn alloc,
ib_index = index >> HASHMAP_BITMAP_TYPE_BITS;
ib_offset = index & HASHMAP_BITMAP_TYPE_MASK;

// don't need to check probe counts for filled buckets because
// earlier items are guaranteed to have a lower probe count
// than us and we cannot displace them
// found an earlier empty bucket so move item
// Reconstute our invariants
//
// During the process of removing "dead" items from our hash, it is
// possible to violate the invariants of our map. We will no proceed to
// detect and fix those violations.
//
// There are 2 possible invariant violations that we need to handle. One
// is fairly simple, the other rather more complicated
//
// 1. We are no longer at our natural hash location AND that location is
// empty. If that violation were allowed to continue then when searching
// later, we'd find the empty bucket and stop looking for this hashed item.
// Fixing this violation is handled by our `if` statement
//
// 2. Is a bit more complicated and is handled in our `else`
// statement. It's possible while restoring invariants for our most
// important invariant to get violated. That is, that items with a lower
// probe count should appear before those with a higher probe count.
// The else statement checks for this condition and repairs it.
if((map->item_bitmap[ib_index] & ((bitmap_t)1 << ib_offset)) == 0)
{
ponyint_hashmap_clearindex(map, old_index);
ponyint_hashmap_putindex(map, entry, h, cmp, alloc, fr, index);
return 1;
}

else
{
size_t item_probe_length =
get_probe_length(map, h, index, mask);
size_t there_probe_length =
get_probe_length(map, map->buckets[index].hash, index, mask);

if (item_probe_length > there_probe_length)
{
ponyint_hashmap_clearindex(map, old_index);
ponyint_hashmap_putindex(map, entry, h, cmp, alloc, fr, index);
return 1;
}
}
// find next bucket index
index = (index + 1) & mask;
}
Expand Down

0 comments on commit 832d599

Please sign in to comment.