Fix infrequent infinite loop on Mono EventPipe streaming thread.
As observed in dotnet#59296, the EventPipe streaming thread could infrequently enter an infinite loop on Mono while cleaning up the stack hash map: ep_rt_stack_hash_remove_all, called from ep_file_write_sequence_point when flushing buffer memory into the file stream. The issue only occurred on Release builds, has so far only been observed on OSX, and reproduced in about 1 of 100 runs of the test suite.

Debugging the assembly at the point of the hang showed that one item in the hash map had a hash key that did not correspond to its hash bucket. This should not be possible, since items are placed into buckets based on a hash key value that does not change for the lifetime of the item. It indicates that the key is being corrupted after it has been added to the hash map.

Further instrumentation showed that an insert into the hash map infrequently triggers a replace. The Mono hash table used in EventPipe is set up to insert without replace, meaning it keeps the old key but swaps in the new value and frees the old one. The stack hash map uses the same memory for its key and value, so freeing the old value also frees the key. Since the old key is kept, it points into freed memory, and future reuse of that memory region corrupts the hash table key. This scenario should not be possible either, since EventPipe code only adds to the hash map if the item is not already in it.

Further investigation showed that the call to ep_rt_stack_hash_lookup reports false, while the call to ep_rt_stack_hash_add for the same key hits the replace scenario in g_hash_table_insert_replace. g_hash_table_insert_replace finds items in the hash map using callbacks for hashing and comparing keys. The equal callback is defined to return gboolean, while the callback implementation used in EventPipe is defined to return bool. gboolean is typed as int32_t on Mono, and this mismatch is the root cause of the whole issue.
On an optimized OSX build (and potentially on other platforms), the callback does a memcmp, which updates the full eax register. When returning, the callback only updates the first byte of eax to 0/1 and keeps the upper bits. If memcmp returns a negative value, or a positive value too large to fit in the first byte, eax will contain garbage in bytes 2, 3 and 4. Since Mono's g_hash_table_insert_replace expects gboolean, it looks at the complete eax content: if any bit in bytes 2, 3 or 4 is still set, the condition is true even when byte 1 is 0, representing false. This incorrectly triggers the replace logic, freeing the old value and key and opening up for future corruption of the key, which now references freed memory.

The fix is to make sure the callback signatures used with hash map callbacks match the signatures expected by the underlying container implementation. The fix also adds a checked build assert to the hash map's add implementation on Mono, validating that the added key is not already in the hash map, which forces callers to check for existence before calling add.
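
The return-width mismatch can be illustrated with a minimal sketch. It does not reproduce the real EventPipe or Mono code: the eax register is modeled as a plain uint32_t (actually calling a bool-returning function through a gboolean-returning pointer is undefined behavior in C), and stack_key_t, stack_key_equal, eax_after_bool_return, caller_reads_as_bool and caller_reads_as_gboolean are hypothetical names introduced only for illustration. The one detail taken from the description above is that gboolean is a 32-bit integer on Mono.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Per the description above, Mono types gboolean as a 32-bit integer. */
typedef int32_t gboolean;

/* Hypothetical key layout, for illustration only (12 bytes, no padding). */
typedef struct {
    uint32_t hash;
    uint8_t frames[8];
} stack_key_t;

/* Shape of the fix: the equal callback returns the type the container
 * expects, so the whole 32-bit return value is well defined. */
static gboolean
stack_key_equal (const void *a, const void *b)
{
    return memcmp (a, b, sizeof (stack_key_t)) == 0;
}

/* Model of the bug: eax still holds the memcmp result, and a callback
 * declared to return bool overwrites only the low byte with 0 or 1. */
static uint32_t
eax_after_bool_return (int32_t memcmp_result, bool equal)
{
    uint32_t eax = (uint32_t)memcmp_result;
    return (eax & 0xFFFFFF00u) | (equal ? 1u : 0u);
}

/* A caller expecting bool inspects only the low byte. */
static bool
caller_reads_as_bool (uint32_t eax)
{
    return (eax & 0xFFu) != 0;
}

/* A caller expecting gboolean inspects all 32 bits. */
static bool
caller_reads_as_gboolean (uint32_t eax)
{
    return eax != 0;
}
```

In this model, a memcmp result of -1 with an unequal comparison leaves 0xFFFFFF00 in the register: the low byte says "not equal", but a caller reading all 32 bits sees a non-zero value and treats the keys as equal. A small positive memcmp result that fits in the low byte is fully overwritten, which is consistent with the failure being infrequent.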