Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

infinite loop trying to insert a value into a hash set #250

Closed
TylerGlaiel opened this issue Oct 5, 2024 · 10 comments · Fixed by #251
Closed

infinite loop trying to insert a value into a hash set #250

TylerGlaiel opened this issue Oct 5, 2024 · 10 comments · Fixed by #251

Comments

@TylerGlaiel
Copy link

TylerGlaiel commented Oct 5, 2024

following reproduction infinite loops in _find_key when trying to insert "349" into the hash set

the data was saved with phmap_dump in an older version (I believe 1.35?). I dont have the exact sequence of inserts/removes that created this data unfortunately. All I know is this crashes when trying to insert 349. I updated to the current revision and the crash still exists.

infinite loop happens with MSVC when compiling in 64 bit mode

#include "phmap/phmap.h"
#include "phmap/phmap_dump.h"
#include <iostream>

struct PhMapSerializeIn {
	unsigned char* data;
	size_t size;
	size_t current;
	PhMapSerializeIn(unsigned char* p, size_t sz) {
		data = p;
		size = sz;
		current = 0;
	}

	bool loadBinary(void* p, size_t sz) {
		std::memcpy(p, data+current, sz);
		current += sz;
		return true;
	}
};

int main() {
	unsigned char datadump[] = {245, 255, 255, 255, 255, 255, 255, 255, 24, 0, 0, 0, 0, 0, 0, 0, 31, 0, 0, 0, 0, 0, 0, 0, 254, 60, 45, 88, 254, 32, 15, 46, 99, 44, 0, 126, 254, 27, 254, 125, 67, 54, 254, 81, 5, 62, 106, 93, 66, 21, 63, 49, 22, 128, 254, 255, 254, 60, 45, 88, 254, 32, 15, 46, 99, 44, 0, 126, 254, 27, 254, 125, 85, 1, 0, 0, 0, 0, 0, 0, 42, 1, 0, 0, 0, 0, 0, 0, 8, 1, 0, 0, 0, 0, 0, 0, 87, 1, 0, 0, 0, 0, 0, 0, 35, 1, 0, 0, 0, 0, 0, 0, 78, 1, 0, 0, 0, 0, 0, 0, 248, 0, 0, 0, 0, 0, 0, 0, 60, 1, 0, 0, 0, 0, 0, 0, 250, 0, 0, 0, 0, 0, 0, 0, 89, 1, 0, 0, 0, 0, 0, 0, 91, 1, 0, 0, 0, 0, 0, 0, 243, 0, 0, 0, 0, 0, 0, 0, 75, 1, 0, 0, 0, 0, 0, 0, 84, 1, 0, 0, 0, 0, 0, 0, 32, 1, 0, 0, 0, 0, 0, 0, 68, 1, 0, 0, 0, 0, 0, 0, 7, 1, 0, 0, 0, 0, 0, 0, 77, 1, 0, 0, 0, 0, 0, 0, 86, 1, 0, 0, 0, 0, 0, 0, 70, 1, 0, 0, 0, 0, 0, 0, 46, 0, 0, 0, 0, 0, 0, 0, 55, 0, 0, 0, 0, 0, 0, 0, 134, 0, 0, 0, 0, 0, 0, 0, 81, 1, 0, 0, 0, 0, 0, 0, 88, 1, 0, 0, 0, 0, 0, 0, 38, 1, 0, 0, 0, 0, 0, 0, 65, 1, 0, 0, 0, 0, 0, 0, 83, 1, 0, 0, 0, 0, 0, 0, 90, 1, 0, 0, 0, 0, 0, 0, 58, 1, 0, 0, 0, 0, 0, 0, 76, 1, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0};

	int64_t add[] = {348, 349};

	PhMapSerializeIn in(datadump, sizeof(datadump));

	phmap::flat_hash_set<int64_t> myset;
	myset.phmap_load(in);

	for(auto value : add) {
		myset.insert(value);
		std::cout << "inserted: " << value << std::endl;
	}

	for(auto value : myset) {
		std::cout << value << ", ";
	}

	return 0;
}

@greg7mdp
Copy link
Owner

greg7mdp commented Oct 5, 2024

Hum, unfortunately there were a couple of small changes in the phmap_dump format since 1.3.5, so the binary file is probably not compatible with newer versions.
I think your best bet might be to serialize your old file to disk in another format, and then read it a recreate a new phmap_dump file using the new phmap version.

@TylerGlaiel
Copy link
Author

TylerGlaiel commented Oct 5, 2024

the crash existed within 1.3.5 as well. I guess I just want assurances that whatever was causing the crash was fixed between then and now. the crash seems to behave identically with both versions, so I don't think its an issue with the load/dump functionality, seems more like 1.3.5 saved a broken hash map?

(edit: ok the dumped array here was created with current version, let me try to dump one from 1.3.5)

@TylerGlaiel
Copy link
Author

TylerGlaiel commented Oct 5, 2024

yeah its the same bug regardless, looks pretty much the same when stepping through in the debugger no matter which versions I'm using

#include "phmap/phmap.h"
#include "phmap/phmap_dump.h"
#include <iostream>

struct PhMapSerializeIn {
	unsigned char* data;
	size_t size;
	size_t current;
	PhMapSerializeIn(unsigned char* p, size_t sz) {
		data = p;
		size = sz;
		current = 0;
	}

	bool loadBinary(void* p, size_t sz) {
		std::memcpy(p, data+current, sz);
		current += sz;
		return true;
	}
};

int main() {
	//saved with 1.3.5, loaded and dumped with current version
	//unsigned char datadump[]  = {245, 255, 255, 255, 255, 255, 255, 255, 24, 0, 0, 0, 0, 0, 0, 0, 31, 0, 0, 0, 0, 0, 0, 0, 254, 60, 45, 88, 254, 32, 15, 46, 99, 44, 0, 126, 254, 27, 254, 125, 67, 54, 254, 81, 5, 62, 106, 93, 66, 21, 63, 49, 22, 128, 254, 255, 254, 60, 45, 88, 254, 32, 15, 46, 99, 44, 0, 126, 254, 27, 254, 125, 85, 1, 0, 0, 0, 0, 0, 0, 42, 1, 0, 0, 0, 0, 0, 0, 8, 1, 0, 0, 0, 0, 0, 0, 87, 1, 0, 0, 0, 0, 0, 0, 35, 1, 0, 0, 0, 0, 0, 0, 78, 1, 0, 0, 0, 0, 0, 0, 248, 0, 0, 0, 0, 0, 0, 0, 60, 1, 0, 0, 0, 0, 0, 0, 250, 0, 0, 0, 0, 0, 0, 0, 89, 1, 0, 0, 0, 0, 0, 0, 91, 1, 0, 0, 0, 0, 0, 0, 243, 0, 0, 0, 0, 0, 0, 0, 75, 1, 0, 0, 0, 0, 0, 0, 84, 1, 0, 0, 0, 0, 0, 0, 32, 1, 0, 0, 0, 0, 0, 0, 68, 1, 0, 0, 0, 0, 0, 0, 7, 1, 0, 0, 0, 0, 0, 0, 77, 1, 0, 0, 0, 0, 0, 0, 86, 1, 0, 0, 0, 0, 0, 0, 70, 1, 0, 0, 0, 0, 0, 0, 46, 0, 0, 0, 0, 0, 0, 0, 55, 0, 0, 0, 0, 0, 0, 0, 134, 0, 0, 0, 0, 0, 0, 0, 81, 1, 0, 0, 0, 0, 0, 0, 88, 1, 0, 0, 0, 0, 0, 0, 38, 1, 0, 0, 0, 0, 0, 0, 65, 1, 0, 0, 0, 0, 0, 0, 83, 1, 0, 0, 0, 0, 0, 0, 90, 1, 0, 0, 0, 0, 0, 0, 58, 1, 0, 0, 0, 0, 0, 0, 76, 1, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0};
	
	//saved with 1.3.5, loaded with 1.3.5 and dumped with 1.3.5
	unsigned char datadump[] = {24, 0, 0, 0, 0, 0, 0, 0, 31, 0, 0, 0, 0, 0, 0, 0, 254, 60, 45, 88, 254, 32, 15, 46, 99, 44, 0, 126, 254, 27, 254, 125, 67, 54, 254, 81, 5, 62, 106, 93, 66, 21, 63, 49, 22, 128, 254, 255, 254, 60, 45, 88, 254, 32, 15, 46, 99, 44, 0, 126, 254, 27, 254, 125, 85, 1, 0, 0, 0, 0, 0, 0, 42, 1, 0, 0, 0, 0, 0, 0, 8, 1, 0, 0, 0, 0, 0, 0, 87, 1, 0, 0, 0, 0, 0, 0, 35, 1, 0, 0, 0, 0, 0, 0, 78, 1, 0, 0, 0, 0, 0, 0, 248, 0, 0, 0, 0, 0, 0, 0, 60, 1, 0, 0, 0, 0, 0, 0, 250, 0, 0, 0, 0, 0, 0, 0, 89, 1, 0, 0, 0, 0, 0, 0, 91, 1, 0, 0, 0, 0, 0, 0, 243, 0, 0, 0, 0, 0, 0, 0, 75, 1, 0, 0, 0, 0, 0, 0, 84, 1, 0, 0, 0, 0, 0, 0, 32, 1, 0, 0, 0, 0, 0, 0, 68, 1, 0, 0, 0, 0, 0, 0, 7, 1, 0, 0, 0, 0, 0, 0, 77, 1, 0, 0, 0, 0, 0, 0, 86, 1, 0, 0, 0, 0, 0, 0, 70, 1, 0, 0, 0, 0, 0, 0, 46, 0, 0, 0, 0, 0, 0, 0, 55, 0, 0, 0, 0, 0, 0, 0, 134, 0, 0, 0, 0, 0, 0, 0, 81, 1, 0, 0, 0, 0, 0, 0, 88, 1, 0, 0, 0, 0, 0, 0, 38, 1, 0, 0, 0, 0, 0, 0, 65, 1, 0, 0, 0, 0, 0, 0, 83, 1, 0, 0, 0, 0, 0, 0, 90, 1, 0, 0, 0, 0, 0, 0, 58, 1, 0, 0, 0, 0, 0, 0, 76, 1, 0, 0, 0, 0, 0, 0};
	int64_t add[] = {348, 349};

	PhMapSerializeIn in(datadump, sizeof(datadump));

	phmap::flat_hash_set<int64_t> myset;
	myset.phmap_load(in);

	for(auto value : add) {
		myset.insert(value);
		std::cout << "inserted: " << value << std::endl;
	}

	for(auto value : myset) {
		std::cout << value << ", ";
	}

	return 0;
}

@TylerGlaiel
Copy link
Author

what ends up happening is there aren't any "empty" values left in the hash set (although there are plenty of "deleted" sentinels) so this loop never ends

image

@greg7mdp
Copy link
Owner

greg7mdp commented Oct 5, 2024

I don't know what datadump's data come from, so I don't see how this demonstrates a phmap problem (unless I'm missing something, if so please let me know).

If you can provide an example of populating a hash map normally, dumping it, and being unable to load it, I'll certainly look into it.

@TylerGlaiel
Copy link
Author

TylerGlaiel commented Oct 5, 2024

datadump was a flat_hash_set that was saved with phmap_dump in version 1.3.5. I have no way to reconstruct the sequence of adds/removes that created that initial hash set, this is the only way I can reproduce the bug. I just converted it to bytes so I could store it in the reproduction like that. The hash set was only ever created by adding and removing int64_t's to the set (single threaded), and eventually one user reported a crash here, and I converted their saved hash set into that byte sequence for the reproduction. I dont think the dump/load is a relevant part of the bug here, its only there cause I cant reconstruct this set otherwise

@greg7mdp
Copy link
Owner

greg7mdp commented Oct 5, 2024

You are probably hitting this issue which us now fixed (since 1.3.12)

@TylerGlaiel
Copy link
Author

ok I think this is actually what the issue was, and it looks like it was fixed at some point between 1.3.5 and current version,
8c8550d

image
it should probably reconstruct growth_left() if its loading a dump from an older version though right? the versioning seems to work fine for loading it otherwise

@TylerGlaiel
Copy link
Author

ex add this
image

this would make updating the lib to fix the crash be a bit smoother so I dont have to reconstruct the hash sets on my end

@greg7mdp
Copy link
Owner

greg7mdp commented Oct 5, 2024

ex add this

Awesome, give me a sec.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants