put(Object obj) with type checking and instance walking #3

Open
RichMacDonald opened this issue Oct 19, 2024 · 0 comments
Labels
enhancement New feature or request

New user here. Thank you for the library. Not sure if this is useful to anyone, but I thought you might like feedback on other uses of the code:

The first thing I wanted to do with it was hash an "almost arbitrary" Object with a "minimal" HashFunnel, where "almost arbitrary" means an object made up of primitives, Collections, etc. Ideally, I wanted a put(Object obj) that needs no HashFunnel at all. The method would check the type of its argument and call the corresponding put method, walking arrays, collections and maps recursively. When it encounters a class it does not know how to handle, it hashes the class name and the toString() value. I came up with:

/**
 * Designed for primitive objects and arrays/maps/collections of primitives.
 * If we fail to recognize the types of instance, we use toString()
 */
public HashStream128 putObject(@Nullable Object obj) {
	if (obj == null) {
		putNull();

	} else if (obj.getClass().isArray()) {
		putArrayPriv(obj);

	} else if ((obj instanceof List)) {
		putList((List<?>) obj);

	} else if ((obj instanceof Set)) {
		putSet((Set<?>) obj);

	} else if ((obj instanceof Map)) {
		putMap((Map<?, ?>) obj);

	} else if (obj instanceof String str) {
		putString(str);

	} else if (obj instanceof Integer int_) {
		putInt(int_.intValue());

	} else if (obj instanceof Long lng) {
		putLong(lng.longValue());

	} else if (obj instanceof Boolean bool) {
		putBoolean(bool.booleanValue());

	} else if (obj instanceof Double dbl) {
		putDouble(dbl.doubleValue());

	} else if (obj instanceof Enum<?> enum_) {
		putString(enum_.getClass().getCanonicalName());
		putString(enum_.name());

	} else if (obj instanceof Short shrt) {
		putShort(shrt.shortValue());

	} else if (obj instanceof Character chr) {
		putChar(chr.charValue());

	} else if (obj instanceof Byte objByte) {
		putByte(objByte.byteValue());

	} else if (obj instanceof Float flt) {
		putFloat(flt.floatValue());

	} else if (obj instanceof UUID uuid) {
		putUUID(uuid);

	} else {
		putString(obj.getClass().getCanonicalName());
		putString(obj.toString());
	}

	return this;
}

/**
 * Want this to work for both Object[] and primitive arrays, so we take the argument as an Object type and not an array type
 */
public HashStream128 putArray(Object obj) {
	if (obj.getClass().isArray()) {
		return putArrayPriv(obj);
	}
	throw new IllegalArgumentException("Object is not an array: class=" + obj.getClass());
}

/**
 * We know the argument is an array, but it has to be declared as an Object, in order to handle objects and primitives.
 * Have to use reflection to access the items because we cannot cast to (Object[]).
 * Note that this method causes autoboxing, so a potential issue in high performance situations.
 * Alternative is to write more case statements to handle each type of primitive.
 */
private HashStream128 putArrayPriv(Object arr) {
	int len = Array.getLength(arr);
	for (int i = 0; i < len; i++) {
		putObject(Array.get(arr, i));
	}
	return this;
}

public HashStream128 putNull() {
	// copy code from AbstractHashStream
	putBoolean(false);
	return this;
}

public HashStream128 putList(List<?> coll) {
	int counter = 0;
	for (Object object : coll) {
		putObject(object);
		counter++;
	}
	putInt(counter);
	return this;
}

public void putSet(Set<?> set) {
	// Need an elementHashFunction.
	// That requires access to the Hasher from within the HashStream, which isn't available.
	// Clone and reset the sink instead.
	// Assuming this is thread-safe.

	final HashStream128 hashStream = copy();

	ToLongFunction<Object> elementHashFunction = obj -> {
		hashStream.reset();
		// Hash the element on the private copy, not on this stream.
		hashStream.putObject(obj);
		return hashStream.getAsLong();
	};

	putUnorderedIterable(set, elementHashFunction);
}

public HashStream128 putMap(Map<?, ?> map) {
	int counter = 0;
	for (Entry<?, ?> entry : map.entrySet()) {
		putObject(entry.getKey());
		putObject(entry.getValue());
		counter++;
	}
	putInt(counter);
	return this;
}
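
The Javadoc on putArrayPriv() above mentions that Array.get() autoboxes every element and that the alternative is one case per primitive type. A minimal sketch of that alternative follows. MiniSink and RecordingSink are hypothetical stand-ins invented for this example so that it runs without the library; only three primitive types are shown, and the rest fall back to reflection.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; MiniSink is not a hash4j type. */
final class ArrayDispatchSketch {

    /** Stand-in for the handful of sink methods this sketch needs. */
    interface MiniSink {
        void putInt(int v);
        void putLong(long v);
        void putDouble(double v);
        void putObject(Object v);
    }

    /** Walks an array, avoiding boxing for the primitive types it knows. */
    static void putArray(Object arr, MiniSink sink) {
        if (arr instanceof int[]) {
            for (int v : (int[]) arr) sink.putInt(v);
        } else if (arr instanceof long[]) {
            for (long v : (long[]) arr) sink.putLong(v);
        } else if (arr instanceof double[]) {
            for (double v : (double[]) arr) sink.putDouble(v);
        } else if (arr instanceof Object[]) {
            for (Object v : (Object[]) arr) sink.putObject(v);
        } else if (arr != null && arr.getClass().isArray()) {
            // Remaining primitive arrays: reflective fallback, which boxes each element.
            int len = Array.getLength(arr);
            for (int i = 0; i < len; i++) sink.putObject(Array.get(arr, i));
        } else {
            throw new IllegalArgumentException("Object is not an array: " + arr);
        }
    }

    /** Records each call as a string so the dispatch can be inspected. */
    static final class RecordingSink implements MiniSink {
        final List<String> calls = new ArrayList<>();
        @Override public void putInt(int v) { calls.add("int:" + v); }
        @Override public void putLong(long v) { calls.add("long:" + v); }
        @Override public void putDouble(double v) { calls.add("double:" + v); }
        @Override public void putObject(Object v) { calls.add("object:" + v); }
    }

    public static void main(String[] args) {
        RecordingSink sink = new RecordingSink();
        putArray(new int[] {1, 2}, sink);
        putArray(new short[] {3}, sink);
        System.out.println(sink.calls); // prints [int:1, int:2, object:3]
    }
}
```

The trade-off is more branches in exchange for no per-element allocation on the common primitive array types.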

Generalizing this code so it can be written once and used by all the cases (HashStream32/64/128) exposed a few issues:

  1. HashSink does not define copy() and reset(), so it cannot be used as the "base class". Instead, we have to use HashStream32.
  2. HashStream32 does not have a getAsLong() method, so putSet() needs to check the class and convert accordingly. That part is simple.
  3. I didn't have access to the Hasher from within the HashStream, so I had to clone the current HashStream in putSet().
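
The reason putSet() needs copy() and reset() is that every element has to be hashed in isolation, from a clean state, before the element hashes are combined in a way that ignores iteration order. Here is a rough standalone sketch of that idea. It makes no claim about how the library does it: FNV-1a over the toString() bytes stands in for the real hasher, and sorting the element hashes is just one simple way to get an order-independent result. All names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical illustration only; none of these names exist in hash4j. */
final class UnorderedHashSketch {

    private static final long FNV_OFFSET = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    /** Hashes one element from a clean initial state, the role reset() plays in putSet(). */
    static long hashElement(Object element) {
        long h = FNV_OFFSET; // fresh state for every element
        for (byte b : String.valueOf(element).getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= FNV_PRIME;
        }
        return h;
    }

    /** Combines per-element hashes so that iteration order has no effect. */
    static long hashUnordered(Collection<?> elements) {
        long[] hashes = new long[elements.size()];
        int i = 0;
        for (Object element : elements) {
            hashes[i++] = hashElement(element);
        }
        Arrays.sort(hashes); // canonical order, independent of iteration order
        long h = FNV_OFFSET;
        for (long elementHash : hashes) {
            h ^= elementHash; // FNV-style mixing of each element hash
            h *= FNV_PRIME;
        }
        h ^= hashes.length; // mix in the element count
        h *= FNV_PRIME;
        return h;
    }

    public static void main(String[] args) {
        Set<Object> forward = new LinkedHashSet<>(Arrays.asList(10, 11.0, "twelve"));
        Set<Object> backward = new LinkedHashSet<>(Arrays.asList("twelve", 11.0, 10));
        // Same elements, different iteration order: prints "true".
        System.out.println(hashUnordered(forward) == hashUnordered(backward));
    }
}
```

If hashElement() reused state left over from a previous element, the result would depend on the order in which elements were visited, which is exactly what the per-element reset avoids.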

The biggest problem with this code is that I cannot get it into the main classes, so I have to write "delegate code" with my own classes. This isn't acceptable in the long run. If this code can't get into the main branch, forget it.

I also wrote the same code using the HashFunnel interface. I could have used a chain of if/else checks for the type checking, but I have a utility class I call ClassMap: a HashMap where the key is a class or interface and the value is a consumer (a BiConsumer here). This code is shown below:

/**
* Accept Object type and check the type to call the correct method.
* Recurse on Map, List and Set.
* When a non-primitive Object is encountered, hash the classname and the toString() function
*/
public class ObjectHashFunnel implements HashFunnel<Object> {

private final ClassMap<BiConsumer<Object, HashSink>> funnelMap = ClassMap.forClassesAndInterfaces(); //code not shown

public ObjectHashFunnel() {

	funnelMap.put(Byte.TYPE, (byte_, sink) -> sink.putByte((byte) byte_));
	funnelMap.put(byte[].class, (bytes, sink) -> sink.putBytes((byte[]) bytes));

	funnelMap.put(Boolean.TYPE, (boolean_, sink) -> sink.putBoolean((boolean) boolean_));
	funnelMap.put(boolean[].class, (booleans, sink) -> sink.putBooleans((boolean[]) booleans));

	funnelMap.put(Short.TYPE, (short_, sink) -> sink.putShort((short) short_));
	funnelMap.put(short[].class, (shorts, sink) -> sink.putShorts((short[]) shorts));

	funnelMap.put(Character.TYPE, (char_, sink) -> sink.putChar((char) char_));
	funnelMap.put(char[].class, (chars, sink) -> sink.putChars((char[]) chars));

	funnelMap.put(String.class, (str, sink) -> sink.putString((String) str));
	funnelMap.put(String[].class, (strs, sink) -> putStrings((String[]) strs, sink));

	funnelMap.put(Integer.TYPE, (int_, sink) -> sink.putInt((int) int_));
	funnelMap.put(int[].class, (ints, sink) -> sink.putInts((int[]) ints));

	funnelMap.put(Long.TYPE, (long_, sink) -> sink.putLong((long) long_));
	funnelMap.put(long[].class, (longs, sink) -> sink.putLongs((long[]) longs));

	funnelMap.put(Float.TYPE, (float_, sink) -> sink.putFloat((float) float_));
	funnelMap.put(float[].class, (floats, sink) -> sink.putFloats((float[]) floats));

	funnelMap.put(Double.TYPE, (double_, sink) -> sink.putDouble((double) double_));
	funnelMap.put(double[].class, (doubles, sink) -> sink.putDoubles((double[]) doubles));

	funnelMap.put(UUID.class, (uuid, sink) -> sink.putUUID((UUID) uuid));

	funnelMap.put(Object[].class, (arr, sink) -> putObjectArray((Object[]) arr, sink));

	funnelMap.put(List.class, (list, sink) -> sink.putOrderedIterable((List) list, this));

	funnelMap.put(Set.class, (set, sink) -> putSet((Set) set, sink));

	funnelMap.put(Map.class, (map, sink) -> putMap((Map) map, sink));
}

public <T> void addHashFunnel(Class<T> klass, HashFunnel<T> funnel) {
	BiConsumer<Object, HashSink> biCons = (obj, sink) -> sink.put((T) obj, funnel);
	funnelMap.put(klass, biCons);
}

@Override
public void put(@Nullable Object obj, HashSink sink) {
	if (obj == null) {
		sink.putBoolean(false);
		return;
	}

	BiConsumer<Object, HashSink> cons = funnelMap.get(obj.getClass());
	if (cons != null) {
		cons.accept(obj, sink);
		return;
	}

	// No match, so default to the classname and the toString() value.
	sink.putString(obj.getClass().getCanonicalName());
	sink.putString(obj.toString());
}

private void putStrings(String[] strs, HashSink sink) {
	for (String string : strs) {
		sink.putString(string);
	}
}

private void putMap(Map<?, ?> map, HashSink sink) {

	if (map.isEmpty()) {
		return;
	}

	int counter = 0;
	for (Entry<?, ?> entry : map.entrySet()) {
		sink.put(entry.getKey(), this);
		sink.put(entry.getValue(), this);
		counter++;
	}
	sink.putInt(counter);
}

private void putObjectArray(Object[] arr, HashSink sink) {
	for (Object item : arr) {
		sink.put(item, this);
	}
}

private void putSet(Set<?> set, HashSink sink) {
	// Need an elementHashFunction.
	// That requires access to the Hasher from within the HashStream, which isn't available.
	// Clone and reset the sink instead.
	// Requiring this be thread-safe.

	if (set.isEmpty()) {
		return;
	}

	final HashStream64 hashStream = ((HashStream64) sink).copy();

	ToLongFunction<Object> elementHashFunction = obj -> {
		hashStream.reset();
		hashStream.put(obj, this);
		return hashStream.getAsLong();
	};

	sink.putUnorderedIterable(set, elementHashFunction);
}
}
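
The ClassMap utility itself is not shown above. As a guess at what a class-or-interface lookup of that kind might look like, here is a minimal hypothetical sketch: it checks the exact class first, then walks superclasses and interfaces breadth-first until it finds a registered entry. The name ClassMapSketch and its behavior are assumptions for illustration, not the actual ClassMap, and the sketch does not resolve primitive TYPE keys such as Integer.TYPE.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the ClassMap utility, which is not shown in the issue. */
final class ClassMapSketch<V> {

    private final Map<Class<?>, V> entries = new HashMap<>();

    void put(Class<?> key, V value) {
        entries.put(key, value);
    }

    /**
     * Returns the value registered for the given type, or for the nearest
     * superclass or interface that has an entry, or null if there is none.
     */
    V get(Class<?> type) {
        Deque<Class<?>> queue = new ArrayDeque<>();
        Set<Class<?>> seen = new HashSet<>();
        queue.add(type);
        while (!queue.isEmpty()) {
            Class<?> current = queue.poll();
            if (!seen.add(current)) {
                continue; // already visited through another path
            }
            V value = entries.get(current);
            if (value != null) {
                return value;
            }
            Class<?> superclass = current.getSuperclass();
            if (superclass != null) {
                queue.add(superclass);
            }
            for (Class<?> iface : current.getInterfaces()) {
                queue.add(iface);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ClassMapSketch<String> map = new ClassMapSketch<>();
        map.put(List.class, "list handler");
        List<Integer> values = new ArrayList<>();
        // ArrayList has no entry of its own, so the List entry is found.
        System.out.println(map.get(values.getClass())); // prints "list handler"
    }
}
```

A lookup like this is what lets a single List.class registration cover ArrayList, LinkedList and other implementations, at the cost of a hierarchy walk on each miss.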

Here is a simple test to exercise the code:

static class CustomObject{
	public int testInt;
}

@Test
public void hashObjectFunnel() throws IOException {
	ObjectHashFunnel funnel = new ObjectHashFunnel();
	HashFunnel<CustomObject> customFunnel = (customObj, sink) -> sink.putInt(customObj.testInt);
	funnel.addHashFunnel(CustomObject.class, customFunnel);
	

	List<Object> list = new ArrayList<>();
	list.add(10);
	list.add(11.0);
	list.add("twelve");
	list.add(Instant.ofEpochSecond(1));

	Set<Object> set = new HashSet<>();
	set.add(10);
	set.add(11.0);
	set.add("twelve");
	set.add(Instant.ofEpochSecond(1));

	Map<Object,Object> map2 = new HashMap<>();
	map2.put("2nd", 65);

	CustomObject custom = new CustomObject();
	custom.testInt = 50;

	Map<Object,Object> testMap = new HashMap<>();
	testMap.put("null", null);
	testMap.put("double-array", new double[] {1.0, 3.0});
	testMap.put("string","sldifkjgh");
	testMap.put("list",list);
	testMap.put("set",set);
	testMap.put(map2,map2);
	testMap.put("int", 1);
	testMap.put("lng", 1L);
	testMap.put("bool", true);
	testMap.put("bool2", Boolean.valueOf(true));
	testMap.put("dbl", 5.0d);
	testMap.put("enum", RoundingMode.DOWN);
	testMap.put("short", Short.valueOf("5"));
	testMap.put("char", 'd');
	testMap.put("byte", Byte.valueOf("0"));
	testMap.put("float", Float.valueOf("5.0"));

	HashStream64 hasher = Hashing.xxh3_64().hashStream();
	hasher.put(testMap, funnel);
	assertThat(hasher.getAsLong()).isEqualTo(-2953010768599763332L);
}

I have not run any performance tests or looked for memory leaks. Have you done any performance comparisons against the Objects.hash() method?

@RichMacDonald RichMacDonald added the enhancement New feature or request label Oct 19, 2024