Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HashPartitioner cast uint32 to int32 #1090

Closed
dynahcatq opened this issue Apr 26, 2018 · 5 comments
Closed

HashPartitioner cast uint32 to int32 #1090

dynahcatq opened this issue Apr 26, 2018 · 5 comments
Labels

Comments

@dynahcatq
Copy link

dynahcatq commented Apr 26, 2018

Versions

Sarama Version: e8552c0 (latest master)
Kafka Version: 1.0.0
Go Version: 1.10

Configuration

Standard Configuration.

Logs

No specific logs.

Problem Description

https://github.com/Shopify/sarama/blob/e8552c0dbf502cd7ec9186b57897e6cf9f15fb34/partitioner.go#L126

The p.hasher.Sum32() is being cast to int32. The default hasher sarama uses is FNV-1. In golang, Sum32() for FNV-1 returns uint32, and it can overflow if cast to int32.

In my case, I'm using custom hash (xxHash32) and with the key="123", the hash value="3062191159" which overflows int32. It would be nice to not have the p.hasher.Sum32() cast to int32.

@eapache
Copy link
Contributor

eapache commented May 3, 2018

It's not clear to me how this is a problem? The partition chosen is still valid, consistent, and well-distributed.

@dynahcatq
Copy link
Author

dynahcatq commented May 3, 2018

The problem is the result is different when integrating with software written in other languages. We have a program that is written in Rust and using Rust Kafka Crate. They are using uint32 for Sum32, so the result is different comparing with golang Sarama.

By casting to signed int32 and making the result non-negative, you are essentially changing the hash algorithm, which is a problem when expecting a hash partitioner to behave the same way across the implementations.

@eapache
Copy link
Contributor

eapache commented May 3, 2018

Ah, this is true... hmm.

The main problem with fixing it is that it's basically a breaking change; we need to maintain the consistency of the hash function between versions for existing users. I think the best we can do is implement a NewUnsignedHashPartitioner or something which people can manually configure.

If we ever do a sarama 2.0 with breaking changes we could fix it for the default at that point. Added to https://github.com/Shopify/sarama/wiki/Ideas-that-will-break-backwards-compatibility

@eapache
Copy link
Contributor

eapache commented Jun 15, 2018

Just to follow up: we've already had to adjust this once for different implementations of the absolute-value function (6967cdb).

I would also note that the HashPartitioner code is fairly simple and the interface is public, so you're free to implement the entirely new partitioner behaviour yourself.

@dynahcatq
Copy link
Author

Thanks for the feedback. We are now using the entirely new custom hash partitioner in our project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants