[Feature] Add optional hashing functions for different experimental settings #464

Closed
victor-mariano-leite opened this issue Aug 16, 2021 · 6 comments


@victor-mariano-leite

victor-mariano-leite commented Aug 16, 2021

The current implementation of Flagr seems to use a CRC32 mapping to generate the entity hash. As far as I know, CRC32 is commonly used for purposes other than hashing for A/B tests: as the number of randomization units scales, the likelihood of collisions increases more than with MD5, for example, potentially causing sample ratio mismatches.

To validate this scenario, I gathered data from a sample experiment at my company.

First, we wanted to create unique tests, isolating entities between variants inside a flag to avoid confounding effects across experiments. As far as I know, though, there is no isolation between flags, which is a problem we haven't been able to solve effectively yet. It seems to us that the current architecture is more appropriate for multivariate experiments than for A/B testing.

To do this, we've been creating one flag per feature with N variants from the start, to avoid re-allocation of entities when we create a new variant. Otherwise, a user in Control could move to Treatment. We validated this behavior by creating an A/Control variant and introducing a B variant later, and saw that some users from A went to B. So when we are A/B testing a feature, there are 20 variants in the flag with only 2 turned on, Control and Treatment, and possibly an Out-Of-Test variant.
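The re-allocation described above is what one would expect if variants occupy consecutive ranges over a fixed bucket space. A minimal sketch, assuming 1000 buckets and cumulative percentage ranges; the `assign` helper and the range layout are illustrative assumptions, not taken from Flagr's source:

```python
def assign(bucket, distribution, num_buckets=1000):
    """Map a bucket to a variant using cumulative percentage ranges.

    `distribution` is an ordered list of (variant, percent) pairs summing to 100.
    Hypothetical helper for illustration only.
    """
    upper = 0
    for variant, percent in distribution:
        upper += percent * num_buckets // 100
        if bucket < upper:
            return variant
    raise ValueError("bucket outside the allocated range")


only_a = [("A", 100)]
a_and_b = [("A", 50), ("B", 50)]

# Bucket 700 belongs to A while A holds 100% of the traffic,
# but falls into B's range once the flag is split 50/50.
before = assign(700, only_a)
after = assign(700, a_and_b)
print(before, after)
```

Under this assumption, an entity's bucket never changes, but the variant that owns the bucket does, which is why defining all variants up front keeps assignments stable.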

We store all of Flagr's data in our Data Lake, so I gathered the sample ratio of a particular experiment (Control/Treatment). Running a chi-square test suggests that there is a mismatch in the sample sizes that is not due to chance.
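For readers who want to reproduce this kind of check, here is a minimal sketch of a chi-square goodness-of-fit test for sample ratio mismatch. The counts are made up for illustration and are not the numbers from this experiment; the p-value shortcut is only valid for two groups (one degree of freedom).

```python
import math


def srm_chi_square(observed, expected_ratios):
    """Chi-square goodness-of-fit statistic for observed counts vs. expected ratios."""
    total = sum(observed)
    expected = [total * r for r in expected_ratios]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_two_groups(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    With 1 degree of freedom the statistic is a squared standard normal,
    so P(X > stat) = erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: 5000 users in Control, 5300 in Treatment, 50/50 expected.
stat = srm_chi_square([5000, 5300], [0.5, 0.5])
p = p_value_two_groups(stat)
print(f"chi2={stat:.3f}, p={p:.4f}")
```

A small p-value only says the split is unlikely under the configured ratio; it does not by itself identify the hash function as the cause.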

I suppose this is because of CRC32, but I'm not sure. Is there any way to validate this more consistently in Flagr?

I've seen MD5 or a Jenkins hash function used to assign units to their variants, since they are collision resistant.

Anyway, for flexibility and more general use cases, it would be interesting if we could choose the randomization algorithm, right-sizing it for one's use case.

@zhouzhuojie
Collaborator

The closest research I can find is https://michiel.buddingh.eu/distribution-of-hash-values#summary, whose summary section says CRC32 is a good choice:

[image: screenshot of the article's summary section]

The value space of CRC32 is smaller than that of MD5 or other cryptographic hash functions, but collisions don't affect its distribution, as the experiments in the article demonstrate. In fact, the maximum number of buckets Flagr supports is 1000, which is significantly smaller than CRC32's range. In an ideal world, when the input entity_id has enough entropy, the actual distribution will approximate the distribution you set in the segment. CRC32 is also much faster than cryptographic hash functions, thanks to native CPU instructions.

That said, I agree there's a need for the flexibility to run different hash functions, and there's room to provide more experimental results on various inputs.
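As a rough way to compare hash functions on a given set of IDs, the sketch below buckets entity IDs into 1000 buckets with either CRC32 or MD5 and counts a 50/50 split. The salt value, the `salt + entity_id` concatenation, the use of the first 4 bytes of the MD5 digest, and the sequential IDs are all assumptions for illustration; this is not Flagr's actual evaluation code.

```python
import hashlib
import zlib
from collections import Counter

NUM_BUCKETS = 1000  # bucket count mentioned in this thread


def crc32_hash(data: bytes) -> int:
    return zlib.crc32(data)


def md5_hash(data: bytes) -> int:
    # Use the first 4 bytes of the digest as an unsigned 32-bit integer.
    return int.from_bytes(hashlib.md5(data).digest()[:4], "big")


# Hypothetical registry of selectable hash functions.
HASHERS = {"crc32": crc32_hash, "md5": md5_hash}


def bucket(entity_id: str, salt: str, hasher: str = "crc32") -> int:
    """Deterministically map an entity ID to a bucket in [0, NUM_BUCKETS)."""
    return HASHERS[hasher]((salt + entity_id).encode("utf-8")) % NUM_BUCKETS


def split_counts(entity_ids, salt, hasher):
    """Count how many IDs land in each half of the bucket space."""
    counts = Counter()
    for eid in entity_ids:
        variant = "control" if bucket(eid, salt, hasher) < NUM_BUCKETS // 2 else "treatment"
        counts[variant] += 1
    return counts


ids = [str(i) for i in range(1, 100_001)]  # sequential IDs, as with SQL auto-increment
for name in HASHERS:
    print(name, dict(split_counts(ids, "hypothetical-flag-salt", name)))
```

Running the same comparison on real entity IDs and salts would give more relevant evidence than synthetic input.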

@zhouzhuojie
Collaborator

@victor-mariano-leite I added a test in the openflagr repo to verify my hypothesis: #35

I would also check the distribution of entity_id in the input data. Because entity_id is hashed deterministically, the split follows the input: as an extreme example, if you pass 60% 0 and 40% 1 as the entity_id and expect a 50%/50% split, that's not how it works.
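The extreme example can be reproduced with a few lines. This is a sketch assuming CRC32 over `salt + entity_id` and a 50/50 split over 1000 buckets; the salt and the `variant_for` helper are hypothetical.

```python
import zlib
from collections import Counter


def variant_for(entity_id: str, salt: str = "hypothetical-salt") -> str:
    """Deterministic assignment: the same entity_id always gets the same variant."""
    b = zlib.crc32((salt + entity_id).encode("utf-8")) % 1000
    return "A" if b < 500 else "B"


# 100 evaluations, but only two distinct entity IDs.
requests = ["0"] * 60 + ["1"] * 40
counts = Counter(variant_for(e) for e in requests)
print(dict(counts))
```

Because there are only two distinct IDs, the observed split can only be 60/40 (in either direction) or 100/0, never 50/50, whatever hash function is used.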

@victor-mariano-leite
Author

victor-mariano-leite commented Aug 20, 2021

@zhouzhuojie

Nice! And interesting point. Is it bad practice to use sequential IDs (such as autogenerated SQL IDs) as the entity_id?

That is our case, and I was thinking that if older users are more likely to be assigned to an experiment (since our trigger to assign the user to the variant bucket is more commonly hit by retained users), maybe the split is biased there as well.
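One way to probe this question is to simulate a trigger that favors older (lower) sequential IDs and then look at the resulting split. The sketch below is only a simulation under stated assumptions: the salt, the `variant_for` helper, the ID range, and the skew (squaring a uniform random number) are all hypothetical, and the result does not settle whether real data is biased.

```python
import random
import zlib
from collections import Counter


def variant_for(entity_id: str, salt: str = "hypothetical-salt") -> str:
    b = zlib.crc32((salt + entity_id).encode("utf-8")) % 1000
    return "control" if b < 500 else "treatment"


rng = random.Random(0)  # fixed seed so the simulation is repeatable
max_id = 1_000_000

# Squaring a uniform draw concentrates samples near 0, so low (older) IDs
# are triggered far more often than high (newer) ones.
triggered = {int(max_id * rng.random() ** 2) + 1 for _ in range(50_000)}

counts = Counter(variant_for(str(i)) for i in triggered)
median_id = sorted(triggered)[len(triggered) // 2]
print(dict(counts), "median triggered id:", median_id)
```

If the split among triggered users stays close to the configured ratio even though the triggered IDs are skewed, the skew in who gets triggered is less likely to be the explanation for a mismatch.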

@victor-mariano-leite
Author

I've tried plotting the distribution of entity_id values for various variant_id values, and some experiments look quite right-skewed:

[image: distribution of entity_id by variant_id]

As far as I understood what you said, this could result in, and explain, an uneven split. Is that so?

@github-actions

Stale issue message

@mmarinaki

Hi @victor-mariano-leite, @zhouzhuojie, I am very interested in this issue because I am troubleshooting something similar right now. Did you end up finding correlations in treatment assignment with this hash function that would bias your experiments?
