Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluator does not keep distributions order, which makes percentage rollout inconsistent. #229

Closed
badboyd opened this issue Feb 15, 2020 · 6 comments · Fixed by #233
Closed
Assignees
Labels

Comments

@badboyd
Copy link
Contributor

badboyd commented Feb 15, 2020

I think this query should be ordered by created_at evaluations same as rules

@badboyd badboyd closed this as completed Feb 15, 2020
@badboyd badboyd reopened this Feb 15, 2020
@badboyd badboyd changed the title Evaluator does not keep distributions order, which makes percentage rollout inconsistent Evaluator does not keep distributions order, which makes percentage rollout inconsistent. Feb 15, 2020
@markphelps
Copy link
Collaborator

Thanks @badboyd. Could you please provide a test case or steps to reproduce this bug?

@badboyd
Copy link
Contributor Author

badboyd commented Feb 16, 2020

Request payload:

{ "flagKey": "test", "entityId": "10000", "context": { "foo": "bar" } }

Steps to reproduce:

1/ Create a rule has 2 variants var_1, var_2 with distributions 80-20. Run debug with above payload.

2/ Click the edit distributions button then click save(you maybe have to do it several time), run the debug again and see the response variant is different.

I also try with the query in Postgres. If I change the distributions, then do the distributions query again var_1 and var_2 can change position in the response array.

I think because the query returns an unordered list of variants, that's why this happens.

@markphelps
Copy link
Collaborator

Thanks! Will look into it tomorrow

@markphelps
Copy link
Collaborator

@badboyd I tried to reproduce this locally but couldnt. I tried with both cache enabled and disabled and using SQLite

I pushed a test case that should reproduce the error, but couldnt get it to fail: https://github.com/markphelps/flipt/compare/issue229

Couple of questions:

  1. what version of Flipt are you running? Output of flipt --version?
  2. You say you got this error using both SQLite and Postgres?
  3. Do you have in-memory cache enabled?
  4. Would it be possible for you to provide a screen recording or maybe the debug logs when you experience this error?

Thanks!

@badboyd
Copy link
Contributor Author

badboyd commented Feb 18, 2020

Hi @markphelps,

  1. I use the master branch
  2. I got this problem using Postgres 9.4
  3. I try with cache and without it
  4. I have logs

`time="2020-02-18T04:52:22Z" level=debug msg=evaluate request="flag_key:"test" entity_id:"1000" context:<key:"foo" value:"bar" > " server=grpc storage=postgres

time="2020-02-18T04:52:22Z" level=debug msg="Distribution: %+v{2f6b679e-4435-426b-9f1d-0e400671d220 bf70026c-263e-4094-a588-813e6ef6f3d2 c01c285d-0ceb-4e2a-bbad-f642a812b6d4 20 var_1}" server=grpc storage=postgres

time="2020-02-18T04:52:22Z" level=debug msg="Distribution: %+v{e3585ee9-ebe0-4141-ae1e-7de294b710e9 bf70026c-263e-4094-a588-813e6ef6f3d2 9bb1f503-a46d-4037-8326-8063e742ef3e 80 var_2}" server=grpc storage=postgres

time="2020-02-18T04:52:22Z" level=debug msg="matched distribution: &{ID:2f6b679e-4435-426b-9f1d-0e400671d220 RuleID:bf70026c-263e-4094-a588-813e6ef6f3d2 VariantID:c01c285d-0ceb-4e2a-bbad-f642a812b6d4 Rollout:20 VariantKey:var_1}" server=grpc storage=postgres

time="2020-02-18T04:52:22Z" level=debug msg=evaluate response="request_id:"5f5abeef-500e-4c25-8036-29d6c190cc8e" entity_id:"1000" request_context:<key:"foo" value:"bar" > match:true flag_key:"test" segment_key:"all-users" timestamp:<seconds:1582001542 nanos:44586300 > value:"var_1" request_duration_millis:34.9126 " server=grpc storage=postgres

time="2020-02-18T04:52:22Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=Evaluate grpc.service=flipt.Flipt grpc.start_time="2020-02-18T04:52:22Z" grpc.time_ms=36.123 peer.address="127.0.0.1:46986" server=grpc span.kind=server storage=postgres system=grpc

time="2020-02-18T04:52:30Z" level=debug msg="update distribution" request="id:"e3585ee9-ebe0-4141-ae1e-7de294b710e9" flag_key:"test" rule_id:"bf70026c-263e-4094-a588-813e6ef6f3d2" variant_id:"9bb1f503-a46d-4037-8326-8063e742ef3e" rollout:80 " server=grpc storage=postgres

time="2020-02-18T04:52:30Z" level=debug msg="update distribution" request="id:"2f6b679e-4435-426b-9f1d-0e400671d220" flag_key:"test" rule_id:"bf70026c-263e-4094-a588-813e6ef6f3d2" variant_id:"c01c285d-0ceb-4e2a-bbad-f642a812b6d4" rollout:20 " server=grpc storage=postgres

time="2020-02-18T04:52:30Z" level=debug msg="update distribution" response="id:"e3585ee9-ebe0-4141-ae1e-7de294b710e9" rule_id:"bf70026c-263e-4094-a588-813e6ef6f3d2" variant_id:"9bb1f503-a46d-4037-8326-8063e742ef3e" rollout:80 created_at:<seconds:1582001470 nanos:836021000 > updated_at:<seconds:1582001550 nanos:35578000 > " server=grpc storage=postgres

time="2020-02-18T04:52:30Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=UpdateDistribution grpc.service=flipt.Flipt grpc.start_time="2020-02-18T04:52:30Z" grpc.time_ms=83.373 peer.address="127.0.0.1:46986" server=grpc span.kind=server storage=postgres system=grpc

time="2020-02-18T04:52:30Z" level=debug msg="update distribution" response="id:"2f6b679e-4435-426b-9f1d-0e400671d220" rule_id:"bf70026c-263e-4094-a588-813e6ef6f3d2" variant_id:"c01c285d-0ceb-4e2a-bbad-f642a812b6d4" rollout:20 created_at:<seconds:1582001470 nanos:821454000 > updated_at:<seconds:1582001550 nanos:54152000 > " server=grpc storage=postgres

time="2020-02-18T04:52:30Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=UpdateDistribution grpc.service=flipt.Flipt grpc.start_time="2020-02-18T04:52:30Z" grpc.time_ms=74.678 peer.address="127.0.0.1:46986" server=grpc span.kind=server storage=postgres system=grpc

time="2020-02-18T04:52:35Z" level=debug msg=evaluate request="flag_key:"test" entity_id:"1000" context:<key:"foo" value:"bar" > " server=grpc storage=postgres

time="2020-02-18T04:52:35Z" level=debug msg="Distribution: %+v{e3585ee9-ebe0-4141-ae1e-7de294b710e9 bf70026c-263e-4094-a588-813e6ef6f3d2 9bb1f503-a46d-4037-8326-8063e742ef3e 80 var_2}" server=grpc storage=postgres

time="2020-02-18T04:52:35Z" level=debug msg="Distribution: %+v{2f6b679e-4435-426b-9f1d-0e400671d220 bf70026c-263e-4094-a588-813e6ef6f3d2 c01c285d-0ceb-4e2a-bbad-f642a812b6d4 20 var_1}" server=grpc storage=postgres

time="2020-02-18T04:52:35Z" level=debug msg="matched distribution: &{ID:e3585ee9-ebe0-4141-ae1e-7de294b710e9 RuleID:bf70026c-263e-4094-a588-813e6ef6f3d2 VariantID:9bb1f503-a46d-4037-8326-8063e742ef3e Rollout:80 VariantKey:var_2}" server=grpc storage=postgres`

I did try with SQLite and could not reproduce this issue. Look like SQLite always returns the distributions in same order.

@markphelps
Copy link
Collaborator

Ah looks like you are right @badboyd ! The tests I added did fail but only on Postgres! https://github.com/markphelps/flipt/commit/9e1538112b2e323a4995f819ae843fb2a2db89dd/checks?check_suite_id=463661422#step:8:707

Will make a fix now and create a new release. Thank you again for finding and reporting this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants