Retries don't get applied #100

Closed
vponomaryov opened this issue Aug 7, 2024 · 1 comment · Fixed by #104

Comments

@vponomaryov
Contributor

With the latest latte version, execution stops on the first failed query even though retries are configured:

...
         Threads                    1                                                                 
     Connections                    1                                                                 
     Concurrency     [req]        128                                                                 
        Max rate    [op/s]                                                                            
          Warmup       [s]                                                                            
              └─      [op]          1                                                                 
        Run time       [s]       20.0                                                                 
              └─      [op]                                                                            
        Sampling       [s]        1.0                                                                 
              └─      [op]                                                                            
 Request timeout       [s]          5                                                                 
         Retries                   10                                                                 
    ├─ min delay      [ms]        100                                                                 
    └─ max delay      [ms]       5000                                                                 

LOG ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
    Time    Cycles    Errors    Thrpt.     ────────────────────────────────── Latency [ms/op] ──────────────────────────────
     [s]      [op]      [op]    [op/s]             Min        25        50        75        90        99      99.9       Max
   1.001     16169         0     16161             2.5       5.1       6.7       9.6      12.8      16.4      19.8      21.5
   2.002     15580         0     15565             2.0       5.4       7.2       9.8      12.6      16.5      17.8      19.1
   3.001     15908         0     15922             2.1       5.2       6.9       9.8      12.4      16.6      18.7      19.1
   4.000     14899         0     14903             2.2       5.4       7.5      10.5      13.5      19.9      26.7      28.6
   5.001      4812         0      4807             2.3       7.8      13.7      35.1      68.6     114.2     168.3     169.3
   6.002      2958         0      2956             2.7      14.6      29.6      67.2      88.2     175.1     177.3     177.5
   7.001      5190         0      5197             2.3       7.0      11.8      29.5      64.1      98.7     122.8     124.8
   7.691      2610         1      3782             2.1       8.7      17.0      60.2      82.2     179.2     186.3     186.4
error: Cassandra error: Failed to execute query "INSERT INTO data.property(name, value) VALUES (:name, :value)  USING TIMESTAMP :client_ts" with params [Text("gdvgvpzlndxvqfpjdocj"), Text("gdvgvpzlndxvqfpjdocjmqkbdjijgu"), BigInt(1723026298)]: Invalid message: Frame error: early eof

In the example above, I brought down 1 of the 3 DB nodes while running with CL=ALL.
With 10 retries configured, I expected all of them to be applied (and to last long enough to cover the time the node needs to come back to the UN state).
Instead, the latte stress run crashed.
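
To put a rough number on that expectation, here is a minimal sketch of how much waiting time the configured retries could cover. It assumes the delay simply doubles after each attempt, starting at the min delay and capped at the max delay, and it ignores jitter and the per-request timeout; latte's actual backoff formula may differ. `retry_delays_ms` is a hypothetical helper, not part of latte.

```python
def retry_delays_ms(retries: int, min_delay_ms: int, max_delay_ms: int) -> list[int]:
    """Delay before each retry, ASSUMING plain doubling capped at max_delay_ms."""
    delays = []
    delay = min_delay_ms
    for _ in range(retries):
        delays.append(min(delay, max_delay_ms))
        # Double for the next retry, but never exceed the configured maximum.
        delay = min(delay * 2, max_delay_ms)
    return delays


# Values taken from the run configuration above: 10 retries, 100 ms to 5000 ms.
delays = retry_delays_ms(retries=10, min_delay_ms=100, max_delay_ms=5000)
total_wait_ms = sum(delays)
print(delays)
print(total_wait_ms)  # 26300
```

Under these assumptions the ten retries would add roughly 26 seconds of waiting on top of the time spent in the attempts themselves, which may or may not be enough for a node to return to UN.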

It looks like a direct consequence of the following change: 8cbbe2b

@pkolaczk
Owner

pkolaczk commented Aug 9, 2024

The problem with the previous behavior was that it kept retrying even on obvious user errors, such as an invalid query string in the script. And because the default retry count was high, it looked as if latte had frozen.

I think in this case we need to add a retry parameter that selects whether to retry only on timeout / overload errors (current behavior) or on all errors.
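
As an illustration only, such a switch could be sketched as follows. This is not latte's code (latte is written in Rust) and not the change that eventually closed the issue; every name here (`RetryOn`, `should_retry`, `run_with_retries`, and the exception classes) is hypothetical and chosen just to show the two modes side by side.

```python
import enum
import time


class RetryOn(enum.Enum):
    TRANSIENT = "transient"  # retry only on timeout / overload errors
    ALL = "all"              # retry on every error


# Hypothetical error types standing in for the driver's error kinds.
class QueryTimeout(Exception): ...
class Overloaded(Exception): ...
class BrokenConnection(Exception): ...  # e.g. "Frame error: early eof"
class InvalidQuery(Exception): ...      # an obvious user error


TRANSIENT_ERRORS = (QueryTimeout, Overloaded)


def should_retry(error: Exception, retry_on: RetryOn) -> bool:
    """Decide whether an error is retryable under the selected mode."""
    return retry_on is RetryOn.ALL or isinstance(error, TRANSIENT_ERRORS)


def run_with_retries(op, retries: int, retry_on: RetryOn, delay_s: float = 0.0):
    """Call op(); on a retryable error, try again up to `retries` more times."""
    attempt = 0
    while True:
        try:
            return op()
        except Exception as error:
            if attempt >= retries or not should_retry(error, retry_on):
                raise  # out of retries, or the mode says not to retry this error
            attempt += 1
            time.sleep(delay_s)  # fixed delay here; a real tool would back off
```

With `RetryOn.TRANSIENT`, a `BrokenConnection` or `InvalidQuery` fails on the first attempt, which matches the behavior reported above; with `RetryOn.ALL`, both would be retried, which is what made user errors look like a hang before.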
