Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpc error: code = Unknown desc = COMMIT: pq: could not serialize access due to read/write dependencies among transactions #338

Closed
invidian opened this issue Oct 10, 2020 · 11 comments
Assignees
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@invidian
Copy link
Contributor

Expected Behaviour

Tink server should be capable of handling requests in paralell.

Current Behaviour

Currently, when one applies multiple requests to tink server in parallel, server sometimes returns the following error:

Error: removing workflow "08b64643-010d-4ade-a43f-6ffa7f0afce8": rpc error: code = Unknown desc = COMMIT: pq: could not serialize access due to read/write dependencies among transactions

Steps to Reproduce (for bugs)

Currently, this is sometimes reproducible when using https://github.com/kinvolk/terraform-provider-tinkerbell and when running terraform destroy on large number of Tinkerbell resources.

Context

Your Environment

Tested using Tinkerbell sandbox with tinkerbell server from image quay.io/tinkerbell/tink:sha-adb49da.

invidian added a commit to tinkerbell/terraform-provider-tinkerbell that referenced this issue Oct 12, 2020
This commit makes sure that testing configuration generates unique
resource names, so multiple outputs can be combined and executed by
Terraform in parallel.

As a preparation to test mitigations against
tinkerbell/tink#338.

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
invidian added a commit to tinkerbell/terraform-provider-tinkerbell that referenced this issue Oct 12, 2020
This commit makes sure that testing configuration generates unique
resource names, so multiple outputs can be combined and executed by
Terraform in parallel.

As a preparation to test mitigations against
tinkerbell/tink#338.

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
invidian added a commit to tinkerbell/terraform-provider-tinkerbell that referenced this issue Oct 12, 2020
To mitigate tinkerbell/tink#338, as most
likely retrying should be happening on server side, not client size.

It seems 'database/sql' used by Tink does not export serialization error
constant, so we use string matching to check for errors.

Checking is strict and should only match only the actual error we want
to retry on.

Currently, retrying has no limit, as depending on the number of parallel
clients, error still may occur. This may potentially lead to the
spinlock if deploying with many clients in parallel talking to single Tink
server, however, this is not considered for the time being.

No retry limit will also ensure, that single client will be able to
apply all operations (basically drain the operations "queue"), no matter
how many operations are done in parallel.

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
@invidian
Copy link
Contributor Author

BTW, in some cases, like

tink/db/template.go

Lines 17 to 45 in 3427566

tx, err := d.instance.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})
if err != nil {
return errors.Wrap(err, "BEGIN transaction")
}
_, err = wflow.Parse([]byte(data))
if err != nil {
return err
}
_, err = tx.Exec(`
INSERT INTO
template (created_at, updated_at, name, data, id)
VALUES
($1, $1, $2, $3, $4)
ON CONFLICT (id)
DO
UPDATE SET
(updated_at, deleted_at, name, data) = ($1, NULL, $2, $3);
`, time.Now(), name, data, id)
if err != nil {
return errors.Wrap(err, "INSERT")
}
err = tx.Commit()
if err != nil {
return errors.Wrap(err, "COMMIT")
}
return nil
, transaction is not even needed, as there is only a single query being made, which is already ensured to be atomic by the database itself, according to https://stackoverflow.com/a/1171807/2974814, unless I'm missing something.

@mmlb
Copy link
Contributor

mmlb commented Oct 13, 2020

@kdeng3849 looks like what you're looking at right?

@invidian
Copy link
Contributor Author

@parauliya this is mitigation I talked about: tinkerbell/terraform-provider-tinkerbell@d4e4080.

@kqdeng
Copy link
Contributor

kqdeng commented Oct 13, 2020

@mmlb yeah, that's exactly it

@thebsdbox
Copy link
Contributor

@kdeng3849 what is the status of this?

@thebsdbox thebsdbox added the kind/bug Categorizes issue or PR as related to a bug. label Dec 10, 2020
@kqdeng
Copy link
Contributor

kqdeng commented Dec 11, 2020

It has been fixed with this PR: #346

@kqdeng kqdeng closed this as completed Dec 11, 2020
@invidian
Copy link
Contributor Author

Ah, BTW, this was also occurring for other resources, not only for workflows. I'll test if that's still the case.

@invidian
Copy link
Contributor Author

This still occurs when workflows are removed:

       Error: removing workflow "b06356a8-3ed4-11eb-a31e-0242ac1b0004": rpc error: code = Unknown desc = Delete Workflow Error: pq: could not serialize access due to read/write dependencies among transactions



        Error: removing workflow "b05ef9be-3ed4-11eb-a31e-0242ac1b0004": rpc error: code = Unknown desc = COMMIT: pq: could not serialize access due to read/write dependencies among transactions



        Error: removing workflow "b046f1ae-3ed4-11eb-a31e-0242ac1b0004": rpc error: code = Unknown desc = COMMIT: pq: could not serialize access due to read/write dependencies among transactions

Running Tinkerbell 266637a. Can we please re-open?

@invidian
Copy link
Contributor Author

Creating/deleting Hardware entries and Templates seems OK.

@gianarb
Copy link
Contributor

gianarb commented Dec 21, 2020

@invidian asked me to re-open, so I think it is not a fixed issue yet

@tstromberg tstromberg added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Aug 27, 2021
@jacobweinstock jacobweinstock added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Nov 16, 2021
@jacobweinstock
Copy link
Member

close by #654

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence.
Projects
None yet
Development

No branches or pull requests

7 participants