You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As discussed in #24543 (comment), assembly code is non-preemptible. Non-preemptible goroutines can delay stop-the-world pauses (after all other goroutines have been paused), meaning that they can directly contribute to workload tail latency if timed poorly. This makes libraries which perform unbounded amounts of work in assembly loops risky for use in latency sensitive applications.
One way to mitigate this risk would be to bound the amount of work that any single call into an assembly routine performs on a case-by-case basis.
The crypto libraries provide a handful of examples where this technique could be applied. For example, in crypto/sha256, the following diff limits the non-preemptible section of a sha256 hash to somewhere around 250µs:
diff--gita/src/crypto/sha256/sha256.gob/src/crypto/sha256/sha256.goindex2deafbc9fc..567a3c81f9100644---a/src/crypto/sha256/sha256.go+++b/src/crypto/sha256/sha256.go
@@ -28,6+28,10 @@ constSize224=28// The blocksize of SHA256 and SHA224 in bytes.constBlockSize=64+// The maximum number of iterations performed by a single assembly loop.+constmaxAsmIters=1024+constmaxAsmSize=maxAsmIters*BlockSize// 64KiB+const (
chunk=64init0=0x6A09E667
@@ -191,6+195,11 @@ func (d*digest) Write(p []byte) (nnint, errerror) {
}
iflen(p) >=chunk {
n:=len(p) &^ (chunk-1)
+forn>maxAsmSize {
+block(d, p[:maxAsmSize])
+p=p[maxAsmSize:]
+n-=maxAsmSize+ }
block(d, p[:n])
p=p[n:]
}
We've seen in CockroachDB that this change combined with an almost identical one in crypto/md5 halves database p99 latency during backup uploads to AWS S3.
Alternative 1
One simple alternative is to document this risk and push the responsibility to bound the amount of non-preemptible to the users of these libraries.
However, that's not a great solution, as the direct callers of these hashing libraries are often buried deep inside of other libraries which applications cannot control (examples in the aws sdk: here, here, and here).
By default, the runtime cannot safely preempt assembly code since it won't know what registers contain pointers. As a follow-on to the work on safe-points everywhere, we should audit assembly in the standard library for non-preemptible loops and annotate them with register maps. In most cases this should be trivial since most assembly never constructs a pointer that isn't shadowed by an argument, so it can simply claim there are no pointers in registers. We should also document in the Go assembly guide how to do this for user code.
To my knowledge, this has never been implemented. If it was, it would provide an alternative to this proposal, though it would still require changes around each relevant use of assembly.
cc. @aclements you've thought about this problem more than anyone else. Does this seem like a reasonable proposal?
The text was updated successfully, but these errors were encountered:
Your code could be simplified but it sounds reasonable.
A third alternative could be to have assembly check some magic preemption boolean and return early if it is preempted.
Something along the lines of:
118605: build: add go runtime patch to bound non-preemptible work r=nvanbenschoten,rickystewart,rsevinsky-cr a=nicktrav
There are two commits in this patch set, and are best reviewed individually.
---
The first is a purely mechanical change to reapply our existing Go runtime patches to the upstream, and then port the diff back into Cockroach. This shuffles around the ordering of the various patches, making subsequent patching simpler.
The second patch is cribbed from golang/go#64417, and is the material change, touching #115192.
Epic: None.
Co-authored-by: Nick Travers <travers@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Proposal Details
As discussed in #24543 (comment), assembly code is non-preemptible. Non-preemptible goroutines can delay stop-the-world pauses (after all other goroutines have been paused), meaning that they can directly contribute to workload tail latency if timed poorly. This makes libraries which perform unbounded amounts of work in assembly loops risky for use in latency sensitive applications.
One way to mitigate this risk would be to bound the amount of work that any single call into an assembly routine performs on a case-by-case basis.
The
crypto
libraries provide a handful of examples where this technique could be applied. For example, incrypto/sha256
, the following diff limits the non-preemptible section of a sha256 hash to somewhere around 250µs:We've seen in CockroachDB that this change combined with an almost identical one in
crypto/md5
halves database p99 latency during backup uploads to AWS S3.Alternative 1
One simple alternative is to document this risk and push the responsibility to bound the amount of non-preemptible to the users of these libraries.
However, that's not a great solution, as the direct callers of these hashing libraries are often buried deep inside of other libraries which applications cannot control (examples in the aws sdk: here, here, and here).
Alternative 2
The
Safe-points everywhere for non-cooperative goroutine preemption
design proposed a method to annotate safe preemption points in assembly code.To my knowledge, this has never been implemented. If it was, it would provide an alternative to this proposal, though it would still require changes around each relevant use of assembly.
cc. @aclements you've thought about this problem more than anyone else. Does this seem like a reasonable proposal?
The text was updated successfully, but these errors were encountered: