Fix Segfault related to corrupted memory in Go runtime (1.22.6) #1039

Open
ryanluu12345 opened this issue Oct 7, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@ryanluu12345
Contributor

Environment

  • First line of replicator version (specifically the git SHA):
  • Source DB and version: Cockroach
  • Target DB and version: Oracle
  • Staging DB and version, if separate from TargetDB: Cockroach
  • Load-balancer setup, if applicable: unknown

Describe the bug
A customer ran into this segmentation violation:
SIGSEGV: segmentation violation
PC=0x428c17 m=71 sigcode=1 addr=0x20

The stack for m=71, the thread named in the fault line, shows where the crash happened: the GC background mark worker faulted while scanning an object.

goroutine 0 gp=0xc017aea1c0 m=71 mp=0xc017ae2008 [idle]:
runtime.(*mspan).typePointersOfUnchecked(0xc0042e0000?, 0xc008f31020?)
        /opt/hostedtoolcache/go/1.22.6/x64/src/runtime/mbitmap_allocheaders.go:202 +0x37 fp=0x7f28237fdd28 sp=0x7f28237fdd08 pc=0x428c17
runtime.scanobject(0xc00005d268?, 0xc00005d268)
        /opt/hostedtoolcache/go/1.22.6/x64/src/runtime/mgcmark.go:1446 +0xb5 fp=0x7f28237fddb8 sp=0x7f28237fdd28 pc=0x434af5
runtime.gcDrain(0xc00005d268, 0x2)
        /opt/hostedtoolcache/go/1.22.6/x64/src/runtime/mgcmark.go:1242 +0x1f4 fp=0x7f28237fde20 sp=0x7f28237fddb8 pc=0x434454
runtime.gcDrainMarkWorkerDedicated(...)
        /opt/hostedtoolcache/go/1.22.6/x64/src/runtime/mgcmark.go:1124
runtime.gcBgMarkWorker.func2()
        /opt/hostedtoolcache/go/1.22.6/x64/src/runtime/mgc.go:1402 +0x155 fp=0x7f28237fde70 sp=0x7f28237fde20 pc=0x430b75
runtime.systemstack(0x800000)
        /opt/hostedtoolcache/go/1.22.6/x64/src/runtime/asm_amd64.s:509 +0x4a fp=0x7f28237fde80 sp=0x7f28237fde70 pc=0x4861aa

@bobvawter pointed out that this looks related to an open issue against the runtime in the Go repository: golang/go#68632

Since that issue is still unresolved, we need to keep monitoring it and update our Go version once a fix is released (pending a repro and diagnosis on the Go side).

@ryanluu12345 ryanluu12345 added the bug Something isn't working label Oct 7, 2024
@ryanluu12345
Contributor Author

Notes for myself:

@bobvawter
Contributor

Negative results: enabling GOEXPERIMENT=cgocheck2 on the Oracle CI builds surfaced nothing on either master or the v1.0 branch (N=2). The next thing to try is a soak test using the workload subcommand. That approach assumes A) this is actually a cgo-related memory corruption issue as opposed to some other runtime bug, B) the cgo checks could catch whatever bad behavior might exist, and C) the crash is reasonably likely to happen again.

@ryanluu12345
Contributor Author

ryanluu12345 commented Oct 10, 2024

Thanks for running those tests, @bobvawter. As we discussed offline, a helpful next step is to enable the extra cgo checks described above and run soak tests for the C to Oracle case.

I'm also going through the issues in the Go repository and have found a couple of other reports from users who hit the same crash:
golang/go#69247
golang/go#69247 (comment)

Common factors in those reports:

  • Running on Go 1.22.x versions
  • Using packages that rely on cgo and unsafe

Filter for finding similar issues

Edit: the original issue was closed; this is now tracked in golang/go#68632


2 participants