Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TSM1: Panic "missed stack barrier" when profiling enabled #4328

Closed
otoolep opened this issue Oct 5, 2015 · 17 comments
Closed

TSM1: Panic "missed stack barrier" when profiling enabled #4328

otoolep opened this issue Oct 5, 2015 · 17 comments

Comments

@otoolep
Copy link
Contributor

otoolep commented Oct 5, 2015

During 200K EPS burn-in testing of TSM1, the follow panic brought testing to a halt.

found next stack barrier at 0xc8201a6f18; expected [*0xc820419f08=0x868f0b *0xc8201a6f80=0x461f01]                                                                                                                                                              
fatal error: missed stack barrier                                                                                                                                                                                                                               

goroutine 0 [idle]:                                                                                                                                                                                                                                             
runtime.throw(0xcb0430, 0x14)                                                                                                                                                                                                                                   
        /home/philip/.gvm/gos/go1.5/src/runtime/panic.go:527 +0x90                                                                                                                                                                                              
runtime.gentraceback(0x41385f, 0x7f503f797b48, 0x0, 0xc8200f8180, 0x0, 0xc820101738, 0x40, 0x0, 0x0, 0x6, ...)                                                                                                                                                  
        /home/philip/.gvm/gos/go1.5/src/runtime/traceback.go:259 +0x1053                                                                                                                                                                                        
runtime.sigprof(0x41385f, 0x7f503f797b48, 0x0, 0xc8200f8180, 0xc8200f6000)                                                                                                                                                                                      
        /home/philip/.gvm/gos/go1.5/src/runtime/proc1.go:2585 +0x50b                                                                                                                                                                                            
runtime.sighandler(0xc80000001b, 0xc820101bb0, 0xc820101a80, 0xc8200f8180)                                                                                                                                                                                      
        /home/philip/.gvm/gos/go1.5/src/runtime/signal_amd64x.go:47 +0xbe                                                                                                                                                                                       
runtime.sigtrampgo(0x1b, 0xc820101bb0, 0xc820101a80)                                                                                                                                                                                                            
        /home/philip/.gvm/gos/go1.5/src/runtime/signal_linux.go:94 +0x95                                                                                                                                                                                        
runtime.sigtramp(0x1, 0x0, 0xc8200fa000, 0x0, 0x7fa0, 0x4000000, 0xc82dd21980, 0x1a74b6, 0x0, 0x15, ...)                                                                                                                                                        
        /home/philip/.gvm/gos/go1.5/src/runtime/sys_linux_amd64.s:234 +0x1b                                                                                                                                                                                     
runtime.sigreturn(0x0, 0xc8200fa000, 0x0, 0x7fa0, 0x4000000, 0xc82dd21980, 0x1a74b6, 0x0, 0x15, 0xd77cc8, ...)                                                                                                                                                  
        /home/philip/.gvm/gos/go1.5/src/runtime/sys_linux_amd64.s:238                                                                                                                                                                                           

goroutine 3091302 [copystack]:                                                                                                                                                                                                                                  
fmt.(*pp).doPrintf(0xc8220212b0, 0xc24ba0, 0x7, 0xc820419630, 0x2, 0x2)                                                                                                                                                                                         
        /home/philip/.gvm/gos/go1.5/src/fmt/print.go:1082 fp=0xc8201a6538 sp=0xc8201a6530                                                                                                                                                                       
fmt.Sprintf(0xc24ba0, 0x7, 0xc820419630, 0x2, 0x2, 0x0, 0x0)                                                                                                                                                                                                    
        /home/philip/.gvm/gos/go1.5/src/fmt/print.go:203 +0x6f fp=0xc8201a6588 sp=0xc8201a6538                                                                                                                                                                  
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).nextFileName(0xc834ce3680, 0x0, 0x0)                                                                                                                                                                    
        /home/philip/repos/influxdb/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:1235 +0x1ad fp=0xc8201a6668 sp=0xc8201a6588                                                                                                                       
found next stack barrier at 0xc8201a6f18; expected [*0xc820419f08=0x868f0b *0xc8201a6f80=0x461f01]                                                                                                                                                              
fatal error: missed stack barrier
panic during panic

goroutine 0 [idle]:
runtime.startpanic_m()
        /home/philip/.gvm/gos/go1.5/src/runtime/panic1.go:67 +0x141
runtime.systemstack(0xd7c550)
        /home/philip/.gvm/gos/go1.5/src/runtime/asm_amd64.s:278 +0xab
runtime.startpanic()
        /home/philip/.gvm/gos/go1.5/src/runtime/panic.go:505 +0x14
runtime.throw(0xcb0430, 0x14)
        /home/philip/.gvm/gos/go1.5/src/runtime/panic.go:526 +0x83
runtime.gentraceback(0x477bc0, 0xc8201a6530, 0x0, 0xc82dd21980, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0, ...)
        /home/philip/.gvm/gos/go1.5/src/runtime/traceback.go:259 +0x1053
runtime.traceback1(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc82dd21980, 0x0)
        /home/philip/.gvm/gos/go1.5/src/runtime/traceback.go:550 +0xc8
runtime.traceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc82dd21980)
        /home/philip/.gvm/gos/go1.5/src/runtime/traceback.go:527 +0x48
runtime.tracebackothers(0xc8200f8000)
        /home/philip/.gvm/gos/go1.5/src/runtime/traceback.go:664 +0xda
runtime.dopanic_m(0xc8200f8000, 0x42e830, 0xc820101588)
        /home/philip/.gvm/gos/go1.5/src/runtime/panic1.go:104 +0x1f9
runtime.dopanic.func1()
        /home/philip/.gvm/gos/go1.5/src/runtime/panic.go:514 +0x32
runtime.systemstack(0xc820101560)
        /home/philip/.gvm/gos/go1.5/src/runtime/asm_amd64.s:278 +0xab
runtime.dopanic(0x0)
        /home/philip/.gvm/gos/go1.5/src/runtime/panic.go:515 +0x61
runtime.throw(0xcb0430, 0x14)
        /home/philip/.gvm/gos/go1.5/src/runtime/panic.go:527 +0x90
runtime.gentraceback(0x41385f, 0x7f503f797b48, 0x0, 0xc8200f8180, 0x0, 0xc820101738, 0x40, 0x0, 0x0, 0x6, ...)
        /home/philip/.gvm/gos/go1.5/src/runtime/traceback.go:259 +0x1053
runtime.sigprof(0x41385f, 0x7f503f797b48, 0x0, 0xc8200f8180, 0xc8200f6000)
        /home/philip/.gvm/gos/go1.5/src/runtime/proc1.go:2585 +0x50b
runtime.sighandler(0xc80000001b, 0xc820101bb0, 0xc820101a80, 0xc8200f8180)
        /home/philip/.gvm/gos/go1.5/src/runtime/signal_amd64x.go:47 +0xbe
runtime.sigtrampgo(0x1b, 0xc820101bb0, 0xc820101a80)
        /home/philip/.gvm/gos/go1.5/src/runtime/signal_linux.go:94 +0x95
runtime.sigtramp(0x1, 0x0, 0xc8200fa000, 0x0, 0x7fa0, 0x4000000, 0xc82dd21980, 0x1a74b6, 0x0, 0x15, ...)
        /home/philip/.gvm/gos/go1.5/src/runtime/sys_linux_amd64.s:234 +0x1b
runtime.sigreturn(0x0, 0xc8200fa000, 0x0, 0x7fa0, 0x4000000, 0xc82dd21980, 0x1a74b6, 0x0, 0x15, 0xd77cc8, ...)
        /home/philip/.gvm/gos/go1.5/src/runtime/sys_linux_amd64.s:238

cc @jwilder @pauldix

@otoolep otoolep added this to the 0.9.5 milestone Oct 5, 2015
@jwilder
Copy link
Contributor

jwilder commented Oct 5, 2015

Possibly related: #4308 (comment)

@dgnorton
Copy link
Contributor

dgnorton commented Oct 5, 2015

This issue in Go 1.5 looks similar: golang/go#12238. A fix went into 1.5 but looks like they weren't able to exactly repro the issue. Maybe there is still a related bug.

@otoolep
Copy link
Contributor Author

otoolep commented Oct 5, 2015

I can bump our build process to 1.5.1 in an instant.

On Mon, Oct 5, 2015 at 12:28 PM, dgnorton notifications@github.com wrote:

This issue in Go 1.5 looks similar: golang/go#12238
golang/go#12238. A fix went into 1.5 but
looks like they weren't able to exactly repro the issue. Maybe there is
still a related bug.


Reply to this email directly or view it on GitHub
#4328 (comment).

@otoolep
Copy link
Contributor Author

otoolep commented Oct 5, 2015

Just say the word.

#4330

On Mon, Oct 5, 2015 at 12:31 PM, Philip O'Toole philip@influxdb.com wrote:

I can bump our build process to 1.5.1 in an instant.

On Mon, Oct 5, 2015 at 12:28 PM, dgnorton notifications@github.com
wrote:

This issue in Go 1.5 looks similar: golang/go#12238
golang/go#12238. A fix went into 1.5 but
looks like they weren't able to exactly repro the issue. Maybe there is
still a related bug.


Reply to this email directly or view it on GitHub
#4328 (comment)
.

@otoolep
Copy link
Contributor Author

otoolep commented Oct 5, 2015

golang/go#12238 (comment)

"As you can see from the stack trace, cgo code has called into Go code, the
Go code is allocating memory, the profiler has decided to jump in and get a
stack trace, and getting the stack trace crashes."

The testing that brought this out was running with profiling enabled.

On Mon, Oct 5, 2015 at 12:32 PM, Philip O'Toole philip@influxdb.com wrote:

Just say the word.

#4330

On Mon, Oct 5, 2015 at 12:31 PM, Philip O'Toole philip@influxdb.com
wrote:

I can bump our build process to 1.5.1 in an instant.

On Mon, Oct 5, 2015 at 12:28 PM, dgnorton notifications@github.com
wrote:

This issue in Go 1.5 looks similar: golang/go#12238
golang/go#12238. A fix went into 1.5 but
looks like they weren't able to exactly repro the issue. Maybe there is
still a related bug.


Reply to this email directly or view it on GitHub
#4328 (comment)
.

@otoolep
Copy link
Contributor Author

otoolep commented Oct 5, 2015

#4330

@otoolep
Copy link
Contributor Author

otoolep commented Oct 9, 2015

@mjdesa hit this a second time, again with a binary built using 1.5.

@desa
Copy link
Contributor

desa commented Oct 13, 2015

@otoolep hit it another time with bc569e8 using go1.5.1

@DanielMorsing
Copy link
Contributor

OK, I haven't seen anyone hit this on the go project bug tracker, so I think it's up to us to report it. @mjdesa, could you send me the logs you have so I can prepare something for the go folks?

@otoolep
Copy link
Contributor Author

otoolep commented Oct 13, 2015

@mjdesa -- your test system is running with profiling enabled, correct?

@desa
Copy link
Contributor

desa commented Oct 14, 2015

correct

@desa
Copy link
Contributor

desa commented Oct 20, 2015

@otoolep I don't think the config option for profiling does anything.

[http]
  enabled = true
  bind-address = ":8086"
  auth-enabled = false
  log-enabled = false
  write-tracing = false
  pprof-enabled = false

Whole config here

2015/10/20 16:57:21 InfluxDB starting, version 0.9.5-nightly-7870b84, branch master, commit 7870b84f20e4164d8c94d37ae1566dfbd492cda5, built '2015-10-15T04:00:42+0000'
2015/10/20 16:57:21 Go version go1.5.1, GOMAXPROCS set to 32
2015/10/20 16:57:21 Using configuration at: /etc/opt/influxdb/influxdb.conf
2015/10/20 16:57:22 writing CPU profile to: /tmp/cpu.prof
2015/10/20 16:57:22 writing mem profile to: /tmp/mem.prof

@otoolep
Copy link
Contributor Author

otoolep commented Oct 20, 2015

That config option in the file is a different thing, it does not control
the pprof shown in the log.

You'll see profiling has been enabled if you check influxd in the process
table.

On Tuesday, October 20, 2015, Michael Desa notifications@github.com wrote:

@otoolep https://github.com/otoolep I don't think the config option for
profiling does anything.

[http]
enabled = true
bind-address = ":8086"
auth-enabled = false
log-enabled = false
write-tracing = false
pprof-enabled = false

2015/10/20 16:57:21 InfluxDB starting, version 0.9.5-nightly-7870b84, branch master, commit 7870b84, built '2015-10-15T04:00:42+0000'
2015/10/20 16:57:21 Go version go1.5.1, GOMAXPROCS set to 32
2015/10/20 16:57:21 Using configuration at: /etc/opt/influxdb/influxdb.conf
2015/10/20 16:57:22 writing CPU profile to: /tmp/cpu.prof
2015/10/20 16:57:22 writing mem profile to: /tmp/mem.prof


Reply to this email directly or view it on GitHub
#4328 (comment).

@desa
Copy link
Contributor

desa commented Oct 20, 2015

@otoolep how do I shut profiling off?

I believe that I used the ansible playbook that had profiling disabled by default. I could be mistaken though.

@desa
Copy link
Contributor

desa commented Oct 20, 2015

false alarm.

Profiling was still enabled in /etc/default/influxdb.

@otoolep
Copy link
Contributor Author

otoolep commented Oct 21, 2015

All testing so far shows that this is only an issue when profiling is enabled.

@otoolep otoolep changed the title TSM1: Panic "missed stack barrier" TSM1: Panic "missed stack barrier" when profiling enabled Oct 21, 2015
@jwilder jwilder added the panic label Nov 5, 2015
@pauldix
Copy link
Member

pauldix commented Dec 8, 2015

I think this was fixed in 0.9.6

@pauldix pauldix closed this as completed Dec 8, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants