Long boot hang after "ipmi device interface" #6
Comments
On Tue, 2014-11-25 at 08:32 -0800, Andrew Geissler wrote:
Are interrupts working? Do you have up-to-date AMI firmware?

Hostboot doesn't currently use the interrupts, so I'm not sure. Here's my version info:
Firmware Revision: 1.10.65792
Ani is telling me this isn't the latest; I take it I should try to update?
On Tue, 2014-11-25 at 13:40 -0800, Andrew Geissler wrote:
I think the 24/11 build has the interrupts working. You can also try
    devmem 0x1e780080 32
Read the resulting value, set bit 0x8000 (basically turn a 5 into a d)
    devmem 0x1e780080 32
And see if that helps.
Cheers,
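The "set bit 0x8000" step above is a plain bitwise OR on the value read back. The following is a minimal sketch of that arithmetic only, with no hardware access; the helper name `with_bit_set` is made up for illustration, and the value to pass on the second `devmem` invocation is not shown in this copy of the thread.

```c
#include <stdint.h>

/* Bit named in the comment above; what it controls is not stated there. */
#define BIT_TO_SET 0x8000u

/* Hypothetical helper: return the value read from the register with
 * bit 0x8000 set, leaving every other bit unchanged. */
static inline uint32_t with_bit_set(uint32_t old_value)
{
	return old_value | BIT_TO_SET;
}
```

With this, a nibble of 5 in bits 12 to 15 becomes d, which matches the "turn a 5 into a d" hint, and a value that already has the bit set is returned unchanged.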
This is likely a duplicate of issue #5 - make sure you're using up-to-date skiboot and linux repositories, and that you're building the correct versions:
|
Yeah, for whatever reason an update from op-build was not getting me the latest skiboot. I went into the sub buildroot directory, did the git update, and verified that the resulting image worked fine (7185393). The *-dirclean commands above did not work for me from op-build either, so I must be missing something there.
…nt', which requires 8 byte alignment

UBSan caught this:

hdata/test/../iohub.c:83:2: runtime error: load of misaligned address 0x7f1dc7b0210a for type 'long unsigned int', which requires 8 byte alignment
0x7f1dc7b0210a: note: pointer points here
 31 4c 58 08 31 00 04 01 00 30 00 42 50 46 02 00 00 78 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ^
    #0 0x41470a in io_get_lx_info hdata/test/../iohub.c:83
    #1 0x41759f in io_add_p8_cec_vpd hdata/test/../iohub.c:450
    #2 0x417d35 in io_parse_fru hdata/test/../iohub.c:538
    #3 0x41812a in io_parse hdata/test/../iohub.c:600
    #4 0x425aa2 in parse_hdat hdata/test/../spira.c:1337
    #5 0x43d9f8 in main hdata/test/hdata_to_dt.c:358
    #6 0x7f1dcb868509 in __libc_start_main (/lib64/libc.so.6+0x20509)
    #7 0x4019e9 in _start (/home/stewart/skiboot/hdata/test/hdata_to_dt+0x4019e9)

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
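A misaligned-load report like the one above typically comes from dereferencing a `uint64_t *` that points into a packed byte buffer. The sketch below shows two common ways to read such a value safely. It is not the fix applied in the commit, which is not shown here, and both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 8 bytes from a possibly misaligned address.
 * memcpy has no alignment requirement; result is in host byte order. */
static uint64_t load_u64_unaligned(const void *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Hypothetical helper: read 8 bytes as a big-endian value, one byte at
 * a time, so neither alignment nor host byte order matters. */
static uint64_t load_be64_unaligned(const void *p)
{
	const uint8_t *b = p;
	uint64_t v = 0;

	for (size_t i = 0; i < 8; i++)
		v = (v << 8) | b[i];
	return v;
}
```

Compilers generally turn the `memcpy` form into a single load on targets that allow unaligned access, so the safe version need not cost anything.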
Patch 5690c5a("phb4: Reallocate PEC2 DMA-Read engines to improve GPU-Direct bandwidth") introduced allocation of extra DMA-read engines to improve Mellanox CX5 GPU-Direct bandwidth. At present CX5 is the only card that uses these optimizations, so these changes only affect Witherspoon systems.

However, the hardware team has raised the possibility of future non-Witherspoon systems using a similar card, where these optimizations won't be needed. They have therefore asked us to make these changes Witherspoon specific.

Hence this patch updates phb4_init_capp_regs() and enable_capi_mode() to configure the extra DMA-read engine allocation if and only if skiboot is running on the Witherspoon platform.

Cc: stable #6.0.6+
Fixes: 5690c5a("phb4: Reallocate PEC2 DMA-Read engines to improve GPU-Direct bandwidth")
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
The current capp recovery timeout control loop in do_capp_recovery_scoms() uses a wrong comparison for the return value of tb_compare(). This may cause do_capp_recovery_scoms() to report a timeout earlier than the stipulated 168ms.

The patch fixes this by updating the loop timeout control branch in do_capp_recovery_scoms() to use the correct enum tb_cmpval.

Cc: Stable #6.0+
Fixes: 09b853c("capi: Poll Err/Status register during CAPP recovery")
Reported-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit ec954f7)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
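To illustrate the kind of mistake described, here is a small sketch of a deadline check built on a three-way compare. The enum values and the `tb_compare()` body below are stand-ins written for this example, not copied from skiboot, and `timed_out()` is a hypothetical helper. The commit message does not say which enum value the fixed code compares against.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in definitions for this example only. */
enum tb_cmpval {
	TB_ABEFOREB = -1,
	TB_AEQUALB  = 0,
	TB_AAFTERB  = 1,
};

/* Three-way compare of two timebase values. The signed difference keeps
 * the ordering correct across counter wrap-around. */
static enum tb_cmpval tb_compare(uint64_t a, uint64_t b)
{
	if (a == b)
		return TB_AEQUALB;
	return ((int64_t)(a - b) < 0) ? TB_ABEFOREB : TB_AAFTERB;
}

/* Hypothetical helper: true only once `now` is strictly past `deadline`.
 * Treating the result as a plain boolean would be wrong, because
 * TB_ABEFOREB (-1) is also non-zero and would signal a timeout early. */
static bool timed_out(uint64_t now, uint64_t deadline)
{
	return tb_compare(now, deadline) == TB_AAFTERB;
}
```

A polling loop would compute the deadline once, then keep polling while `timed_out()` is false.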
Rearrange the internal implementation of strdup and its inclusion in run-mem_region_reservations to avoid invoking the internal implementation in the wrong contexts:

==9810==ERROR: AddressSanitizer: heap-use-after-free on address 0x629000000218 at pc 0x00000052eb1a bp 0x7ffc31aebe70 sp 0x7ffc31aebe68
READ of size 8 at 0x629000000218 thread T0
    #0 0x52eb19 in list_check_node /home/andrew/src/open-power/skiboot/core/test/../../ccan/list/list.c:28:10
    #1 0x52eb88 in list_check /home/andrew/src/open-power/skiboot/core/test/../../ccan/list/list.c:40:7
    #2 0x4f9a74 in __mem_alloc /home/andrew/src/open-power/skiboot/core/test/../mem_region.c:427:2
    #3 0x4f8c14 in mem_alloc /home/andrew/src/open-power/skiboot/core/test/../mem_region.c:488:6
    #4 0x5138a0 in __memalign /home/andrew/src/open-power/skiboot/core/test/../malloc.c:21:6
    #5 0x50d2cb in __malloc /home/andrew/src/open-power/skiboot/core/test/../malloc.c:29:9
    #6 0x513f4d in strdup /home/andrew/src/open-power/skiboot/core/test/../../libc/string/strdup.c:23:8
    #7 0x52ee1a in llvm_gcda_start_file (/home/andrew/src/open-power/skiboot/core/test/run-mem_region_reservations-gcov+0x52ee1a)
    #8 0x529422 in __llvm_gcov_writeout (/home/andrew/src/open-power/skiboot/core/test/run-mem_region_reservations-gcov+0x529422)
    #9 0x530419 in llvm_writeout_files (/home/andrew/src/open-power/skiboot/core/test/run-mem_region_reservations-gcov+0x530419)
    #10 0x7f191cf2e2ab in __run_exit_handlers /build/glibc-KRRWSm/glibc-2.29/stdlib/exit.c:108:8
    #11 0x7f191cf2e3d9 in exit /build/glibc-KRRWSm/glibc-2.29/stdlib/exit.c:139:3
    #12 0x7f191cf0db71 in __libc_start_main /build/glibc-KRRWSm/glibc-2.29/csu/../csu/libc-start.c:342:3
    #13 0x41b3d9 in _start (/home/andrew/src/open-power/skiboot/core/test/run-mem_region_reservations-gcov+0x41b3d9)

0x629000000218 is located 24 bytes inside of 16384-byte region [0x629000000200,0x629000004200)
freed by thread T0 here:
    #0 0x4c6a52 in __interceptor_free /build/llvm-toolchain-8-F3l7P1/llvm-toolchain-8-8/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:124:3
    #1 0x526186 in real_free /home/andrew/src/open-power/skiboot/core/test/run-mem_region_reservations.c:22:9
    #2 0x52380c in main /home/andrew/src/open-power/skiboot/core/test/run-mem_region_reservations.c:225:2
    #3 0x7f191cf0db6a in __libc_start_main /build/glibc-KRRWSm/glibc-2.29/csu/../csu/libc-start.c:308:16

previously allocated by thread T0 here:
    #0 0x4c6dd3 in malloc /build/llvm-toolchain-8-F3l7P1/llvm-toolchain-8-8/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:146:3
    #1 0x523896 in real_malloc /home/andrew/src/open-power/skiboot/core/test/run-mem_region_reservations.c:17:9
    #2 0x523321 in main /home/andrew/src/open-power/skiboot/core/test/run-mem_region_reservations.c:185:29
    #3 0x7f191cf0db6a in __libc_start_main /build/glibc-KRRWSm/glibc-2.29/csu/../csu/libc-start.c:308:16

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Here is a proposal to collect OPAL call statistics, counts and duration, and track areas we could possibly improve.

With a small Linux driver to dump the stats in debugfs, here is what we get on a P9 after boot:

OPAL_CONSOLE_WRITE : #22318 0/0/47
OPAL_RTC_READ : #9 0/4/15
OPAL_READ_NVRAM : #3468 0/0/6
OPAL_HANDLE_INTERRUPT : #4724 0/57/10026
OPAL_POLL_EVENTS : #508 2/141/10033
OPAL_PCI_CONFIG_READ_BYTE : #3623 0/0/4
OPAL_PCI_CONFIG_READ_HALF_WORD : #5579 0/0/8
OPAL_PCI_CONFIG_READ_WORD : #6156 0/0/7
OPAL_PCI_CONFIG_WRITE_BYTE : #2 0/0/0
OPAL_PCI_CONFIG_WRITE_HALF_WORD : #1282 0/0/1
OPAL_PCI_CONFIG_WRITE_WORD : #1335 0/0/1
OPAL_PCI_EEH_FREEZE_STATUS : #11123 0/0/2
OPAL_CONSOLE_WRITE_BUFFER_SPACE : #139088 0/0/11
OPAL_PCI_EEH_FREEZE_CLEAR : #148 1/2/8
OPAL_PCI_PHB_MMIO_ENABLE : #22 0/0/0
OPAL_PCI_SET_PHB_MEM_WINDOW : #22 0/0/1
OPAL_PCI_MAP_PE_MMIO_WINDOW : #56 0/0/0
OPAL_PCI_SET_PE : #44 279/284/293
OPAL_PCI_SET_PELTV : #66 0/0/0
OPAL_PCI_SET_XIVE_PE : #1123 0/0/1
OPAL_GET_MSI_64 : #1120 0/0/0
OPAL_START_CPU : #238 8/21/35
OPAL_QUERY_CPU_STATUS : #357 0/11/69
OPAL_PCI_MAP_PE_DMA_WINDOW : #16 0/0/1
OPAL_PCI_MAP_PE_DMA_WINDOW_REAL : #16 0/0/1
OPAL_PCI_RESET : #35 0/471/851
OPAL62 : #6 0/10/46
OPAL_XSCOM_READ : #26 0/0/2
OPAL_XSCOM_WRITE : #8 0/0/1
OPAL_REINIT_CPUS : #4 348/8247/11061
OPAL_CHECK_TOKEN : #134112 0/0/0
OPAL_GET_MSG : #30 0/0/1
OPAL87 : #1 0/0/0
OPAL_PCI_SET_PHB_CAPI_MODE : #2 0/60/121
OPAL_SLW_SET_REG : #1080 3/3/13
OPAL_IPMI_SEND : #53 0/5/11
OPAL_IPMI_RECV : #53 0/0/2
OPAL_I2C_REQUEST : #20 6/10/19
OPAL_FLASH_READ : #10 19/10452/58305
OPAL_PRD_MSG : #1 0/3/3
OPAL_CONSOLE_FLUSH : #134079 0/0/12
OPAL_PCI_GET_PRESENCE_STATE : #7 1/1/3
OPAL_PCI_GET_POWER_STATE : #9 0/0/0
OPAL_PCI_TCE_KILL : #20 1/8/133
OPAL_NMMU_SET_PTCR : #3 253/255/257
OPAL_XIVE_RESET : #3 0/114709/115403
OPAL_XIVE_GET_IRQ_INFO : #1427 0/0/6
OPAL_XIVE_SET_IRQ_CONFIG : #1113 0/125/2810
OPAL_XIVE_GET_QUEUE_INFO : #240 0/0/2
OPAL_XIVE_SET_QUEUE_INFO : #360 0/60/1216
OPAL_XIVE_ALLOCATE_VP_BLOCK : #2 0/59/60
OPAL_XIVE_GET_VP_INFO : #240 0/0/0
OPAL_XIVE_SET_VP_INFO : #360 0/298/3080
OPAL_XIVE_ALLOCATE_IRQ : #240 0/0/3
OPAL140 : #119 0/253/1109
OPAL_IMC_COUNTERS_INIT : #60 9/10/20
OPAL_IMC_COUNTERS_STOP : #36 0/0/2
OPAL_PCI_GET_PBCQ_TUNNEL_BAR : #1 2/2/2
OPAL_PCI_SET_PBCQ_TUNNEL_BAR : #1 1/1/1
OPAL_NX_COPROC_INIT : #2 3/4/5

Signed-off-by: Cédric Le Goater <clg@kaod.org>
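Per-call statistics of this shape can be kept with a small accumulator updated on every call exit. The sketch below is an illustration, not the code from the proposal. It assumes the three slash-separated figures are minimum, average and maximum duration, which the message above does not state, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-call accumulator; zero-initialise before first use. */
struct call_stats {
	uint64_t count;	/* number of calls recorded */
	uint64_t min;	/* shortest duration seen */
	uint64_t max;	/* longest duration seen */
	uint64_t total;	/* sum of durations, used for the average */
};

/* Record one completed call of the given duration (units are up to the
 * caller; the source does not state them). */
static void call_stats_record(struct call_stats *s, uint64_t duration)
{
	if (s->count == 0 || duration < s->min)
		s->min = duration;
	if (duration > s->max)
		s->max = duration;
	s->total += duration;
	s->count++;
}

/* Integer average duration; 0 when nothing has been recorded. */
static uint64_t call_stats_avg(const struct call_stats *s)
{
	return s->count ? s->total / s->count : 0;
}
```

Keeping a running total rather than a running mean avoids rounding drift and makes the average a single division at dump time.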
I booted the latest code from GitHub today on my Palmetto and noticed a very lengthy hang during boot. Is this expected?
[ 2.906206] ipmi message handler version 39.2
[ 2.907161] ipmi device interface
[ 241.304835] I[127423325658,3] BT: Expiring old messsage number 0x02
[127423385289,3] IPMI: Incorrect netfn 0x19 in response
NFO: task swapper/16:1 blocked for more than 120 seconds.
[ 241.305218] IPMI message handler: BMC returned incorrect response, expected netfn 7 cmd 8, got netfn 6 cmd 8
[ 241.305267] ipmi-powernv ibm,opal:ipmi: Found new BMC (man_id: 0x000000, prod_id: 0x0000, dev_id: 0x00)
[ 241.314995] random: nonblocking pool is initialized