Arista 7050QX-32 boots then crashes #1560

Open
sgollnitz opened this issue Dec 15, 2023 · 3 comments

@sgollnitz

Hello, I have been trying to get SONiC installed on my Arista 7050QX-32 for a while now, but I keep running into the same issue. The onboard flash is too small to run SONiC, so I replaced it with an external USB drive as described in https://github.com/hugocollignon/SONiC/blob/main/SONiC-Arista-7050QX-32.md, and SONiC now installs and runs. The problem appears after the first boot: once all of the containers have started, the system eventually reaches a point where all of the containers crash, the kernel prints an error, and the system reboots in a loop.

Here is the error it throws before rebooting:

[ 426.971983] #PF: supervisor read access in kernel mode
[ 427.033267] #PF: error_code(0x0000) - not-present page
[ 427.094551] PGD 0 P4D 0
[ 427.124681] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 427.176623] CPU: 0 PID: 9515 Comm: cat Tainted: G OE 6.1.0-11-2-amd64 #1 Debian 6.1.38-4
[ 427.289834] RIP: 0010:bkn_seq_dma_next_pos+0x2f/0x74 [linux_bcm_knet]
[ 427.366711] Code: e8 d3 fd ff ff 48 85 f6 74 5e 8b 4a 0c ff c1 83 7a 04 00 74 36 83 f9 40 7e 36 8b 7a 08 c7 42 0c ff ff ff ff 8d 4f 01 89 4a 08 <3b> 88 d0 02 00 00 7c 31 31 c0 48 89 42 04 8b 02 8d 78 01 89 3a e8
[ 427.591043] RSP: 0018:ffffa1160667bca8 EFLAGS: 00010202
[ 427.653368] RAX: 0000000000000000 RBX: ffff9544d0c9c870 RCX: 0000000000000001
[ 427.738533] RDX: ffff9543dfa822f0 RSI: 0000000000000001 RDI: 0000000000000000
[ 427.823699] RBP: 0000000000000000 R08: ffff9543dfa822f0 R09: ffff9544d0c9c898
[ 427.908863] R10: 0000000000000000 R11: 0000000000000001 R12: ffffa1160667bd48
[ 427.994028] R13: ffffa1160667bd20 R14: ffff9543dfa822f0 R15: 0000000000000000
[ 428.079195] FS: 00007fc5c5852740(0000) GS:ffff9544fbc00000(0000) knlGS:0000000000000000
[ 428.175792] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 428.244346] CR2: 00000000000002d0 CR3: 000000003ebb0000 CR4: 00000000000006f0
[ 428.329513] Call Trace:
[ 428.358601]
[ 428.383535] ? __die_body.cold+0x1a/0x1f
[ 428.430281] ? page_fault_oops+0xd2/0x2b0
[ 428.478068] ? exc_page_fault+0x70/0x170
[ 428.524814] ? asm_exc_page_fault+0x22/0x30
[ 428.574677] ? bkn_seq_dma_next_pos+0x2f/0x74 [linux_bcm_knet]
[ 428.644283] ? bkn_seq_dma_next_pos+0xa/0x74 [linux_bcm_knet]
[ 428.712844] ? bkn_seq_dma_next+0x13/0x23 [linux_bcm_knet]
[ 428.778289] ? seq_read_iter+0x33a/0x450
[ 428.825036] ? vma_set_page_prot+0x5e/0xc0
[ 428.873861] ? seq_read+0xd0/0x100
[ 428.914377] ? proc_reg_read+0x56/0xa0
[ 428.959046] ? vfs_read+0xa5/0x310
[ 428.999560] ? folio_add_lru+0x70/0xd0
[ 429.044229] ? _raw_spin_unlock+0x15/0x30
[ 429.092015] ? __handle_mm_fault+0xd90/0xfa0
[ 429.142916] ? auditd_test_task+0x39/0x50
[ 429.190702] ? ksys_read+0x6b/0xf0
[ 429.231216] ? do_syscall_64+0x5b/0xc0
[ 429.275885] ? handle_mm_fault+0xdb/0x2d0
[ 429.323672] ? preempt_count_add+0x47/0xa0
[ 429.372495] ? up_read+0x37/0x70
[ 429.410933] ? do_user_addr_fault+0x1bb/0x570
[ 429.462873] ? fpregs_assert_state_consistent+0x22/0x50
[ 429.525197] ? exit_to_user_mode_prepare+0x40/0x1d0
[ 429.583366] ? entry_SYSCALL_64_after_hwframe+0x69/0xd3
[ 429.645693]
[ 429.671666] Modules linked in: xt_TCPMSS(E) dummy(E) xt_hl(E) xt_tcpudp(E) ip6_tables(E) xt_conntrack(E) ebt_vlan(E) nft_compat(E) nf_tables(E) bridge(E) stp(E) llc(E) nf_conntrack_netlink(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) xfrm_user(E) xfrm_algo(E) linux_ngbde(OE) linux_knet_cb(OE) linux_bcm_knet(OE) psample(E) linux_user_bde(OE) linux_kernel_bde(OE) i2c_dev(E) eeprom(E) bonding(E) tls(E) edac_mce_amd(E) kvm_amd(E) ccp(E) rng_core(E) kvm(E) irqbypass(E) drm_display_helper(E) cec(E) rc_core(E) drm_ttm_helper(E) ttm(E) scd(OE) drm_kms_helper(E) i2c_algo_bit(E) sg(E) evdev(E) pcspkr(E) k10temp(E) uio(E) nfnetlink(E) binfmt_misc(E) fuse(E) dm_mod(E) drm(E) configfs(E) efi_pstore(E) ip_tables(E) x_tables(E) autofs4(E) loop(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) sd_mod(E) zstd(E) zstd_compress(E) nvme(E) nvme_core(E) t10_pi(E) crc64_rocksoft(E) crc64(E) crc_t10dif(E) crct10dif_generic(E) crct10dif_common(E) nls_utf8(E) nls_cp437(E)
[ 429.671775] nls_ascii(E) uas(E) vfat(E) usb_storage(E) fat(E) overlay(E) squashfs(E) ahci(E) libahci(E) broadcom(E) bcm_phy_ptp(E) bcm_phy_lib(E) ohci_pci(E) tg3(E) libata(E) libphy(E) ptp(E) scsi_mod(E) ehci_pci(E) ohci_hcd(E) ehci_hcd(E) i2c_piix4(E) pps_core(E) scsi_common(E) usbcore(E) usb_common(E)
[ 431.028140] CR2: 00000000000002d0
[ 431.067885] ---[ end trace 0000000000000000 ]---
[ 431.122977] RIP: 0010:bkn_seq_dma_next_pos+0x2f/0x74 [linux_bcm_knet]
[ 431.199883] Code: e8 d3 fd ff ff 48 85 f6 74 5e 8b 4a 0c ff c1 83 7a 04 00 74 36 83 f9 40 7e 36 8b 7a 08 c7 42 0c ff ff ff ff 8d 4f 01 89 4a 08 <3b> 88 d0 02 00 00 7c 31 31 c0 48 89 42 04 8b 02 8d 78 01 89 3a e8
[ 431.424230] RSP: 0018:ffffa1160667bca8 EFLAGS: 00010202
[ 431.486574] RAX: 0000000000000000 RBX: ffff9544d0c9c870 RCX: 0000000000000001
[ 431.571910] RDX: ffff9543dfa822f0 RSI: 0000000000000001 RDI: 0000000000000000
[ 431.657109] RBP: 0000000000000000 R08: ffff9543dfa822f0 R09: ffff9544d0c9c898
[ 431.742298] R10: 0000000000000000 R11: 0000000000000001 R12: ffffa1160667bd48
[ 431.827489] R13: ffffa1160667bd20 R14: ffff9543dfa822f0 R15: 0000000000000000
[ 431.912696] FS: 00007fc5c5852740(0000) GS:ffff9544fbc00000(0000) knlGS:0000000000000000
[ 432.009333] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 432.077907] CR2: 00000000000002d0 CR3: 000000003ebb0000 CR4: 00000000000006f0
[ 432.163116] Kernel panic - not syncing: Fatal exception
[ 432.225477] Kernel Offset: 0x1b200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 432.354265] Rebooting in 10 seconds..

Any help would be appreciated.
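
For anyone triaging a similar dump, here is a minimal sketch of pulling the key facts out of an oops like the one above. It is only an illustration, not part of any SONiC or Broadcom tooling: the function name `summarize_oops` and the "fault address below one 4 KiB page" heuristic are assumptions of this sketch. On the lines copied from this dump it reports the faulting function `bkn_seq_dma_next_pos` in `linux_bcm_knet`, a fault address (CR2) of 0x2d0, and RAX equal to zero. That matches the marked bytes in the Code line (`<3b> 88 d0 02 00 00`), which decode as a 32-bit compare against memory at RAX+0x2d0, so the pattern looks like a read through a NULL pointer at member offset 0x2d0. The Comm field and the `proc_reg_read` / `seq_read` frames indicate it was triggered by `cat` reading a proc file.

```python
import re

# Three lines copied verbatim from the dump above.
OOPS = """\
[ 427.289834] RIP: 0010:bkn_seq_dma_next_pos+0x2f/0x74 [linux_bcm_knet]
[ 427.653368] RAX: 0000000000000000 RBX: ffff9544d0c9c870 RCX: 0000000000000001
[ 428.244346] CR2: 00000000000002d0 CR3: 000000003ebb0000 CR4: 00000000000006f0
"""

# Assumption of this sketch: a fault address inside the first 4 KiB page
# is treated as "NULL pointer plus a small member offset".
PAGE_SIZE = 4096


def summarize_oops(text):
    """Extract faulting symbol, fault address and zeroed registers (hypothetical helper)."""
    rip = re.search(
        r"RIP: \w+:(\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+) \[(\w+)\]", text
    )
    cr2 = int(re.search(r"CR2: ([0-9a-f]{16})", text).group(1), 16)
    # General-purpose registers are printed as "RAX: <16 hex digits>".
    # RSP and RIP carry a segment prefix (e.g. "0018:") and are skipped here.
    regs = {
        name: int(value, 16)
        for name, value in re.findall(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})", text)
    }
    return {
        "module": rip.group(4),
        "function": rip.group(1),
        "offset": int(rip.group(2), 16),
        "fault_address": cr2,
        "looks_like_null_deref": cr2 < PAGE_SIZE,
        "zero_registers": sorted(n for n, v in regs.items() if v == 0),
    }


if __name__ == "__main__":
    for key, value in summarize_oops(OOPS).items():
        print(f"{key}: {value}")
```

Feeding the full dump instead of the three sample lines works the same way, although more registers will then show up as zero, so the list is only a hint about which pointer was NULL.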

@ymc-dabe

We recently got a 7050QX-32 to test SONiC on. Sadly, we are facing the same issue.
It would be great to put these older switches to use with SONiC, so any help here would be very much appreciated.

@quxyzzy

quxyzzy commented Jun 13, 2024

Hey @sgollnitz, did you ever get this resolved?
Do you recall seeing errors along these lines?
UnknownPlatformError('Could not identify current platform')

That's what keeps happening on mine, along with a kernel panic dump fairly similar to yours.

@Staphylo I believe there may have been a similar issue for the 7050QX-32S you were working on a while ago, but I can't find whether it was filed on SONiC or sonic-buildimage.

@quxyzzy

quxyzzy commented Jun 25, 2024

Wondering if sonic-net/sonic-buildimage#19338 might help to resolve this once merged?
