SBR tool, checksum & Intel board #27
Adefx, you're not alone. I've got a stack of R2312GZ/GL servers (various OEM models, of course) that I got from auction, and I'm fighting this same battle (albeit well over a year later). I just ordered a few RMS25JB080 (2308-based mezzanine) controllers from eBay for pretty cheap so I can continue investigating the conversion of my existing RMS25CB080 mezzanine modules. I ran into issues with the expander card as well. It causes a kernel panic on boot of BSD (TrueNAS) if it is connected at boot, and an instant panic if it is hot-plugged while the OS is running. I have since reverted to the original SBR and SPD and flashed the latest Intel-hosted firmware package back onto the card. Right now I am looking at the checksum side of things, in the hope of shedding some light on my issues. One of them is that the BIOS on the S2600GZ/GL board reports no adapter present once the crossflash is performed. After reverting, it correctly shows an RMS25CB080 (and PCIe 3 support for the SAS module can be toggled again). I will add your SBR for comparison as well, though I suspect it is the same as mine. I ran into issues on the first of these servers I got a couple of years ago and never really spent any more time on this. I did look into implementing PCIe bifurcation on the board, but without success, because it isn't easily done with the AMI tools I came across.

The layout of the SBR on my RMS25CB080 (in particular, where the second copy of the data sits) does not match the layout of the LSI 9285 SBR file I compared it against. The Intel file has the first copy of the data at 0x0000-0x00DF (checksum at 0x00DF) and the second copy at 0x00E0-0x01BF. The file totals 512 bytes and is zero-padded up to 0x01FF. In the 9285 file, the first copy is at 0x0000-0x007F (checksum at 0x007F) and the second copy is at 0x0080-0x00FF (ending on the 256-byte boundary). The remainder is zero-padded to the 512-byte boundary.
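The Python sketch below makes the two layouts concrete. It splits a 512-byte SBR dump into its two data copies and checks the trailing checksum byte of each copy. The copy sizes (0xE0 for the Intel RMS25CB080 dump, 0x80 for the LSI 9285 file) come from the offsets above. The checksum rule is an assumption, and it is not confirmed in this thread or taken from sbrtool: the last byte is assumed to make all bytes of the copy sum to 0 modulo 256. Treat a mismatch as a hint, not as proof. The layout names are hypothetical labels.

```python
import sys

SBR_SIZE = 512

# Size of ONE data copy for each layout observed in this thread (hypothetical labels).
LAYOUTS = {
    "intel_rms25cb080": 0xE0,  # copies at 0x0000-0x00DF and 0x00E0-0x01BF
    "lsi_9285": 0x80,          # copies at 0x0000-0x007F and 0x0080-0x00FF
}


def expected_checksum(body: bytes) -> int:
    """ASSUMED rule: the byte that makes sum(body) + checksum == 0 (mod 256)."""
    return (-sum(body)) & 0xFF


def check_sbr(sbr: bytes, layout: str) -> dict:
    """Split the dump into two copies and report stored vs. expected checksums."""
    if len(sbr) != SBR_SIZE:
        raise ValueError(f"expected {SBR_SIZE} bytes, got {len(sbr)}")
    size = LAYOUTS[layout]
    copies = (sbr[:size], sbr[size:2 * size])
    report = {"copies_identical": copies[0] == copies[1]}
    for n, copy in enumerate(copies, start=1):
        report[f"copy{n}_stored"] = copy[-1]  # last byte of the copy (e.g. 0x00DF)
        report[f"copy{n}_expected"] = expected_checksum(copy[:-1])
    # Everything after the second copy should be zero padding.
    report["padding_is_zero"] = not any(sbr[2 * size:])
    return report


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python sbr_check.py dump.bin intel_rms25cb080
    with open(sys.argv[1], "rb") as f:
        for key, value in check_sbr(f.read(), sys.argv[2]).items():
            print(f"{key}: {value}")
```

Run against an Intel dump with the `lsi_9285` layout, the copies will almost certainly come out as "not identical", because the split point is wrong for that file.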
When the 2308 controllers arrive (hopefully this week), I will gather some information on them (revision, SBR comparison, etc.) and post a copy of the SBR, since I have yet to see any real information about these Intel units. |
While Fohdeesha has laid the groundwork for us, I think we need to build on the work done with the Dell controllers in the hope of answering these questions about what Intel has done with their servers. I have confirmed (with HxD) that the SBR files you posted are binary-identical to the one I pulled off my 'sacrificial' RMS25CB080. |
Progress: the RMS25JB080 cards came in, and I pulled the SBR from them. There is very little difference between them and the RMS25CB080 SBR. However, they contain the same binary 'blob' I found on both my RMS25CB080 and RS25SB080 (8-port external RAID, PCIe) cards, so this might be what the system BIOS on the S2600GL wants in order to identify the card. In Aptio Setup Utility (BIOS) -> Advanced -> Mass Storage Controller Configuration, the card now reports as expected: "Intel(R) Integrated RAID Module RMS25JB080". So something in the SBR definitely gets the card validated (with an empty or modified SBR it reported as "None" or "Empty", IIRC). Next, I loaded the "9207-8.bin" firmware (2308 IT mode) from the 9207-8i along with the mpt2sas BIOS. Since loading the BIOS, the AVAGO (LSI) BIOS hangs at "Initializing" whenever the RES2SV240 expander is connected, and I don't get far enough to press CTRL-C into the controller's BIOS to see what is going on. With the expander disconnected and ports 0-3 connected directly to the 12-bay backplane, there is no problem: a SATA disk is detected just fine and the boot process continues. The SBR pulled from the RMS25JB080 (2308) is attached. EDIT: With the actual 2308-based RMS25JB080 installed and the expander connected, I had no hang-ups at initialization and no kernel panics in FreeBSD. So this is definitely caused by something in the cross-flash of the RMS25CB080. I have an IBM expander that I will swap in to see whether I get the same behavior. I think this is some weird Intel OEM shenanigans that needs further investigation, but I suspect I will get the same hang-up. I'll update this thread tonight with all of my findings and whatever workable setup I come up with, even if it does not include an expander... yet. |
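For comparisons like the RMS25JB080 vs RMS25CB080 SBR above, a byte-by-byte diff that lists the offsets that differ is easier to share in a thread than hex-editor screenshots. Below is a minimal Python sketch. The file names in the usage comment are placeholders.

```python
import sys
from itertools import zip_longest
from typing import List, Optional, Tuple

# (offset, byte in first file, byte in second file); None means "past the end of that file".
Diff = Tuple[int, Optional[int], Optional[int]]


def diff_offsets(a: bytes, b: bytes) -> List[Diff]:
    """Return every offset where the two dumps differ (including length mismatches)."""
    return [(off, x, y) for off, (x, y) in enumerate(zip_longest(a, b)) if x != y]


def _hex(value: Optional[int]) -> str:
    return "--" if value is None else f"{value:02X}"


def format_diffs(diffs: List[Diff]) -> str:
    """One line per differing byte: '0x<offset>: <byte in A> -> <byte in B>'."""
    return "\n".join(f"0x{off:04X}: {_hex(x)} -> {_hex(y)}" for off, x, y in diffs)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python sbr_diff.py rms25cb080.bin rms25jb080.bin
    with open(sys.argv[1], "rb") as f1, open(sys.argv[2], "rb") as f2:
        diffs = diff_offsets(f1.read(), f2.read())
    print(format_diffs(diffs) if diffs else "files are identical")
```

An empty result is the same "binary identical" check that was done with HxD earlier in the thread.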
Wow, you've gone a lot further than me ;-) |
Hey, I can totally relate. I'm mostly finding my way by feel with this, trying to get my head around the structures LSI put together on these things. When I first attempted this (before Fohdeesha completed the PERC ISO), I ended up swapping the RMS25CB080 for a pair of 1068E-based cards and got on with life. Now that I've got another pallet of these servers, I figured it was worth another shot. I've managed to brick my test card about 7 times so far this weekend through some clearly improper flashing attempts with sections from the RMS25JB080. Thanks to my cheapo CH341 EEPROM programmer, I recovered it each time by erasing the SBR. What was happening is that the flash was modifying the SBR data. The Intel BIOS was clearly unhappy about that and threw an NMI (non-maskable interrupt), which stopped the machine in its tracks. Erasing the SBR wrote all 0xFF, and the card then fell back to the non-initialized 2208 device IDs (1000:0081), so I could get back into DOS/EFI/Linux and start messing around again. I'm back roughly where I was before. Now I'm trying different connections to see what MAY work with the expander at all, and I've found something that is halfway functional, but far from ideal:

SBR: RMS25JB080 (512B, unmodified)

To get the RES2SV240 expander connected, I can only use a single 4x SAS connection from ports 4-7 on the controller. Otherwise it kernel panics and dumps. If I connect only the 0-3 SFF-8087 port on the controller, or connect both ports to the expander, I get the same kernel panic or BIOS hang as before. As a sanity check, I tested this with the backplane I2C cable both connected to and disconnected from the motherboard. That made no difference, so I suspect the PHY configuration is the culprit, and of course I'm running some comparisons now.
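Recovery here depended on the EEPROM being erased, so it helps to check what state a dump is in before writing it back. The minimal Python sketch below tells an erased part (all 0xFF, as described above) apart from a zero-filled image and from one that holds data. It only inspects the bytes and says nothing about how the controller or the Intel BIOS will react to each state. The 512-byte size check reflects the SBR size discussed in this thread.

```python
import sys

SBR_SIZE = 512  # SBR size discussed in this thread


def classify_dump(data: bytes) -> str:
    """Label a raw EEPROM dump by its contents."""
    if not data:
        return "empty file"
    if data.count(0xFF) == len(data):
        return "erased (all 0xFF)"
    if not any(data):
        return "zero-filled (all 0x00)"
    return "populated"


def describe(data: bytes) -> str:
    """Classification plus a warning if the dump is not the expected SBR size."""
    note = "" if len(data) == SBR_SIZE else f" [unexpected size: {len(data)} bytes]"
    return classify_dump(data) + note


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python sbr_state.py dump.bin
    with open(sys.argv[1], "rb") as f:
        print(describe(f.read()))
```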
I'm testing with a simple Rocky Linux 8.8 install, kernel 4.18.0 (-477.27.1), using the mpt3sas driver. (I've had issues with mpt3sas and SAS2008 controllers on Fedora before.) I get a warning that the 2308 my card is now impersonating is 'deprecated hardware'. Sure, whatever, it's working so far for my testing. Here is some quick output from lspci, lsscsi, sas2flash -list and lsiutil (option 16) showing the current config, which doesn't dump out with the expander connected:
That mptctl module error bugs me, but I am still moving forward with this. I may just bite the bullet and put a CentOS 7 install on the SSD connected to the ICH SATA, just to be sure I'm running a supported environment. Now to dig into those PHY configs and see what the heck is going on here... |
Alright, now the plot thickens. I have identified the problem port: port/link 01 (the second port). If I disable this port in lsiutil, I get 6 links to the expander (2-7). That is an improvement over 4, but not quite the full 8. My guess is that the ports must link in pairs at minimum, and that is why port/link 0 also does not connect. Shortly after, I disabled port 0 as well, just to keep my config and the resulting SAS topology consistent. Here is why the plot thickens. While messing around with connections to find a workaround that would still use all 8 links on the adapter, I connected adapter ports 0-3 to the first SFF-8087 (0-3) on the 12-port backplane. Adapter ports 4-7 were still connected to the expander, and the expander remains connected to the second (4-7) and third (8-11) SFF-8087 on the backplane. Given this non-standard connection setup, I figured I should test a few backplane locations while I was at it, just to make sure nothing screwy was going on. I went slot by slot on the backplane with my old Samsung 160GB SATA drive. Slot 0, no problem. Slot 3, no problem. Slot 6, no problem. Slot 9, no problem. Slot 1... KERNEL PANIC, NMI!!! WTF, this wasn't even going through the expander! This whole time I've been focused on PHY link aggregation as the root cause, so why would a single 1:1 connection take it down? Now I am very confused. Could an offset in the NVRAM data be breaking the PHY/link config? Why port 1, and not 0 or 7? Surely the config is a contiguous block of bytes in NVRAM. Am I fighting a hardware issue with this specific card? That doesn't make much sense, because other users have reported the same behavior after crossflashing. Did LSI lay out the config in some order other than 0-7? If there is an offset, how come I can successfully disable these ports with lsiutil? The disablement is being written to NVRAM somehow, since it persists across power-off and card initialization.
So many questions, and so little sleep I'm going to get before work tomorrow. |
Scratch that pairs idea. I just set PHY 0 (handle 0001) online, and now I have 7 links:
|
OK, I updated the firmware, BIOS and UEFI BSD to P20, and I don't understand what I'm looking at here. It hasn't crashed, but then again I haven't rebooted anything since enabling PHY 1 (handle 0002). I did reboot after updating the firmware/BIOS/UEFI.
Gonna see what happens when I try a reboot. |
Just a quick update. Attempting a shutdown with 8 links connected causes the same NMI kernel panic, as I had expected. This appears to be caused by the PCIe link dropping at the CPU root port. I'm analyzing PCI(e) config space and comparing a few things between the 2208 and 2308 modules to see whether anything stands out. I'm also taking a good look at the PERC SBRs that Fohdeesha has in the ISO to see if I have missed something. I'll probably spend a good while going through all of this, but I will update when I find something. 7/8 is not bad as far as bandwidth goes (42Gb/s vs 48Gb/s), but there has to be a reason the link drops whenever the second link is brought up in any capacity. I'm not sure what's so special about that port, so I need to investigate a few more things before I proceed. I do consider this progress toward figuring out WHY the expander issue was/is happening. Still, I want to get to the root of it: why it happens at all, and whether/how this card can truly be crossflashed and made stable in that scenario. I wish I had an H710P in an R720 right now so I could compare the two implementations. |
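The 42Gb/s vs 48Gb/s figures above are simply the number of links times the 6Gb/s SAS-2 line rate. SAS-2 uses 8b/10b encoding, so each lane carries at most 600MB/s of payload. That puts the usable ceiling at 4200MB/s vs 4800MB/s before protocol overhead. The small Python sketch below shows the arithmetic. It only covers the SAS side; the host PCIe link and the disks themselves can be lower limits.

```python
from typing import Tuple

SAS2_LINE_RATE_MBPS = 6000          # SAS-2 line rate per lane, in megabits per second
ENCODING_NUM, ENCODING_DEN = 8, 10  # 8b/10b: 8 data bits per 10 line bits


def wide_port_rates(lanes: int) -> Tuple[int, int]:
    """Return (raw line rate in Gb/s, payload ceiling in MB/s) for a SAS-2 wide port."""
    raw_gbps = lanes * SAS2_LINE_RATE_MBPS // 1000
    payload_mbytes = lanes * SAS2_LINE_RATE_MBPS * ENCODING_NUM // ENCODING_DEN // 8
    return raw_gbps, payload_mbytes


for lanes in (7, 8):
    raw, payload = wide_port_rates(lanes)
    print(f"{lanes} links: {raw} Gb/s raw, {payload} MB/s payload ceiling")
# 7 links: 42 Gb/s raw, 4200 MB/s payload ceiling
# 8 links: 48 Gb/s raw, 4800 MB/s payload ceiling
```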
On a physical hardware note (in case anyone else shares my curiosity about these "proprietary" connectors), having dealt with many SuperMicro blade boards and these Intel servers: the "SIOM" connector for the SAS modules appears to be an "Archer 0.8", a 0.80mm-pitch mezzanine connector. The pinout is published in both the RMS25(JB/KB)0(80/40) and RMS25(CB/PB)0(80/40) hardware user guides, which are still available on Intel's website, at least for now. I believe these connectors are the 80-contact variants of the Archer 0.8, currently sold under the Harwin brand. I base this on the dimensions I have measured and the physical appearance of the connectors.
Female (SAS module): M58-2800842R
This may interest anyone attempting to build a standardized PCIe adapter board for these modules, which I am not entirely opposed to doing myself at this point. The links to the user guides are below, and I have attached the 3 pages that show the connector and pinout. These pages are the same in both documents, although Intel appears to have included the dimensions of the 40-pin variant, not the 80-pin one that is actually installed.
RMS25xB080_Pinout.pdf
RMS25JB080 (2308)
RMS25CB080 (2208) |
I'm still working on this. I just haven't made any significant discoveries since finding the port that appears to be causing the failure. |
...Waiting on replacement SOIC-8 test clips, because I wore out the one I was using to read the SBR EEPROMs on the various generations and revisions of LSI cards at my disposal. I'm trying to further 'demystify' the SBR section and any other data stored on these EEPROMs. The structures/layouts I've seen so far differ between the 2008/2108 cards and the 2208/2308-based Intel modules (and the Dell SBRs that Fohdeesha has in the ISO). Hopefully I can shed some light on what differs as soon as the new test clips arrive. |
I love reading about your efforts to break through this Intel nonsense! It's like a low-level hardware odyssey. Fair winds, my friend!
(Replying by email to oddballracing's comment of Sat, Dec 2, 2023, 23:13.)
|
Hi buddy, I've been following your issue about the p710 flashing problem.
I was researching exactly the same thing, because I have been trying for days to crossflash an Intel integrated 2208 RAID card (formerly an RMS25CB080).
Of course, thanks to Intel, the hardware is locked to Intel motherboards. Oh well, I have two of them.
I succeeded in flashing this 2208 (D1) card with LSI firmware. However, it doesn't detect any disks in IT mode, and mpt2sas detection freezes when I use an Intel 24-port expander.
So I suspected the problem comes from the SBR (I had flashed a blank 512-byte one), and I tried to edit the original dump instead.
When I try to parse it with sbrtool, I get some checksum errors: MFG DATA copies differ, using first; mfgdata checksum error; sas address checksum error.
So I used the script you provided (checksum-sbrtool.py) and got "checksum value 148 / proper checksum 148". I'm not sure why it's the same number. After that it prints mfg1 and mfg2, which are the same...
From here I'm lost... could you help me?
Attached is my original SBR file:
Intel Sbr.zip
Thanks a lot!
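Here is one possible explanation for the "MFG DATA copies differ" message. It is an unverified hypothesis. The Intel RMS25CB080 SBR keeps its second copy at 0x00E0, while the LSI 9285 file keeps it at 0x0080. A tool that assumes the 0x80-byte layout would compare the wrong ranges on an Intel dump and report the copies as different. The Python sketch below checks which candidate copy size makes the two copies identical. The candidate sizes are the two layouts described in this thread, and nothing here reproduces sbrtool's actual logic.

```python
import sys
from typing import Iterable, List

# Copy sizes described in this thread: LSI 9285 (0x80) and Intel RMS25CB080 (0xE0).
CANDIDATE_COPY_SIZES = (0x80, 0xE0)


def matching_copy_sizes(sbr: bytes, candidates: Iterable[int] = CANDIDATE_COPY_SIZES) -> List[int]:
    """Return the copy sizes for which bytes [0, size) equal bytes [size, 2*size)."""
    matches = []
    for size in candidates:
        first, second = sbr[:size], sbr[size:2 * size]
        # Require a full second copy and non-uniform data, so an all-zero or
        # all-0xFF image does not "match" every candidate.
        if len(second) == size and first == second and len(set(first)) > 1:
            matches.append(size)
    return matches


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python sbr_layout.py dump.bin
    with open(sys.argv[1], "rb") as f:
        sizes = matching_copy_sizes(f.read())
    print(", ".join(f"0x{s:02X}" for s in sizes) or "no candidate layout matched")
```

An Intel dump that matches only at 0xE0 would be consistent with the layout-mismatch explanation, but it would not prove it. This sketch does not address the "sas address checksum error" at all.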