Memory reported incorrectly during PXE Boot on RPi 2B #1041
You will be left with a 128MB system if the fixup.dat file doesn't match the start.elf - the Pi must have a matched pair. I suggest using md5sum to confirm that the correct files are on the server, and check in the logs on the PXE server that they are being sent. Assuming that the above doesn't lead to a working system, you can get some boot-time diagnostics from the UART (on header pins 6, 8 and 10) using a modified (and recent) bootcode.bin:
|
If fixup.dat did not match start.elf, the same behaviour would be observed when booting directly from the SD card. As I wrote, the very same set of files works when booted from SD card, but results in 128MB of RAM when PXE booted over the network. I will provide boot logs this evening; I have plenty of them. PS. All the files I was using (bootcode.bin, start.elf, fixup.dat) come from the official GitHub repository, hash 3221a3d, downloaded yesterday. |
Please find the boot logs attached. For clarity I've removed the entries regarding EDID (no monitor attached)
All four test cases done with freshly loaded files from main repository, same hardware, same files. |
Interesting. Notice that in the two boots from SD card the log includes the line:
whereas in the failing PXE boot cases it is absent. This is the immediate cause of the missing memory, but we still need to determine the root cause. Before I dive into the source of bootcode.bin, can you get logs from your PXE DHCP/TFTP server showing the transactions in the success and failure cases? Just one of the pairs, e.g. DTOK=yes, would be sufficient. |
I need to adjust the settings of the tftp server, gimme an hour (I'm afk now). But in the meantime I have changed start.elf and fixup.dat to the versions from the "next" branch, leaving bootcode.bin unchanged. And there it comes:
So, it must be something in start.elf, right? How else would the same bootcode.bin fail in one PXE boot case (with up-to-date start.elf and fixup.dat) and succeed in another (with start.elf and fixup.dat from the older branch)? So, as promised, logs from tftp in about one hour... |
Here's the log from tftp server: Current version of bootcode.bin start.elf and fixup.dat:
Current version of bootcode.bin, but start.elf and fixup.dat from "next" branch:
As you may see, in current start.elf the fixup.dat is not even asked for... |
That doesn't make much sense (yet), because bootcode.bin loads both start.elf and fixup.dat (which is essentially relocation and resizing information which must be applied before execution passes to start.elf). Down the rabbit hole it is, then... |
Let me know if I can help somehow apart from testing. |
Another observation. The same behaviour (current start.elf loads, fixup.dat does not load) occurs when PXE booting with either version of bootcode.bin, the current one as well as the one from the "next" branch. So it really looks as if it is related to start.elf only. |
From an initial read of the high level boot flow it is not possible to get to executing "start.elf" without first trying (to some extent) to load "fixup.dat", so that attempt must be failing. Deeper we go. |
Maybe some garbage in return value used as conditional further in the code? Something like: PS. I've ordered raspberry pi 3b+ and will test the PXE boot there (without bootcode.bin), either today or tomorrow. |
Although the name of the fixup file is conditional, trying to load it isn't. I can't reproduce the problem here. This is a Pi 2B network booting the latest firmware:
What is in your config.txt? |
This is my config.txt: kernel=aros-armeb-raspi.img |
Those options are ignored by bootcode.bin, so it's effectively empty. Can you run the following commands? You'll have to amend the paths (and where you run the commands) for your system, since you aren't running Raspbian. I've included the responses from my system (with the SD card automatically mounted to /media/pi/boot and the TFTP directory for this Pi mounted on /boot) with the latest firmware:
|
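The checksum comparison requested above can also be scripted. The following is a minimal Python sketch, not a tool from this thread: the function names and the two directory arguments (one for the mounted SD card, one for the TFTP root) are assumptions chosen for illustration.

```python
import hashlib
from pathlib import Path

# Files that the thread says must come from the same firmware release.
BOOT_FILES = ("bootcode.bin", "start.elf", "fixup.dat")


def md5_of(path):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_boot_files(sd_dir, tftp_dir, names=BOOT_FILES):
    """Map each name to (sd_md5, tftp_md5, match); a missing file yields None."""
    report = {}
    for name in names:
        sums = []
        for directory in (sd_dir, tftp_dir):
            candidate = Path(directory) / name
            sums.append(md5_of(candidate) if candidate.is_file() else None)
        match = sums[0] is not None and sums[0] == sums[1]
        report[name] = (sums[0], sums[1], match)
    return report
```

Calling `compare_boot_files("/media/pi/boot", "/boot")` would use the mount points mentioned above; any entry whose third element is `False` points at a file that is missing or differs between the two locations.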
I will when I get home, where the system is (in about 6 hours). In the meantime I've checksummed the files in my local build (the same build system, theoretically the very same files): bash-3.2$ md5 bootcode.bin fixup.dat start.elf The md5 for bootcode.bin is with BOOT_UART=1. In 6 hours I'll post the md5 sums of the files as they are on the SD card and/or tftp server... PS. Could it be that the memory reservation for my initramfs corrupts something? Or is it irrelevant? |
bootcode.bin only cares about start*.elf, recovery.elf, fixup*.dat, config.txt and autoboot.txt, and a very small subset of config.txt settings (which doesn't include kernel= or initramfs) - everything else is handled by start*.elf. |
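To see which files in a boot directory fall under that rule, the file names can be matched against the patterns listed above. This is only an illustrative Python sketch: the pattern list is copied from the comment, while the function names are hypothetical and the firmware's real matching logic is not shown in this thread.

```python
from fnmatch import fnmatchcase

# Patterns taken from the comment above; everything else is left to start*.elf.
BOOTCODE_PATTERNS = (
    "start*.elf",
    "recovery.elf",
    "fixup*.dat",
    "config.txt",
    "autoboot.txt",
)


def read_by_bootcode(filename):
    """Return True if the name matches one of the patterns listed in the thread."""
    return any(fnmatchcase(filename, pattern) for pattern in BOOTCODE_PATTERNS)


def split_boot_files(filenames):
    """Split names into (matched by the patterns, left for start*.elf)."""
    matched = [name for name in filenames if read_by_bootcode(name)]
    others = [name for name in filenames if not read_by_bootcode(name)]
    return matched, others
```

With this sketch, a kernel image such as `aros-armeb-raspi.img` ends up in the second list, which matches the statement that `kernel=` is not handled by bootcode.bin.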
I've just checked the files on tftp server. They do have exactly the same md5 sums as the ones you have. Can it be my raspberry is broken? |
It seems improbable that there would be hardware fault that would only show up with a few images, and to do so as consistently as this. I'll put together a new bootcode with extended diagnostic checks and report back with a download link. |
Another observation: from time to time the RPi 2B loads more or fewer files from the tftp server. Once it managed to load fixup.dat and start.elf (the new ones) but failed to load config.txt and thus started to blink the green LED constantly. After another reset cycle the old behaviour came back: start.elf and config.txt are loaded, but fixup.dat is not. To rule things out I have also tried another tftp server. Until now I was using dnsmasq; for test purposes I've now switched to Apple's tftp server. Here, start.elf and fixup.dat were loaded only once; on the next try (and now actually all the time) only start.elf gets loaded. Neither fixup.dat nor config.txt is received. So, maybe there is some issue with the tftp implementation in bootcode.bin? Or maybe some timing issue? |
I have just received a Raspberry Pi 3B+ and have a similar issue. I'm now really sure it has something to do with timing, as I do not have any other issues on my local network. Serial port log from the Raspberry Pi (again, with HDMI logs removed):
Interestingly, the Raspberry Pi says it is loading my kernel files, whereas the tftp server now says something different (please note the "failed sending" messages):
The files are definitely there. I have just connected with another tftp client over network, set binary mode and fetched all the files. I have also verified with md5 that the transferred files are exactly the same as on tftp server... |
Can you install tcpdump on the server and sniff the traffic during booting:
(You may need to change the name of the interface.) I'm hoping this will give answers faster than a special diagnostic build, or at least it will guide me as to where to insert the extra logging. |
tcpdump for raspi3b+:
tcpdump for raspi2b:
I have cloned the firmware from github to my local machine. Tomorrow I will try to find the commit which is the first one where fixup.dat is not loaded on my machine. Will let you know asap. |
Sorry, can you do that again with port tftp instead of bootpc ? |
some tcpdump data for raspi 3b+: actual start.elf (not loading):
additionally:
old start.elf (with successful loading of fixup.dat):
Additionally:
|
I think we're getting somewhere, but there isn't enough information in the traces to show that the server's responses are the same (or different) in the success and failure cases. To save lots of back and forth, can you capture the traffic?:
Afterwards you can view the capture to see that it looks correct with:
If you can upload both capture files somewhere - DropBox, Google Drive etc. - I'll take a look ASAP. |
And the same for raspberrypi 2b: https://drive.google.com/open?id=1XfWnOpbcnu3__5j-JMZp69loxOsKFknI |
Thanks - the captures are essentially the same for the two Pis, so you can stick to one from now on. The data shows the Pi attempting to open fixup.dat, and the server sending a response containing the size of the file. In the success case the Pi then sends an ACK for block 0 which kick-starts the transmission, whereas in the failure case it goes on to the next file. From the logs you sent earlier I can see that there is no significant delay (400us) between requesting fixup.dat and recovery.elf, which suggests the response was received and rejected rather than going astray. The only difference in the server responses is the file size - 6567 for success and 6660 for failure. Can you try a hybrid system with the latest start.elf but the older, smaller fixup.dat? Since the Pi can't know the content of the file at that stage, if that changes the behaviour then the problem is size-related. If that doesn't change things, try the opposite pairing - old start.elf with new fixup.dat. Note that the firmware should ignore a fixup.dat that doesn't match the .elf file, so you will still end up with only 128MB RAM. |
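The exchange described above (a read request carrying tsize=0, an option acknowledgement carrying the real size, then an ACK for block 0) follows the TFTP option extension in RFC 2347 and RFC 2349. The Python sketch below only illustrates those packet layouts; it is not the bootcode.bin implementation, and the function names are made up for this example.

```python
import struct

# Opcodes from RFC 1350 (RRQ, ACK) and RFC 2347 (OACK).
OP_RRQ, OP_ACK, OP_OACK = 1, 4, 6


def build_rrq(filename, mode="octet", options=None):
    """Build a read request; options such as tsize=0 are appended as NUL-terminated pairs."""
    packet = struct.pack("!H", OP_RRQ)
    packet += filename.encode("ascii") + b"\x00" + mode.encode("ascii") + b"\x00"
    for key, value in (options or {}).items():
        packet += key.encode("ascii") + b"\x00" + str(value).encode("ascii") + b"\x00"
    return packet


def parse_oack(packet):
    """Return the option dict of an OACK, or None if the packet is not a well-formed OACK."""
    if len(packet) < 2 or struct.unpack("!H", packet[:2])[0] != OP_OACK:
        return None
    fields = packet[2:].split(b"\x00")
    # A well-formed body is NUL-terminated, so the split ends with one empty field
    # and the remaining fields come in name/value pairs.
    if fields[-1] != b"" or len(fields) % 2 == 0:
        return None
    pairs = fields[:-1]
    return {
        pairs[i].decode("ascii").lower(): pairs[i + 1].decode("ascii")
        for i in range(0, len(pairs), 2)
    }


def build_ack(block):
    """Build an ACK; block 0 acknowledges an OACK and starts the data transfer."""
    return struct.pack("!HH", OP_ACK, block)
```

In the success case the client would answer a parsed OACK with `build_ack(0)`. A packet whose opcode is not 6, for example one consisting only of zero bytes, makes `parse_oack` return `None` in this sketch.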
P.S. Further captures won't tell us any more at this stage - just look whether or not fixup.dat was fetched. |
old fixup.dat and new start.elf - fixup.dat is not loaded: new fixup.dat and old start.elf - fixup.dat is loaded: |
That timeout message is a big clue, eliminating a lot of possibilities. I'll refine the logging further tomorrow. |
You seem to be running Mac OS on non-Apple hardware (judging by Gigabyte OUI) ==
@pelwell |
No. Instead I have tried two different tftp servers on the machine and both misbehave with raspberry, but both misbehave differently. In order to verify the tftp itself is not an issue here I have tried connecting to the tftp server from other machines and other tftp clients within same network. All tftp clients but bootcode.bin worked properly. This alone is an argument for me that the issue is not related to the machine I'm using. |
Problem is that the bootcode is less tolerant than your typical tftp client. If there is any packet loss, it is fatal to the bootcode, as indicated in the issues linked. Also a normal tftp client may not request the file size (it's an optional tftp extension). |
Therefore I hope that the issue I opened will help to make bootcode more compliant with the tftp RFCs and more failure resistant, and thus will also help the people who reported the two issues you've mentioned. The problem, actually, is that you do not know how many people are using network booting on Raspberry Pi. Until now all the tutorials I found suggest using the outdated "next" branch, which does not have any issues, and I could just happily use it too, unless I try to boot a Pi 3B+. Once such people change to the main branch and upgrade their boot files, many of them may be frustrated to find that their PXE-boot Raspberry Pi configuration suddenly stops working. |
Do note that piserver uses main branch firmware files (same versions that ship with Raspbian), and is able to boot Pi 3+. That said, I do agree with you that it would be nice if someone would fix the firmware's shortcomings. |
I'm open to reviewing timeouts and retries - everything seems to be one second - once I've understood this problem. There's another debug bootcode.bin in the usual place - it gets more chatty once it gets to the fixup file. |
And here's the debug log. I have attached two files there. raspi-50.cap is the non-working one, whereas raspi-51.cap is with the older fixup.dat + start.elf combination. Sorry that I can only help this way. Let me know if I can do anything else to help you :) |
Thanks. Those results are not at all what I expected, but that's no bad thing. At first glance I can't work out what's going on... I can only apologise about the slow progress on this - just keep on running the tests and posting the results. |
Try this one for size. |
As you've probably worked out, the code appears to receive a packet full of zeroes and ignore it, then the retry arrives too late. I'm going to need to think about this some more (what state could be left over from start.elf?). A quick fix might be to up the retries, but I'm out of time today. |
And here is the tftp grab translation of the pcap file for this failing transmission. As you can see, the server sends OACK twice right after receiving READ: PS. I assume there is no quick way to let me do tests (provided I sign any NDA needed)? READ 192.168.2.126:49158 > 192.168.2.112:69 "fixup.dat" octet tsize=0 |
Are you also able to sniff what actually goes over the line (e.g. using managed switch with mirror port functionality + other computer) instead of on the server itself? Although it may be unlikely, if the server actually sent a zeroed packet, that would not show up in tcpdump if run on the server itself. |
no, unfortunately not... |
I should have made it clear that I doubt the fault is at the server end, but who knows. |
Suggest you leave all options open. Wouldn't be the first time someone experiences odd network issues on a Hackintosh. |
This one attempts to learn a bit about the final pieces of start.elf, displays the contents of an important status register, and also increases one of the outer timeouts to 5 seconds - which ought to allow for some retrying. |
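The interaction between a one-second retry and an outer timeout, as discussed in this thread, can be modelled in a few lines. The Python sketch below is an illustration under stated assumptions, not the firmware code: `receive`, `is_valid`, `resend` and `clock` are hypothetical callables supplied by the caller, and the default values only mirror the one-second and five-second figures mentioned above.

```python
import time


def wait_for_valid(receive, is_valid, resend,
                   retry_interval=1.0, overall_timeout=5.0,
                   clock=time.monotonic):
    """Wait for a valid reply, re-sending the request every retry_interval seconds.

    Returns (packet, attempts); packet is None if overall_timeout expired first.
    receive() is assumed to return bytes, or None when nothing has arrived.
    """
    start = clock()
    resend()                      # initial request
    last_sent = start
    attempts = 1
    while True:
        now = clock()
        if now - start >= overall_timeout:
            return None, attempts
        if now - last_sent >= retry_interval:
            resend()              # retry after a quiet interval
            last_sent = now
            attempts += 1
        packet = receive()
        if packet is not None and is_valid(packet):
            return packet, attempts
        # Invalid packets (for example all zeroes) are dropped and the wait continues.
```

If the outer timeout equals the retry interval, the reply to the first retry can never be accepted in this model, because the loop gives up at the moment the retry is due. A longer outer timeout leaves room for at least one retry to complete.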
looks good! |
If increasing the timeout is all it takes then that's an easy, low-risk fix. There's a release candidate without the additional tracing, and BOOT_UART off by default, here. |
I can confirm this version works properly with the newest fixup.dat and start.elf files. I have also tested this on a Raspberry Pi 3B+. Here, since the boot hands over to the bootcode.bin downloaded from the tftp server, the fix works properly too:
|
The patch has been submitted internally, and should appear in the next firmware release. |
I will test and report as soon as the new firmware is released. |
2ndstage: Increase eth_open timeout to 5 seconds See: #1041
firmware: video_encode: Use default values on invalid nStride or nSliceHeight See: #1051
firmware: gpioman/FXL6408: Handle open failing sensibly See: #1053
firmware: Delay backlight coming on See: #1052
firmware: LCD driver close fixes
2ndstage: ignore autoboot.txt if boot partition is already set See: raspberrypi/noobs#508

2ndstage: Increase eth_open timeout to 5 seconds See: raspberrypi/firmware#1041
firmware: video_encode: Use default values on invalid nStride or nSliceHeight See: raspberrypi/firmware#1051
firmware: gpioman/FXL6408: Handle open failing sensibly See: raspberrypi/firmware#1053
firmware: Delay backlight coming on See: raspberrypi/firmware#1052
firmware: LCD driver close fixes
2ndstage: ignore autoboot.txt if boot partition is already set See: raspberrypi/noobs#508
Fix should be in latest rpi-update firmware |
I confirm latest firmware boots over tftp properly with both raspi 2b and 3b+:
and
|
On Raspberry Pi 2B the memory detected and reported by VC (through ATAGS as well as through the mbox interface) is wrong when the device boots over the network. VC says the ARM memory starts at 0x00000000 and has a size of 0x08000000; the reported VC memory starts at 0x08000000 and has a size of 0x08000000.
When booted directly from SD card the correct amount of memory is reported (ARM: base 0x00000000, size 0x3b400000, VC: base 0x3b400000, size 0x04c00000).
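The reported base and size values are easier to compare once converted to MiB. The short Python sketch below only does that arithmetic on the numbers from this report; the constant and function names are made up for illustration.

```python
MIB = 1024 * 1024

# (base, size) pairs copied from the report above.
PXE_ARM = (0x00000000, 0x08000000)
PXE_VC = (0x08000000, 0x08000000)
SD_ARM = (0x00000000, 0x3B400000)
SD_VC = (0x3B400000, 0x04C00000)


def size_mib(region):
    """Size of a (base, size) region in MiB."""
    return region[1] // MIB


def end_address(region):
    """First address after the region."""
    return region[0] + region[1]


def total_mib(arm, vc):
    """Combined size of the ARM and VC regions in MiB."""
    return size_mib(arm) + size_mib(vc)


if __name__ == "__main__":
    for label, arm, vc in (("PXE", PXE_ARM, PXE_VC), ("SD", SD_ARM, SD_VC)):
        print(f"{label}: ARM {size_mib(arm)} MiB, VC {size_mib(vc)} MiB, "
              f"total {total_mib(arm, vc)} MiB, VC ends at {end_address(vc):#010x}")
```

The SD card values add up to 948 MiB + 76 MiB = 1024 MiB, whereas the network boot values cover only 128 MiB + 128 MiB = 256 MiB, which is consistent with the 128MB ARM memory discussed in the comments.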
Steps to reproduce:
Please note I have tested this behaviour on RaspberryPi 2B only.
The firmware from "next" branch is not affected and operates properly in either boot mode.