Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

linuxgpio and linuxspi issue: Official .hex from Atmel doesn't program correctly #1263

Closed
cinderblock opened this issue Jan 3, 2023 · 10 comments · Fixed by #1265
Closed
Labels
bug Something isn't working

Comments

@cinderblock
Copy link

cinderblock commented Jan 3, 2023

I'm programming an ATmega32u4 with avrdude (7.0-20221229 (0f956e9)) via linuxspi on a Raspberry Pi Zero W:

I'm trying to flash it with the official Atmel DFU bootloader: ATMega32U4-usbdevice_dfu-1_0_0.hex

avrdude -c linuxspi -p atmega32u4 -U flash:w:ATMega32U4-usbdevice_dfu-1_0_0.hex

I always get a verification mismatch at 0x7090 (device has 0xff, .hex has 0x7f). At first I assumed it was a bad chip, but then I was able to reproduce on a bunch of devices.

I was able to flash and verify with an older version of avrdude (6.3-20171130) with a different programmer.

Once programmed, I'm able to read back the flash from the device and created a "works.hex" (remove the .txt file extension).

  • The resulting .hex file seems to be identical to the official .hex (after hex2bin)
  • The recent version of avrdude can verify the device's flash against the ATMega32U4-usbdevice_dfu-1_0_0.hex
  • The recent version of avrdude is able to program and verify correctly with works.hex

Curiously, the problematic .hex has a bunch of different "blocks" of data. It's clear that one block of data ends after 0x7090, right where the verification has trouble.

Looking at the raw hex on the device after verification fails, it's written 0xFFFF to the device instead of 0x6C7F. So, the verification is working as expected. It's something in the writing process that isn't working as expected.

I suspect this is an issue related to #1068 but this is an even number of bytes... Maybe it's because they started at an odd address of 0x702F? Maybe the fix #1188 caused a regression?


I'm setting up a Pi + AVR to enable continuous testing and coverage on/of dfu-programmer. I'd be happy to make it available here.

ActionPi - ATmega32u4 DFU Tester

@mcuee mcuee added the unconfirmed Maybe a bug, needs to be reproduced by someone else label Jan 4, 2023
@mcuee
Copy link
Collaborator

mcuee commented Jan 4, 2023

Does this only affect linuxspi or linuxgpio? Does it affect other programmers when you use the later version of avrdude with other programmers?

Maybe your issue is the same as the existing issue #455.

Please try the patch in #455 to see if that fixed your issue.

The hex file in question has the following line and that is causing the issue.

:02708F006C7F14

@cinderblock
Copy link
Author

cinderblock commented Jan 4, 2023

The suggested patch (if(ihex.reclen&1){mem->buf[nextaddr+i]=-1;mem->tags[nextaddr+i]=TAG_ALLOCATED;}) does not change anything.


That patch got me looking in the right place.

The issue here is that ihex.reclen is even but the address is odd, so the last byte is not properly loaded. The ihex block is not-aligned to a word boundary at the start or end.


This is a simple hack that loads and verifies:

diff --git a/src/fileio.c b/src/fileio.c
index 482bb3a..e2eb9ed 100644
--- a/src/fileio.c
+++ b/src/fileio.c
@@ -348,6 +348,8 @@ static int ihex2b(const char *infile, FILE *inf,
           return -1;
         }
         for (i=0; i<ihex.reclen; i++) {
+          mem->buf[(nextaddr+i)|1] = -1;
+          mem->tags[(nextaddr+i)|1] = TAG_ALLOCATED;
           mem->buf[nextaddr+i] = ihex.data[i];
           mem->tags[nextaddr+i] = TAG_ALLOCATED;
         }

However this is a hack that should not be used because, in my example, it adds a byte at 0x702E, just before the unaligned block that is incorrect. So it doesn't verify properly with other versions of avrdude. It's also probably not doing bounds checking correctly and could crash things.


Looking at #455, I think the real issue here is the logic around the TAG_ALLOCATED. Because it is not possible to program an avr off of a byte alignment, if the ihex ever specifies either half of an aligned word, the other half must also be "allocated", or considered to be.

This could be hacked into the input handling as we've been doing, but I don't think that is the right place to put such logic. This belongs in the "output" side of things - the part that takes the AVRMEM struct and sends it to the device.

However, this begs a subsequent question. Is the right thing to just assume the unspecified byte is 0xff? Or do we need to read the current byte from the device first and merge the two memories. Some bootloaders might even do this already... This might be considered more correct, but we already don't do that with unallocated parts of a page.

@mcuee
Copy link
Collaborator

mcuee commented Jan 4, 2023

@stefanrueger

Please help to review the comments from @cinderblock. Thanks.

@mcuee mcuee added bug Something isn't working and removed unconfirmed Maybe a bug, needs to be reproduced by someone else labels Jan 4, 2023
@mcuee
Copy link
Collaborator

mcuee commented Jan 4, 2023

@cinderblock

Thanks for the detailed analysis. I have added the bug label.

Just wondering if you only encounter the issue with linuxspi/linuxgpio, or you also encounter the issue with other programmers. Thanks.

@stefanrueger
Copy link
Collaborator

@cinderblock Thanks for reporting and testing!

I suspect

  • This is related to linuxspi/linuxgpio
  • Git main avrdude will be OK with a different programmer and the "problematic" input file
  • Possibly also that avrdude6.3 will show the same problem with -c linuxspi

@cinderblock I note that linuxspi has no paged read/write routines; PR #1188 (that you suspect caused a regression) however only pads pages earmarked for paged writes. So I am confident PR #1188 is innocent. It does for pages what you hypothesise should happen for word writes: that not (for writing) allocated bytes in words should be read from the device first or initialised with 0xff. In this case (SPI programming via linuxspi), the memory looks like NOR-memory, so writing 0xff is a NOP.

The place where ISP writing happens is this section of avr.c

I think the logic of this code is good (though I probably would write it with a tad less complexity). The thing is: Although the ISP interface for the ATmega32U4 loads pages byte for byte (not word for word) into a buffer on chip, the data sheet says that the low byte must be written before the high byte for programming to work.

Much programming works via paged read/write functions in the programmer. Most files are uploaded in one block. So this has surfaced not so often, and when it did, was hard to debug.

I'll create a PR so you can test it. @cinderblock

@mcuee You will be pleased to hear that this just might be the bug in the Mega-Commit that you were chasing for some time now. Maybe you can test @cinderblock's .hex with linuxspi, too, and reproduce the issue and test the PR for correctness.

Here the data sheet.

load-low-byte-before-high

@stefanrueger
Copy link
Collaborator

BTW, your CI integration with the RPi looks just great. Vvv cool. @cinderblock When you get a minute check out the PR #1265 - I'm hoping this solves this problem and quite possibly #455 too.

@mcuee
Copy link
Collaborator

mcuee commented Jan 5, 2023

@stefanrueger

Great. Thanks for the detailed analysis and fast fix. I can confirm that PR #1265 is good to fix both this issue #1263 and the long standing issue #455.

@mcuee mcuee changed the title Official .hex from Atmel doesn't program correctly linuxgpio and linuxspi issue: Official .hex from Atmel doesn't program correctly Jan 5, 2023
@mcuee mcuee mentioned this issue Jan 5, 2023
@cinderblock
Copy link
Author

Thanks for the quick fix.

@stefanrueger regarding the CI tool, when I get it running in dfu-programmer, I'll see if I can make it work here too.

One issue right now is that this is a Pi Zero W, the ARMv6 based cpu. The GitHub Action Runner won't run on this hardware because .NET doesn't work on ARMv6, only ARMv7. I'm debating biting the bullet and swapping to a Pi 4 I have... For now just running some simple scripts.

@mcuee
Copy link
Collaborator

mcuee commented Jan 5, 2023

@stefanrueger regarding the CI tool, when I get it running in dfu-programmer, I'll see if I can make it work here too.

One issue right now is that this is a Pi Zero W, the ARMv6 based cpu. The GitHub Action Runner won't run on this hardware because .NET doesn't work on ARMv6, only ARMv7. I'm debating biting the bullet and swapping to a Pi 4 I have... For now just running some simple scripts.

I have two Raspberry Pi 400 and one of them has the extension ribbon cable plus T-shape I/O expansion board setup to test linuxspi/linuxgpio. The ribbon cable setup is not that robust though. It has the USB ports to test other programmers as well.

I also have two Raspberry Pi 3B+ and one of them has a I/O expansion board to test linuxspi/linuxgpio. It has the USB ports to test other programmers as well.

rpi_3bp_linuxspi_m2560

@cinderblock
Copy link
Author

cinderblock commented Jan 10, 2023

I decided I wanted to remake this onto a newer Pi to support GitHub Actions. This turned into a full on HAT.

v0 fabrication has started. (If you want a bare board, PM me)

image

Designs will be at https://github.com/cinderblock/avr-test-hat once I clean up and test a couple things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants