Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nRF52 DK Power Loss Reset Issue #1643

Closed
ShahinHaque97-CT opened this issue Mar 10, 2023 · 21 comments
Closed

nRF52 DK Power Loss Reset Issue #1643

ShahinHaque97-CT opened this issue Mar 10, 2023 · 21 comments
Labels

Comments

@ShahinHaque97-CT
Copy link

Hello,

I have implemented Serial DFU following this guide - https://devzone.nordicsemi.com/guides/nrf-connect-sdk-guides/b/software/posts/ncs-dfu#serial_dfu

Edited the overlays/config to use external flash and everything is working fine.

I have been doing some tests for reliability, if there is power loss during the download of the update, everything is fine as the application in slot 1 is still running and I can always redownload the image into slot 2.

However, if there is power loss when I am swapping the images, I have an issue.

00> *** Booting Zephyr OS build v3.2.99-ncs1 ***
00> I: Starting bootloader
00> I: Primary image: magic=good, swap_type=0x4, copy_done=0x1, image_ok=0x1
00> I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
00> I: Boot source: none
00> I: Swap type: none
00> I: Bootloader chainload address offset: 0xc000
00> I: Jumping to the first image slot

The above log is repeated everything I press the RESET Button on the development board and the application does not run.

If I remove the power via USB, and plug it back in the application runs absolutely fine, and thus I know the image is not corrupted.

What's the difference between power on reset and nRESET on the chip? I am assuming a flag is set for nRESET which is preventing the application from being sorted.

@nordicjm
Copy link
Collaborator

Has the board booted into serial recovery mode? That would prevent the application from booting

@ShahinHaque97-CT
Copy link
Author

@nordicjm
How would I be able to know? The above log is what is displayed upon MCU reset button.

@nordicjm
Copy link
Collaborator

If the above is what you see, then your application has been booted, if you attached a debugger to it you should see your program counter would be in your zephyr application's address space.
To check if it's in serial recovery mode, use the image list command e.g. mcumgr --conntype serial --connstring dev=/dev/ttyACM0 image list

@ShahinHaque97-CT
Copy link
Author

@nordicjm

C:\ncs\v2.2.0\nrf\samples\bluetooth\nrf52832-external-flash-dfu_scratch>mcumgr --conntype serial --connstring "COM23,baud=115200" image erase
Done C:\ncs\v2.2.0\nrf\samples\bluetooth\nrf52832-external-flash-dfu_scratch>mcumgr --conntype serial --connstring "COM23,baud=115200" image upload -e build/zephyr/app_update.bin
53.61 KiB / 53.61 KiB [=================================================] 100.00% 2.50 KiB/s 21s

Done C:\ncs\v2.2.0\nrf\samples\bluetooth\nrf52832-external-flash-dfu_scratch>mcumgr --conntype serial --connstring "COM23,baud=115200" image list
Images:
image=0 slot=0
    version: 0.0.0
    bootable: true
    flags: active confirmed
    hash: f2ec040399383f8501b35bd555ff551e366b78335322772fdc0c4c4478a7bb4d
image=0 slot=1
    version: 0.0.0
    bootable: true
    flags:
    hash: 5437f873820ac11adcb8ca5a85ce86f4707f056d83a446b7f0e1d61869f68ba4
Split status: N/A (0) 

C:\ncs\v2.2.0\nrf\samples\bluetooth\nrf52832-external-flash-dfu_scratch>mcumgr --conntype serial --connstring "COM23,baud=115200" image test 5437f873820ac11adcb8ca5a85ce86f4707f056d83a446b7f0e1d61869f68ba4
Images:
image=0 slot=0
    version: 0.0.0
    bootable: true
    flags: active confirmed
    hash: f2ec040399383f8501b35bd555ff551e366b78335322772fdc0c4c4478a7bb4d
image=0 slot=1
    version: 0.0.0
    bootable: true
    flags: pending
    hash: 5437f873820ac11adcb8ca5a85ce86f4707f056d83a446b7f0e1d61869f68ba4
Split status: N/A (0) 

C:\ncs\v2.2.0\nrf\samples\bluetooth\nrf52832-external-flash-dfu_scratch>mcumgr --conntype serial --connstring "COM23,baud=115200" reset
Done
#############Power has been removed here while the swap is happening.###########

C:\ncs\v2.2.0\nrf\samples\bluetooth\nrf52832-external-flash-dfu_scratch>mcumgr --conntype serial --connstring "COM23,baud=115200" image list
Images:
image=0 slot=0
    version: 0.0.0
    bootable: true
    flags: active
    hash: 5437f873820ac11adcb8ca5a85ce86f4707f056d83a446b7f0e1d61869f68ba4
image=0 slot=1
    version: 0.0.0
    bootable: true
    flags: confirmed
    hash: f2ec040399383f8501b35bd555ff551e366b78335322772fdc0c4c4478a7bb4d
Split status: N/A (0) C:\ncs\v2.2.0\nrf\samples\bluetooth\nrf52832-external-flash-dfu_scratch>

From the logs it shows from the image test that the images have swapped over. judging from the hash.

@ShahinHaque97-CT
Copy link
Author

00> *** Booting Zephyr OS build v3.2.99-ncs1 ***
00> I: Starting bootloader
00> I: Primary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
00> I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
00> I: Boot source: primary slot
00> I: Swap type: test
00> I: Starting swap using scratch algorithm.
00> I: Bootloader chainload address offset: 0xc000
00> I: Jumping to the first image slot

Reset Button Pressed

00> *** Booting Zephyr OS build v3.2.99-ncs1 ***
00> I: Starting bootloader
00> I: Primary image: magic=good, swap_type=0x2, copy_done=0x1, image_ok=0x3
00> I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
00> I: Boot source: none
00> I: Swap type: revert
00> I: Starting swap using scratch algorithm.

Reset Button Pressed

00> *** Booting Zephyr OS build v3.2.99-ncs1 ***
00> I: Starting bootloader
00> I: Primary image: magic=good, swap_type=0x4, copy_done=0x3, image_ok=0x3
00> I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
00> I: Boot source: primary slot
00> I: Starting swap using scratch algorithm.
00> I: Bootloader chainload address offset: 0xc000
00> I: Jumping to the first image slot
00> *** Booting Zephyr OS build v3.2.99-ncs1 ***
00> I: Starting bootloader

Reset Button Pressed

00> I: Primary image: magic=good, swap_type=0x4, copy_done=0x1, image_ok=0x1
00> I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
00> I: Boot source: none
00> I: Swap type: none
00> I: Bootloader chainload address offset: 0xc000
00> I: Jumping to the first image slot

After USB power is removed

00> *** Booting Zephyr OS build v3.2.99-ncs1 ***
00> TEST 10/03/2023 nrf52dk_nrf52832
00> build time: Mar 10 2023 14:23:55

(The current program just displays that string.)

@nordicjm
Copy link
Collaborator

00> I: Starting swap using scratch algorithm.

Rebuild mcuboot using swap using move, not swap using scratch, then retry

@ShahinHaque97-CT
Copy link
Author

@nordicjm

The mucmgr logs are the same,

00> Booting Zephyr OS build v3.2.99-ncs1 ***
00> I: Starting bootloader
00> I: Primary image: magic=good, swap_type=0x2, copy_done=0x1, image_ok=0x3
00> I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
00> I: Boot source: none
00> I: Swap type: revert
00> I: Starting swap using move algorithm.
00> I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
00> I: Bootloader chainload address offset: 0xc000
00> I: Jumping to the first image slot

Reset Button Pressed

00> *** Booting Zephyr OS build v3.2.99-ncs1 ***
00> I: Starting bootloader
00> I: Primary image: magic=good, swap_type=0x4, copy_done=0x1, image_ok=0x1
00> I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
00> I: Boot source: none
00> I: Swap type: none
00> I: Bootloader chainload address offset: 0xc000
00> I: Jumping to the first image slot

Reset Button Pressed

00> *** Booting Zephyr OS build v3.2.99-ncs1 ***
00> I: Starting bootloader
00> I: Primary image: magic=good, swap_type=0x4, copy_done=0x1, image_ok=0x1
00> I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
00> I: Boot source: none
00> I: Swap type: none
00> I: Bootloader chainload address offset: 0xc000
00> I: Jumping to the first image slot

Reset Button Pressed

00> *** Booting Zephyr OS build v3.2.99-ncs1 ***
00> I: Starting bootloader
00> I: Primary image: magic=good, swap_type=0x4, copy_done=0x1, image_ok=0x1
00> I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
00> I: Boot source: none
00> I: Swap type: none
00> I: Bootloader chainload address offset: 0xc000
00> I: Jumping to the first image slot

After USB power is removed

00> *** Booting Zephyr OS build v3.2.99-ncs1 ***
00> After flash! TEST 10/03.2023 nrf52dk_nrf52832
00> build time: Mar 10 2023 14:46:56

//-------------------------

To change the config this was done
CONFIG_BOOT_SWAP_USING_SCRATCH=n
CONFIG_BOOT_SWAP_USING_MOVE=y

@ShahinHaque97-CT
Copy link
Author

Adding CONFIG_GPIO_AS_PINRESET=n along with Move instead of Scratch, the power loss seems to be working (even without pressing the button), Have only tried with a small sample size so far as end of work day.

May I ask your reasoning to suggesting Move instead of Scratch? I have my scratch partitions allocated and they work (apart from the power loss).

I will do some more tests to confirm it is working

@utzig
Copy link
Member

utzig commented Mar 10, 2023

May I ask your reasoning to suggesting Move instead of Scratch? I have my scratch partitions allocated and they work (apart from the power loss).

Amusingly enough swap with move was broken until I fixed it recently with #1597. So if you're using NCS 2.2.0 you still have the one that is broken. It would be more helpful if you could dump the status area from the primary slot when your issue happens. Also how big are your images?

@ShahinHaque97-CT
Copy link
Author

@utzig The image is currently very small as I am using the hello world example for testing. As this is early development, I will try my external flash dfu configs on the latest SDK and see if that has corrected my issue. Encase that doesn't work, how will I display the status dump area? is there a debug config for this?

@ShahinHaque97-CT
Copy link
Author

@utzig
Using the latest SDK, with a small test sample of 10 updates so far, I have been successful with internal scratch, external flash. However, with external scratch, the halt is still present with reset button. I am still prompted with the original logs

@ShahinHaque97-CT
Copy link
Author

Update:
Using External Flash & External Scratch, I have been able to get it working without bricking by changing the scratch partition to a smaller size of 4KB. The error is probably still there but too quick for me to catch it. the chance of the error is more frequent as the scratch size is increased

@nordicjm
Copy link
Collaborator

@utzig perhaps this is the same issue I used to see with swap using scratch. I'm pretty sure I was able to get the problem to occur when only using internal flash too but maybe I'm misremembering

@utzig
Copy link
Member

utzig commented Mar 14, 2023

@utzig perhaps this is the same issue I used to see with swap using scratch. I'm pretty sure I was able to get the problem to occur when only using internal flash too but maybe I'm misremembering

Yes, that's what I thought.

@ShahinHaque97-Corintech Nice investigation! This might be the same issue as described in #1550, and many times before by @nordicjm. Have you also tried changing the size of the scratch in the internal flash to see if it ever bricks? So, if you end up never seeing this bug with a scratch of 4KB in the QSPI flash then it could point to an issue in the QSPI flash driver for writes larger than a 4KB, which would not be a sector anymore (AFAIR).

I've never tried using scratch on the external flash myself, but I'll try to do it later today. It could be that just the amount of writes on the scratch partition is enough to trigger some bug in the QSPI interface or flash driver. But that's just a guess!

how will I display the status dump area? is there a debug config for this?

To dump the status area, you just need to read the last 4KB sector of the primary partition using whatever tool you are using to write the flash, nrfjprog I would assume. Then either zip and upload it here, or gist an hexdump of the dump. That should give us more information of where the interrupt happened. Since this might also be related to scratch writes failing in some way, it might be useful to also dump the scratch partition, although I am not entirely sure it would actually be helpful...

@ShahinHaque97-CT
Copy link
Author

I am using an nRF52832 with external flash as a cost saving measure instead of the nRF52840. the nRF52832 is using SPI rather than QSPI. The flash currently used for testing is the Winbond W25Q16JV on the nRF52 DK development board, IC used for production is yet to be determined.

@ShahinHaque97-CT
Copy link
Author

My pm_static file is listed below:

app:
address: 0xc200
end_address: 0x62000
region: flash_primary
size: 0x55e00

external_flash:
address: 0x56000
end_address: 0x200000
region: external_flash
size: 0x1aa000

mcuboot:
address: 0x0
end_address: 0xc000
placement:
before:
- mcuboot_primary
region: flash_primary
size: 0xc000

mcuboot_pad:
address: 0xc000
end_address: 0xc200
placement:
align:
start: 0x1000
before:
- mcuboot_primary_app
region: flash_primary
size: 0x200

mcuboot_primary:
address: 0xc000
end_address: 0x62000
orig_span: &id001
- app
- mcuboot_pad
region: flash_primary
size: 0x56000
span: *id001

mcuboot_primary_app:
address: 0xc200
end_address: 0x62000
orig_span: &id002
- app
region: flash_primary
size: 0x55e00
span: *id002

mcuboot_scratch:
address: 0x56000
end_address: 0x74000
placement:
after:
- external_flash
- mcuboot_secondary
align:
start: 0x1000
region: external_flash
size: 0x4000

mcuboot_secondary:
address: 0x0
device: DT_CHOSEN(nordic_pm_ext_flash)
end_address: 0x56000
placement:
align:
start: 0x4
region: external_flash
share_size:
- mcuboot_primary
size: 0x56000

sram_primary:
address: 0x20000000
end_address: 0x20010000
region: sram_pri

//-----------------------------------------------------

Am I correct in saying, I would need to do a nrfjprog --memrd X --n 4096
where X is the primary partition offset by 4Kb, which location would this be?
Am I using mcuboot_primary_app? 0x62000 - 0x1000 = 0x61000
and so: nrfjprog --memrd 0x61000 --n 4096

@ShahinHaque97-CT
Copy link
Author

My original issue was with power loss during the swap it is stuck in the state as mentioned in the first post. using nRESET reloads the log but removing USB POWER allows the code to finish the swap and continue running.

To provide the logs, I have used nrfjprog --readcode to show the entire hex dump. I have attached each stage of the progress. However one thing to note, after it is stuck in the loop, I have done a hex dump which disconnects RTT. Reconnecting RTT, while USB POWER is still connected, allows the application swap fine. I am I am assuming it doesn't set the same flags as nRESET.

The files contained are as follows:

1 - Full chip reset using nRF Connect Programmer
2 - Application built using west build -b nrf52dk_nrf52832 -p && west flash --erase
3 - Using image test, successful swap.
4 - Reverted back to the first image, Just my doing a reset
5 - Power loss
mcumgr --conntype serial --connstring "COM23,baud=115200" image test HASH
mcumgr --conntype serial --connstring "COM23,baud=115200" reset
USB removed.
USB reconnected.
RTT loaded.

00> *** Booting Zephyr OS build v3.2.99-ncs2 ***
00> I: Starting bootloader
00> I: Primary image: magic=good, swap_type=0x2, copy_done=0x1, image_ok=0x3
00> I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
00> I: Boot source: none
00> I: Swap type: revert
00> I: Starting swap using scratch algorithm.
00> *** Booting Zephyr OS build v3.2.99-ncs2 ***
00> I: Starting bootloader
00> I: Primary image: magic=good, swap_type=0x4, copy_done=0x3, image_ok=0x3
00> I: Scratch: magic=bad, swap_type=0x1, copy_done=0x2, image_ok=0x2
00> I: Boot source: primary slot
00> I: Starting swap using scratch algorithm.
00> I: Bootloader chainload address offset: 0xc000
00> I: Jumping to the first image slot

Entire process is repeated with 5v2

nRF52832_reset_logs.zip

@ShahinHaque97-CT
Copy link
Author

Update instead of removing USB power, RTT disconnect and reconnect works fine. Is there a way of setting up a timeout to reset the RESET flags and so it'll boot up as if it is power on reset. This isn't really solving the issue but masking it allowing to run though.

@ShahinHaque97-CT
Copy link
Author

A more pressing issue -

I decided to try my code with Bluetooth as that is the intended purpose. Using this - https://devzone.nordicsemi.com/guides/nrf-connect-sdk-guides/b/software/posts/ncs-dfu#ble_testing

Latest SDK.

External Flash / External Scratch. With power loss during swap, the device is completely bricked, Even with removing USB power, the device is stuck in this state as shown below:

00> *** Booting Zephyr OS build v3.2.99-ncs2 ***
00> I: Starting bootloader
00> I: Primary image: magic=good, swap_type=0x2, copy_done=0x3, image_ok=0x3
00> I: Scratch: magic=good, swap_type=0x2, copy_done=0x3, image_ok=0x3
00> I: Boot source: scratch
00> I: Starting swap using scratch algorithm.
00> E: Image in the primary slot is not valid!
00> E: Unable to find bootable image

nReset button pressed

00> *** Booting Zephyr OS build v3.2.99-ncs2 ***
00> I: Starting bootloader
00> I: Primary image: magic=good, swap_type=0x2, copy_done=0x1, image_ok=0x3
00> I: Scratch: magic=bad, swap_type=0x1, copy_done=0x2, image_ok=0x2
00> I: Boot source: none
00> I: Swap type: test
00> E: Image in the secondary slot is not valid!
00> E: Image in the primary slot is not valid!
00> E: Unable to find bootable image

USB power killed

00> *** Booting Zephyr OS build v3.2.99-ncs2 ***
00> I: Starting bootloader
00> I: Primary image: magic=good, swap_type=0x2, copy_done=0x1, image_ok=0x3
00> I: Scratch: magic=bad, swap_type=0x1, copy_done=0x2, image_ok=0x2
00> I: Boot source: none
00> I: Swap type: revert
00> E: Image in the primary slot is not valid!
00> E: Unable to find bootable image

As this is not recoverable this is a major issue, the device will have to be reflashed.

@ShahinHaque97-CT
Copy link
Author

With internal scratch, external flash it works

I thought it worked with restarting USB due thinking it was crashing from the logs below:

//-------------------------------------------------------------------------------------------------------------
00> *** Booting Zephyr OS build v3.2.99-ncs2 ***
00> I: Starting bootloader
00> I: Primary image: magic=good, swap_type=0x4, copy_done=0x1, image_ok=0x1
00> I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
00> I: Boot source: none
00> I: Swap type: none
00> I: Bootloader chainload address offset: 0xc000
00> I: Jumping to the first i

*** Booting Zephyr OS build v3.2.99-ncs2 ***
00> I: Starting bootloader
00> I: Primary image: magic=good, swap_type=0x4, copy_done=0x1, image_ok=0x1
00> I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
00> I: Boot source: none
00> I: Swap type: none
00> I: Bootloader chainload address offset: 0xc000
00> I:

00> *** Booting Zephyr OS build v3.2.99-ncs2 ***
00> I: Starting bootloader
00> I: Primary image: magic=good, swap_type=0x4, copy_done=0x1, image_ok=0x1
00> I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
00> I: Boot source: none
00> I: Swap type: none
00> I: Bo

//-------------------------------------------------------------------------------------------------------------

However, with the Bluetooth application, I noticed that it is still running! (I couldn't tell with the serial method as it was a helloworld example with just a string message) I then switched RTT to UART and it works. This means when the Application swaps between MCUBOOT/SLOT1/SLOT2 RTT is disconnected not crashed.

So we have Serial/BT working with external flash & internal scratch!

@github-actions
Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants