mcumgr: fail to upgrade nRF target using nRF Connect #24706

Closed
mniestroj opened this issue Apr 26, 2020 · 7 comments · Fixed by #25339
Labels
area: Flash area: mcumgr bug The issue is a bug, or the PR is fixing a bug platform: nRF Nordic nRFx priority: high High impact/importance bug

Comments

@mniestroj
Member

Describe the bug
First of all, this is not nRF Connect's fault; the application merely makes the bug easy to reproduce. nRF Connect fails to perform the upgrade when both image slots in Zephyr are already filled. It just goes back to the "connected device" tab very quickly, with no popup.

There is no such problem when using the mcumgr console application. The two tools differ, however, in the commands they send over the mcumgr SMP protocol. The mcumgr application sends "image erase" first and then continues with "image upload" requests. nRF Connect does NOT send "image erase". In theory (based on the mcumgr/SMP implementation), sending "image upload" as the first command should work as well, because the erase should be done implicitly. In fact, mcumgr in Zephyr tries to do that: the size argument passed to flash_erase() is the size of the image that is going to be uploaded. This fails with -EINVAL, because soc_flash_nrf.c expects the size to be page aligned. This is why the mcumgr console application succeeds: the explicit "image erase" mcumgr/SMP command erases the whole slot, whose size is page aligned.
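
To illustrate the failure mode described above, here is a minimal sketch of a page-alignment check. It is not the actual soc_flash_nrf.c code: `sketch_flash_erase_check()` and `SKETCH_PAGE_SIZE` are hypothetical names, and the 4 KiB page size is an assumption (it matches the nRF52840).

```c
#include <errno.h>
#include <stdint.h>

/* Assumption: 4 KiB erase page, as on the nRF52840. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical stand-in for a flash driver's argument check.
 * Returns 0 when both offset and size are page aligned,
 * -EINVAL otherwise. It does not touch any flash.
 */
static int sketch_flash_erase_check(uint32_t offset, uint32_t size)
{
    if ((offset % SKETCH_PAGE_SIZE) != 0u ||
        (size % SKETCH_PAGE_SIZE) != 0u) {
        return -EINVAL;
    }
    return 0;
}
```

With this check, a request for an arbitrary image size such as 200000 bytes is rejected, while a request covering a whole slot (for example 0x67000 bytes, a multiple of 4096) is accepted.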

One possible workaround so far is to enable CONFIG_IMG_ERASE_PROGRESSIVELY=y, so that implicit flash erase operations are done one sector at a time (with proper page alignment).


To Reproduce
Steps to reproduce the behavior:

  1. Build firmware with mcuboot and mcumgr support for some nRF MCU (e.g. nrf52840)
  2. Flash mcuboot and application image, so Zephyr will properly boot
  3. Upload some application image to the second slot using the mcumgr <...> image upload command (or just make sure something is detected in the second slot when running mcumgr <...> image list)
  4. Try to do DFU using the nRF Connect application (so it will try to overwrite the second slot)
  5. The application goes back to the previous screen very quickly, i.e. the firmware image is not transferred to the Zephyr target

Expected behavior
The new application image should be transferred over BT and then booted after a short additional wait (a few seconds).

In the implementation: most likely the mcumgr code should "ceil" the image size, so that the flash erase request sent to the flash driver has a page-aligned size.
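
The "ceil" idea can be sketched as follows. This is only an illustration, not the fix that was eventually merged: `sketch_round_up()` and `sketch_erase_size()` are hypothetical helpers, and the page and slot sizes are supplied by the caller.

```c
#include <stdint.h>

/*
 * Hypothetical helper: round size up to the next multiple of
 * page_size. Assumes page_size is non-zero and that the result
 * fits in 32 bits.
 */
static uint32_t sketch_round_up(uint32_t size, uint32_t page_size)
{
    uint32_t rem = size % page_size;

    return (rem == 0u) ? size : size + (page_size - rem);
}

/*
 * Hypothetical helper: size to pass to the erase call for an
 * upload of image_size bytes, clamped so that it never exceeds
 * the slot. Assumes slot_size itself is page aligned.
 */
static uint32_t sketch_erase_size(uint32_t image_size,
                                  uint32_t slot_size,
                                  uint32_t page_size)
{
    uint32_t aligned = sketch_round_up(image_size, page_size);

    return (aligned > slot_size) ? slot_size : aligned;
}
```

For a 200000-byte image and 4096-byte pages this gives 200704 bytes (49 pages), which a driver that requires page alignment would accept.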

Impact
On a "fresh" target (when only one image slot is filled), it is possible to upgrade the firmware only once. Any further attempt using the nRF Connect application fails.

Screenshots or console output
No logs are printed, because the mcumgr implementation is very quiet when handling errors.

Environment:

  • OS: Android nRF Connect application
  • Toolchain: Zephyr SDK
  • Commit SHA: 2599f70
@mniestroj mniestroj added bug The issue is a bug, or the PR is fixing a bug platform: nRF Nordic nRFx area: mcumgr area: Flash labels Apr 26, 2020
@carlescufi
Member

cc @philips77

@carlescufi carlescufi added the priority: low Low impact/importance bug label Apr 28, 2020
@carlescufi
Member

@nvlsianpu and @de-nordic could you please comment on what the root cause might be?

@mniestroj
Member Author

mniestroj commented Apr 28, 2020

@carlescufi The image size is not page aligned (most of the time). However, it is passed directly to the flash_erase() API, which fails because the nRF SoC flash driver treats such a request as invalid. The reason it stopped working (it was obviously working some time ago) is most probably related to recent mcumgr updates (I didn't check that, but I am pretty sure).

The real question is: at which layer should this issue be solved, and how? In other words: where should an unaligned new image size be rounded up to the next page boundary? Possible answers: mcumgr, the flash area/map API, the flash driver, and maybe some more candidates in between.

@nvlsianpu
Collaborator

@mniestroj - thanks for this description. I'm going to investigate and fix that.

@nvlsianpu
Collaborator

@mniestroj Your diagnosis is correct. This looks like a poorly implemented feature (as is the Zephyr shim layer, which is the root of the issue). It looks like the image status area is never erased either. I am preparing a patch.

@nvlsianpu nvlsianpu added priority: high High impact/importance bug and removed priority: low Low impact/importance bug labels May 11, 2020
@carlescufi
Member

@nvlsianpu will you address this in time for 2.3?

@nvlsianpu
Collaborator

@carlescufi Yes, fix is almost ready: zephyrproject-rtos/mcumgr#22

nvlsianpu added a commit to nvlsianpu/zephyr that referenced this issue May 15, 2020
fixes zephyrproject-rtos#24706

Fixed issue of possible try to erase non page aligned
size of flash while serving image write command.
The new version has this bug fixed.

Signed-off-by: Andrzej Puzdrowski <andrzej.puzdrowski@nordicsemi.no>
carlescufi pushed a commit that referenced this issue May 15, 2020
krip-tip pushed a commit to krip-tip/zephyr-local that referenced this issue May 30, 2020