Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New flash writing method with offset/memory/size alignment handling #7514

Merged
merged 25 commits into from
Oct 15, 2020

Conversation

drzony
Copy link
Contributor

@drzony drzony commented Aug 7, 2020

Fixes issues raised in #7491 (comment)

@yumkam
Copy link

yumkam commented Aug 7, 2020

I meant different issue: over-read of data (via *ptr), not over-read of flash.
(and I suspect spi_flash_read/spi_flash_write don't support (size % 4) != 0. Something like
this
(WARNING: COMPLETELY UNTESTED)

@drzony
Copy link
Contributor Author

drzony commented Aug 7, 2020

flash_write_puya_buf size is checked against bytesNow, which is enough, since it's aligned to flash sector size:

  // Good alternative for buffer size is: SPI_FLASH_SEC_SIZE (= 4k)
  // Always use a multiple of flash page size (256 bytes)
  #define PUYA_BUFFER_SIZE 256

As for spi_flash_read and spi_flash_write - they support (size % 4) != 0

Copy link
Collaborator

@earlephilhower earlephilhower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it is true that the existing code may read up to 3 bytes past the end of the pointer, this is absolutely safe because no pointer the core can access is anywhere near 3 bytes of the end-of-RAM so there is 0 chance if it causing a HW exception. The heap is ~30K before the end of RAM, and the main app stack pointer will be well beyond it, too.

It's not worth the extra code size and risk of opening up this block to avoid this.

@drzony
Copy link
Contributor Author

drzony commented Aug 8, 2020

@earlephilhower My commit does not change the over-read, it only changes the over-write, which in my opinion is a valid point. I see that you requested changes to my request, what should I change?

@earlephilhower
Copy link
Collaborator

Thanks for the explanation. I was going thru the original issue and thought the purpose was to not read past end-of-data.

This section of code has been very much trouble for us, so honestly what I'd need is a failing and passing valid test case on real XMC hardware with a couple folks' confirmations. XMC just has some strange behavior which has caused something like 3 dot-releases so far, and stuff that looks like it should be fine often isn't when it meets that abomination of a flash chip.

@drzony
Copy link
Contributor Author

drzony commented Aug 8, 2020

I have PUYA chip in my Sonoff S20, so I'm verifying each time before creating a pull request.
A minimal test case for me is just a gzipped OTA upgrade (via Arduino OTA). Before my original patch from #7491 when gzip did not have 4 byte alignment the update failed, since the last (size % 4) bytes were not modified in temporary buffer. Unfortunately my first patch wrote additional (4 - (size % 4)) bytes to the flash and that is what this one fixes. I'm still modifying (size + 3) & ~3 in temporary buffer (which is the fix for gzipped OTA), but only reading/writing size. In fact this patch only reverts a part of #7491

Copy link
Collaborator

@earlephilhower earlephilhower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the SDK I think what is in there is correct. Byte-wide spi_read/_write doesn't seem legal.

cores/esp8266/Esp.cpp Show resolved Hide resolved
cores/esp8266/Esp.cpp Show resolved Hide resolved
@earlephilhower
Copy link
Collaborator

@devyte, if you're still relaxing, can you eyeball this? Am I missing something?

Copy link
Collaborator

@devyte devyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code reads bytesAligned, then ANDs bytesAligned, then writes bytesAligned.
I think this is wrong, because the AND loop accesses out-of-bounds, and the result could change the bytes beyond the last one.
This PR reads bytesNow, then ANDs bytesAligned, then writes bytesNow.
I think this is wrong, because of the reason above, and in addition because the spi ops are supposed to be multiple-of-4.
Consider:
Read bytesAligned, then AND bytesNow, then write bytesAligned.

The reading will read into the buffer with potentially more bytes than needed (alignment), then will AND only up until the last valid byte leaving those beyond the last one unchanged, then write back, including the last unchanged bytes.
Of course, handling the last bytes likely need to be done as a special case.

@drzony
Copy link
Contributor Author

drzony commented Aug 10, 2020

@devyte
AND is not out of bounds since the flash_write_puya_buf is 4 byte aligned and there is a check for bytesNow > PUYA_BUFFER_SIZE
@devyte @earlephilhower
If the flash reads must be 4-byte aligned, then it should be fixed in EspClass::flashWrite and EspClass::flashRead, this means adding something similar to what is done currently for PUYA for the generic case i.e.:

if (!aligned) {
  readLast4BytesFromFlash();
  andReadWithData();
  writeLast4BytesToFlash();
}

@yumkam
Copy link

yumkam commented Aug 10, 2020

See my suggested change linked above, it does exactly this (and also does not over-read data).

@drzony
Copy link
Contributor Author

drzony commented Aug 10, 2020

@yumkam It does not handle the generic case

Copy link
Collaborator

@devyte devyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current state of this PR seems to be technically correct for both PUYA and normal flash chips.
The conclusion of a gitter discussion was:

  • The underlying spi_flash_write() and spi_flash_read() calls are documented by Espressif to require size to be a multiple-of-4
  • The calls seem to work with size not a multiple-of-4, but it is suspected that the bytes above the last misaligned one will be written either as null or with garbage
  • The correctness for allowing size to be not a multiple-of-4 comes at the price of an extra read (unavoidable) and an extra write (may be avoidable)
  • An extra write means that the flash life is reduced

Internal discussions are ongoing, this PR should be considered on-hold for now

@drzony
Copy link
Contributor Author

drzony commented Aug 10, 2020

My additional explanation about flash writes:
Extra write will not reduce life of flash:

  1. SPI flashes can usually write in 1-byte increments, on ESP it's 4 probably due to speed (i.e. DIO/QIO instructions)
  2. Flash life is usually measured in erase cycles
  3. On SPI we can write to all bytes on page no matter the order, but only once before erase is needed
  4. The additional write will not erase flash, so it will not wear it more
  5. Example of such approach can be seen in Linux UBIFS driver, which keeps accounting of written "sub-pages" and fills the whole page before erasing it
  6. The only problem here is when someone tries to write on unaligned offsets i.e. write 3 bytes at offset 0, then write 1 byte at offset 2. If spi_flash_write somehow aligns the write, which seems to be the case, then the second write will fail as the bytes 0-3 were already written. However the same will happen no matter what we do, unless we include the type of accounting that UBIFS does.

@TD-er
Copy link
Contributor

TD-er commented Aug 10, 2020

@drzony
I don't agree with the statement about the writes.
The original PUYA chips for which the patch was made, is doing something in the background.
So what we're seeing is the data is exactly written to the flash as it is sent to it.
For flash memory, this is not what you would expect, as it should only change a 1 into a 0 and not the other way around.
But that's exactly what the PUYA chips do. If there is 0xAA stored in the flash and you write 0x55 to it, you would expect the cell to contain 0x00, but instead it holds 0x55.

So unless they have come up with some revolutionary new type of memory, they must perform an erase somewhere.
It could be the chip itself does buffer a page and does perform the erase/write cycle in the background, or it is using some page mapping.
But there must be extra erase/write cycles on those memory chips to get the result we're seeing.

@drzony
Copy link
Contributor Author

drzony commented Aug 10, 2020

@TD-er
I can see the problem, but I don't have an idea on how to test whether read(4), write(4), read(4), write(4) is any different from read(8), write(8), because this is what seems to be the worry here.

@TD-er
Copy link
Contributor

TD-er commented Aug 10, 2020

More like this:
write 3 bytes, start at 4N + 0 => read(4), write(4)
write 3 bytes, start at 4N + 3 => read(4), write(4), read(4), write(4) (or read(8), write(8))

@drzony
Copy link
Contributor Author

drzony commented Aug 10, 2020

Ok, so I have confirmed that spi_flash_write self-aligns, this code:

uint32_t flash_offset = 0xa0000;
uint32_t wr_buf1[2]   = {0x55555555, 0x55555555};
uint32_t wr_buf2[2]   = {0xAAAAAAAA, 0xAAAAAAAA};
uint32_t rd_buf1[2]   = {0x00000000, 0x00000000};

spi_flash_erase_sector(flash_offset / 0x1000);
spi_flash_write(flash_offset, wr_buf1, 3);
spi_flash_read(flash_offset, rd_buf1, 4);
DEBUG("Read from flash + 0 %08x\n", rd_buf1[0]);
flash_read(flash_offset + 4, rd_buf1, 4);
DEBUG("Read from flash + 4 %08x\n", rd_buf1[0]);
spi_flash_write(flash_offset + 4, wr_buf1, 4);
spi_flash_read(flash_offset + 8, rd_buf1, 4);
DEBUG("Read from flash + 8 %08x\n", rd_buf1[0]);
spi_flash_write(flash_offset + 8, wr_buf2, 4);
spi_flash_read(flash_offset + 4, rd_buf1, 4);
DEBUG("Read from flash + 4 %08x\n", rd_buf1[0]);
spi_flash_read(flash_offset + 8, rd_buf1, 4);
DEBUG("Read from flash + 8 %08x\n", rd_buf1[0]);

Produces (on esp12e with "normal" flash and on S20 with PUYA flash):

Read from flash + 0 55555555
Read from flash + 4 ffffffff
Read from flash + 8 ffffffff
Read from flash + 4 55555555
Read from flash + 8 aaaaaaaa

So always 4 bytes are written

@drzony
Copy link
Contributor Author

drzony commented Aug 10, 2020

Next test:

uint32_t flash_offset = 0xa0000;
uint32_t wr_buf1[2]   = {0x55555555, 0x55555555};
uint32_t wr_buf2[2]   = {0xAAAAAAAA, 0xAAAAAAAA};
uint32_t rd_buf1[2]   = {0x00000000, 0x00000000};

spi_flash_erase_sector(flash_offset / 0x1000);
spi_flash_write(flash_offset, wr_buf1, 4);
spi_flash_write(flash_offset, wr_buf2, 4);
spi_flash_read(flash_offset, rd_buf1, 4);
DEBUG("Read from flash + 0 %08x\n", rd_buf1[0]);

Produces:
On "normal" flash:

Read from flash + 0 00000000

On PUYA flash:

Read from flash + 0 aaaa2aaa

So completely messed up.

@TD-er
Copy link
Contributor

TD-er commented Aug 10, 2020

Hmm that 2 is strange in the PUYA output.
If it were only a HEX values, then I could understand it, but this 2 I can't explain.

@drzony
Copy link
Contributor Author

drzony commented Aug 10, 2020

And the last test for today:

uint32_t flash_offset = 0xa0000;
uint32_t wr_buf1[2]   = {0xDEADBEEF, 0xDEADBEEF};
uint32_t rd_buf1[2] = {0x00000000, 0x00000000};
uint32_t rd_buf2[2] = {0x00000000, 0x00000000};
spi_flash_erase_sector(flash_offset / 0x1000);
spi_flash_read(flash_offset, rd_buf1, 4);
spi_flash_write(flash_offset, wr_buf1, 4);
spi_flash_read(flash_offset + 4, rd_buf1 + 1, 4);
spi_flash_write(flash_offset + 4, wr_buf1 + 1, 4);
spi_flash_read(flash_offset, rd_buf2, 8);
DEBUG("Read from flash + 0 %08x %08x %08x %08x\n", rd_buf1[0], rd_buf1[1], rd_buf2[0], rd_buf2[1]);
memset(rd_buf1, 0, 8);
memset(rd_buf2, 0, 8);
spi_flash_erase_sector(flash_offset / 0x1000);
spi_flash_read(flash_offset, rd_buf1, 8);
spi_flash_write(flash_offset, wr_buf1, 8);
spi_flash_read(flash_offset, rd_buf2, 8);
DEBUG("Read from flash + 0 %08x %08x %08x %08x\n", rd_buf1[0], rd_buf1[1], rd_buf2[0], rd_buf2[1]);

Produces

Read from flash + 0 ffffffff ffffffff deadbeef deadbeef
Read from flash + 0 ffffffff ffffffff deadbeef deadbeef

On both flashes, so if there is some erasing happening, then it's not affecting other data on flash.
As for the magic 2 in read, I also have no idea what is happening other than some weird buffering.

@drzony
Copy link
Contributor Author

drzony commented Aug 10, 2020

@TD-er One thing keeps bugging me, what is the reason for the read/write combo on PUYA flashes?
As I've shown above if we keep to the standard erase/write-once cycle everything works as expected.
It seems that the problem was (and still is) with unaligned writes, which are self-aligned by spi_flash_write, which may cause double write on some bytes (which is always bad since we can only set 0s).
In fact I've met several flashes that didn't react well to writing to the same address several times, since it's out of spec.

@drzony
Copy link
Contributor Author

drzony commented Oct 11, 2020

Nice work, thanks!

Because this touches a lot of things in the core and blob, can we please get a device test for it with all the new corner cases enumerated and verified? Something that goes thru all possible aligned/unaligned source/dest/lens? That would give great confidence that we're not breaking some corner case only hit by a FS or something very rarely.

Code from my comment above covers that, I'll put that into a test

@drzony
Copy link
Contributor Author

drzony commented Oct 11, 2020

@devyte @earlephilhower Can any of you run those tests on the device? I'm using Platformio for development on Windows, so setting up a testing environment will take me some time.

@earlephilhower
Copy link
Collaborator

@drzony I'll give this PR a run thru my system and report back. Thanks for adding tests!

@earlephilhower
Copy link
Collaborator

@drzony, a minor patch is required to get it to build and run:

diff --git a/tests/device/test_spi_flash/test_spi_flash.ino b/tests/device/test_spi_flash/test_spi_flash.ino
index a0cc1c583..33746d10b 100644
--- a/tests/device/test_spi_flash/test_spi_flash.ino
+++ b/tests/device/test_spi_flash/test_spi_flash.ino
@@ -1,6 +1,8 @@
 #include <BSTest.h>
 #include <ESP8266WiFi.h>
 
+BS_ENV_DECLARE();
+
 void preinit() {
     // (no C++ in function)
     // disable wifi
@@ -59,7 +61,7 @@ bool testFlash(uint32_t start_offset, uint8_t data_offset, size_t amount)
         dumpBuffer((uint8_t *)read_buffer, 8);
         return false;
     }
-    Serial.printf("Write took %lu us\n", DL::Timing::diff(start, micros()));
+    Serial.printf("Write took %lu us\n", micros() - start);
     return true;
 }
 

Once done, I'm seeing a failure on the "Last bytes of page" test:

No SPIFFS to upload
Uploading binary
Running tests
running test "|+o|+m|+s|0p|"
test "|+o|+m|+s|0p|" passed
running test "|+o|+m|+s|1p|"
test "|+o|+m|+s|1p|" passed
running test "|+o|+m|-s|0p|"
test "|+o|+m|-s|0p|" passed
running test "|+o|+m|-s|2p|"
test "|+o|+m|-s|2p|" passed
running test "|+o|-m|+s|0|"
test "|+o|-m|+s|0|" passed
running test "|+o|-m|+s|1p|"
test "|+o|-m|+s|1p|" passed
running test "|+o|-m|-s|0p|"
test "|+o|-m|-s|0p|" passed
running test "|+o|-m|-s|2p|"
test "|+o|-m|-s|2p|" passed
running test "|-o|+m|+s|0p|"
test "|-o|+m|+s|0p|" passed
running test "|-o|+m|+s|1p|"
test "|-o|+m|+s|1p|" passed
running test "|-o|+m|-s|0p|"
test "|-o|+m|-s|0p|" passed
running test "|-o|+m|-s|1p|"
test "|-o|+m|-s|1p|" passed
running test "|-o|-m|+s|0p|"
test "|-o|-m|+s|0p|" passed
running test "|-o|-m|+s|1p|"
test "|-o|-m|+s|1p|" passed
running test "|-o|-m|-s|0p|"
test "|-o|-m|-s|0p|" passed
running test "|-o|-m|-s|1p|"
test "|-o|-m|-s|1p|" passed
running test "Last bytes of page"
test "Last bytes of page" failed
running test "Unaligned page cross only"
test "Unaligned page cross only" passed

@earlephilhower
Copy link
Collaborator

Verbose output:

running test "Last bytes of page"
test output was:
17
>>>>>bs_test_start file="test_spi_flash.ino" line=153 name="Last bytes of page" desc="[spi_flash]"
---------------------------------------------------
Testing 1 bytes @ 000a00ff + 0
Write fail
>>>>>bs_test_check_failure line=155
---------------------------------------------------
Testing 1 bytes @ 000a00ff + 1
Write fail
>>>>>bs_test_check_failure line=156
---------------------------------------------------
Testing 2 bytes @ 000a00fe + 0
Write fail
>>>>>bs_test_check_failure line=157
---------------------------------------------------
Testing 2 bytes @ 000a00fe + 1
Write fail
>>>>>bs_test_check_failure line=158
---------------------------------------------------
Testing 3 bytes @ 000a00fd + 0
Write fail
>>>>>bs_test_check_failure line=159
---------------------------------------------------
Testing 3 bytes @ 000a00fd + 1
Write fail
>>>>>bs_test_check_failure line=160
>>>>>bs_test_end line=0 result=0 checks=6 failed_checks=6

test "Last bytes of page" failed

@drzony
Copy link
Contributor Author

drzony commented Oct 12, 2020

@earlephilhower Thanks for the assist :) Everything should be fixed now

@drzony
Copy link
Contributor Author

drzony commented Oct 12, 2020

Still testing on PUYA flash, do not merge yet

@drzony
Copy link
Contributor Author

drzony commented Oct 12, 2020

I've managed to run those tests on PUYA device and they passed, so the PR is ready to merge.

@drzony drzony changed the title Do not write more data than requested on PUYA flashes New flash writing method with offset/memory/size alignment handling Oct 12, 2020
@earlephilhower
Copy link
Collaborator

Latest version passes in device tests for me, too:

--------------------------------
Running test 'test_spi_flash/test_spi_flash.ino' of 26 tests
Compiling test_spi_flash.ino
Executable segment sizes:
IROM   : 250596          - code in flash         (default or ICACHE_FLASH_ATTR) 
IRAM   : 27072   / 32768 - code in IRAM          (ICACHE_RAM_ATTR, ISRs...) 
DATA   : 1328  )         - initialized variables (global, static) in RAM/HEAP 
RODATA : 1880  ) / 81920 - constants             (global, static) in RAM/HEAP 
BSS    : 25264 )         - zeroed variables      (global, static) in RAM/HEAP 
Sketch uses 280876 bytes (26%) of program storage space. Maximum is 1044464 bytes.
Global variables use 28472 bytes (34%) of dynamic memory, leaving 53448 bytes for local variables. Maximum is 81920 bytes.
No SPIFFS to upload
Uploading binary
Running tests
running test "|+o|+m|+s|0p|"
test "|+o|+m|+s|0p|" passed
running test "|+o|+m|+s|1p|"
test "|+o|+m|+s|1p|" passed
running test "|+o|+m|-s|0p|"
test "|+o|+m|-s|0p|" passed
running test "|+o|+m|-s|2p|"
test "|+o|+m|-s|2p|" passed
running test "|+o|-m|+s|0|"
test "|+o|-m|+s|0|" passed
running test "|+o|-m|+s|1p|"
test "|+o|-m|+s|1p|" passed
running test "|+o|-m|-s|0p|"
test "|+o|-m|-s|0p|" passed
running test "|+o|-m|-s|2p|"
test "|+o|-m|-s|2p|" passed
running test "|-o|+m|+s|0p|"
test "|-o|+m|+s|0p|" passed
running test "|-o|+m|+s|1p|"
test "|-o|+m|+s|1p|" passed
running test "|-o|+m|-s|0p|"
test "|-o|+m|-s|0p|" passed
running test "|-o|+m|-s|1p|"
test "|-o|+m|-s|1p|" passed
running test "|-o|-m|+s|0p|"
test "|-o|-m|+s|0p|" passed
running test "|-o|-m|+s|1p|"
test "|-o|-m|+s|1p|" passed
running test "|-o|-m|-s|0p|"
test "|-o|-m|-s|0p|" passed
running test "|-o|-m|-s|1p|"
test "|-o|-m|-s|1p|" passed
running test "Last bytes of page"
test "Last bytes of page" passed
running test "Unaligned page cross only"
test "Unaligned page cross only" passed

@devyte devyte added this to the 3.0.0 milestone Oct 14, 2020
@earlephilhower earlephilhower merged commit 79ea883 into esp8266:master Oct 15, 2020
davisonja added a commit to davisonja/Arduino that referenced this pull request Oct 16, 2020
* master: (37 commits)
  BREAKING: Change return EEPROM.end() to bool (esp8266#7630)
  BREAKING: Change return type of channel() (esp8266#7656)
  BREAKING: Change return type of RSSI() (esp8266#7657)
  Add Copyright notice to Schedule.h (esp8266#7653)
  MDNS MultiInterface (esp8266#7636)
  BREAKING: Add Wrong Password wifi status case (esp8266#7652)
  New flash writing method with offset/memory/size alignment handling (esp8266#7514)
  Fix error when debug enabled but no port chosen (esp8266#7648)
  LEAmDNSv2: change a macro name to be independant from LEAmDNS1 (esp8266#7640)
  Allow test framework to use cores/esp8266/Arduino.h directly (esp8266#7377)
  Update OTA HTTP Server Header Information (esp8266#7633)
  Add missing sntp_init/sntp_stop (esp8266#7628)
  Use direct member initialization instead of ctr initialisation (esp8266#7558)
  Prevent rewriting Updater_Signing.h if content unchanged (esp8266#7627)
  Add WiFi Multi to readme.rst
  Remove stray axtls refs, deprecated compat funcs (esp8266#7626)
  Pull deprecated axtls link (esp8266#7624)
  Redesign ESP8266WiFiMulti.[cpp|h]
  Update README.md (esp8266#7623)
  Eliminate code duplication by template for printNumber(...)/printFloat(...).
  ...
@drzony drzony deleted the puya_flash_fixes branch October 16, 2020 15:00
Jason2866 added a commit to Jason2866/Arduino that referenced this pull request Oct 19, 2020
Jason2866 added a commit to Jason2866/Arduino that referenced this pull request Oct 19, 2020
Jason2866 added a commit to Jason2866/Arduino that referenced this pull request Oct 19, 2020
@drzony drzony mentioned this pull request Dec 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants