Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XMC Flash chip output drive support #6559

Closed
ChocolateFrogsNuts opened this issue Sep 27, 2019 · 20 comments
Closed

XMC Flash chip output drive support #6559

ChocolateFrogsNuts opened this issue Sep 27, 2019 · 20 comments

Comments

@ChocolateFrogsNuts
Copy link
Contributor

Basic Infos

  • [ X] This issue complies with the issue POLICY doc.
  • [X ] I have read the documentation at readthedocs and the issue is not addressed there.
  • [X ] I have tested that the issue is present in current master branch (aka latest git).
  • [X ] I have searched the issue tracker for a similar issue.
  • [X ] If there is a stack dump, I have decoded it.
  • [X ] I have filled out all fields below.

Platform

  • Hardware: other - XMC Flash chips attached to any ESP8266
  • Core Version: [-]
  • Development Env: [Arduino IDE]
  • Operating System: [N/A]

Settings in IDE

  • Module: [Wemos D1 mini]
  • Flash Mode: [qio]
  • Flash Size: [4MB/1MB]
  • lwip Variant: [v2 Lower Memory]
  • Reset Method: [ck|nodemcu]
  • Flash Frequency: [40Mhz]
  • CPU Frequency: [80Mhz|160MHz]
  • Upload Using: [SERIAL]
  • Upload Speed: [460800] (serial upload only)

Problem Description

XMC Flash chips used on a number of boards (most notably some WEMOS D1 mini) have a power saving feature that defaults to 75% drive on their outputs. This can result in unstable operation at 40MHz and above in some circumstances.

While #6552 allows lowering the flash frequency to achieve stable operation, it is possible to run the chips at the full 80MHz with 100% drive by setting bits 5 and 6 of SR3 with
esptool.py write_flash_status --bytes 3 0x600000
Unfortunately this only lasts for the current power cycle.

Following on from the discussion started at the end of #6366, we now attempt to develop some code that will set SR3:5,6 in XMC flash chips when they are detected...

@ChocolateFrogsNuts
Copy link
Contributor Author

Ok, got code that sets SR3 at https://github.com/ChocolateFrogsNuts/ESP-DebugTools/blob/master/DebugTools/flash.ino
but for some reason it's not able to read SR3 yet... once that's sorted I can organize a PR with some neater/stripped down code suitable for a startup check.

@ChocolateFrogsNuts
Copy link
Contributor Author

Update: got it reading the flash status registers properly.
We now have a working function for sending generic SPI commands, and a short function that uses it to enable 100% output drive if an XMC flash chip is detected at >26 MHz.

I'll work on a PR over the next few days. 392 bytes of the code need to be IRAM_ATTR (although I can get that down to 270 by making certain assumptions), so this probably needs to be something that can be enabled/disabled easily.

@earlephilhower
Copy link
Collaborator

I wonder if we can do some tricks for functions that are only needed a single time to fix them in the ICACHE and not permanently in IRAM.

Could we disable all interrupts, then read the function code into a dummy variable, to cause it to load into IRAM? Then call the fcn, which since no no-IRQ code could have caused it to spill out of the cache. Some question as to the # of ways the cache uses, etc. But seems better than wasting such a large chunk for a one-time event.

Alternatively, overlays could be used and you could explicitly flush out existing IRAM, write your fcn in, call it, and then write the old code back. It's hackish, but if its done at chip init time (preInit callback, see the examples) it may be very low risk.

@devyte
Copy link
Collaborator

devyte commented Oct 4, 2019

@earlephilhower whatever solution works regarding those ideas, it would be worthwhile to search other cases to apply it to.
I particularly like the idea of manually preloading a function into cache. To not make it so hackish, an api could be implemented to ease usage.

@ChocolateFrogsNuts
Copy link
Contributor Author

I've been looking at the compile stats we get now and thinking the same things - there seems to be a lot of ICACHE_RAM_ATTR code, and surely a lot of it only runs rarely or even once.

One thing I have noticed is that functions seem to be cached a page at a time as needed.
If there was a way to force pre-caching an entire block of code or function while it was executing, then release it... perhaps with overlays or by placing each function in it's own linker section.
If we could have linker sections loaded into IRAM/released on demand it would be perfect.
In this case we don't need speed, we just can't allow a page fault during the "critical" part.

@mhightower83
Copy link
Contributor

@ChocolateFrogsNuts, How big is a page? Assuming the whole function fits into a single page, it sounds like it would be loaded before execution, is there a problem? If it's more than one page, what if you break it into parts. Such that you can call the parts for preloading, then call them for real. Something like this:

void part1(int something, bool preload) {
  if (preload)
    return;
 // ...
  part2(something, false);
}
void part2(int something, bool preload) {
  if (preload)
    return;
 // ...
}
part1(0,true);
part2(0,true);
part1(42,false);

@ChocolateFrogsNuts
Copy link
Contributor Author

ChocolateFrogsNuts commented Oct 5, 2019

I am assuming flashchip->page_size (256 bytes), or a multiple of it. It's not very big.
If spi0_command() is not IRAM_ATTR, or if there is a call to a non-iram func in it (eg Serial.print) it triggers a page fault, invalidating the SPI registers that have already been set - usually resulting in a WDT reset.
The problem is it needs to get all the way from setting the first register to copying the result without anything else accessing the SPI controller. I don't know if SPI1 can be successfully used rather than SPI0 to access the flash chip, but that would still have a concurrent access problem...

Hmm, I wonder if I could reduce the critical size enough by calculating the register values in one func, then just loading them into the controller and reading the result in a separate func...

EDIT: the func would need to be page-aligned to give that a chance of working

@ChocolateFrogsNuts
Copy link
Contributor Author

ChocolateFrogsNuts commented Oct 5, 2019

Split the func in 2, getting the critical section down to 160 bytes, but it still has to be IRAM_ATTR.

Anyone know how to align the func? "#pragma align(256)" is ignored because the compiler doesn't understand it (a warning is printed).

EDIT: __attribute__((aligned(256))) seems to compile, but still fails without IRAM_ATTR.

@ChocolateFrogsNuts
Copy link
Contributor Author

I have a suspicion the page size is 64 bytes - the maximum size of a transfer by the SPI hardware controllers.

@ChocolateFrogsNuts
Copy link
Contributor Author

For future reference, executing code in RAM causes Exception 2 - InstructionFetchError
so we can't copy a func into normal RAM while it runs.
Also xtensa supports an IPFL (Instruction PreFetch and Lock) to temporarily lock code into the cache which would have been perfect... but the ESP8266 does not have that option enabled :(
The cache page (line) size also appears to be 16 bytes, so that pretty much rules out any attempt at preloading.

@earlephilhower
Copy link
Collaborator

@ChocolateFrogsNuts how about (over)writing 32b wide into IRAM and executing that? That's less appealing, since you need to copy back the real code when done, but if done in the preinit (see WiFiShutdown.ino) should be relatively safe.

@ChocolateFrogsNuts
Copy link
Contributor Author

So far I get exception 3 - LoadStoreError just trying to read the iram.

@devyte
Copy link
Collaborator

devyte commented Oct 8, 2019

What happens if you run this configuration code as part of eboot? I believe that any IRAM used gets cleared when booting the application.

@d-a-v d-a-v mentioned this issue Oct 8, 2019
@ChocolateFrogsNuts
Copy link
Contributor Author

ahh, oops - my bad...
I tried iram access by doing a memory dump... inadvertently accessing it as uint8 - which was never going to work as iram can only be accessed as aligned uint32
I will see what I can do with that....
I'll take a look at doing it in eboot if this doesn't work out.

@ChocolateFrogsNuts
Copy link
Contributor Author

Ok, I'm an idiot... this seems to be working:

  // precache and run _spi0_command()
  uint32_t *p = (uint32_t*)_spi0_command;
  volatile uint32_t x;
  for (int i=0; i<64; i++) x=*p++;
  (void)x;
  _spi0_command(spi0c,flags,spi0u1,spi0u2,data,data_words*4,read_words*4);

The cache is apparently 32kb, so pulling in 256 bytes an not just the 212 bytes of the function leaves a bit of room in case the function changes size without forcing too much out of the cache unnecessarily.

@ChocolateFrogsNuts
Copy link
Contributor Author

ChocolateFrogsNuts commented Oct 9, 2019

awesome... moved the precache code into it's own function and made it so you can specify an address to pre-load from, or NULL for the current Program Counter.
Now I just put
precache(NULL, 256)
as the first line of the function, and it doesn't need to be IRAM_ATTR at all :-)
Updated code pushed to the repository mentioned above.

Now I just need to decide where to slot this code in - the XMC specific stuff is quite short (16 lines) and probably isn't worth it's own file. The generic stuff, which may be useful for other SPI devices is longer and should go somewhere specific to generic low-level SPI, but I can't see a suitable file for that...
Suggestions anyone?

oh, and the precache code probably deserves to go in some generic spot too...

@mhightower83
Copy link
Contributor

esptool.py write_flash_status --bytes 3 0x600000
Unfortunately this only lasts for the current power cycle.

I am not sure if your part has a volatile/non-volatile status register combination.
Have you tried the --non-volatile switch?

eg.

esptool.py write_flash_status --bytes 3 0x600000 --non-volatile

I just tried it for a Winbond W25Q128FVSG. For me, the new driver strength bits seem to stick across power cycles.

Side note: Not sure what your XMC part number is; however, our drive strength tables are different. 0x60 is the default and lowest drive setting for the Winbond.

@ChocolateFrogsNuts
Copy link
Contributor Author

According to the datasheet from XMC some parts of the status registers have volatile and non-volatile versions, others don't. The drive strength only exists as a volatile register.
Testing backed this up.
Apparently Winbond do this a little differently, including the meanings of the bits...
XMC part XM25QH32B

@devyte
Copy link
Collaborator

devyte commented Apr 23, 2020

PR #6725 is merged, but there is still a detail pending in eboot, as mentioned in that PR.
@ChocolateFrogsNuts I'm keeping this open to track that last pending. There is no rush, resolution is up to you.

@devyte
Copy link
Collaborator

devyte commented Jun 6, 2020

This is already addressed. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants