Use external Flash chip for storage #1547

Open
1 of 20 tasks
boricj opened this issue May 10, 2020 · 6 comments

Comments

@boricj
Contributor

boricj commented May 10, 2020

Problem you'd like to fix

Currently, epsilon uses a 32 KiB buffer in RAM as a storage area for Python scripts and Poincare objects (variables, sequences, functions...) with a simple ad-hoc file-system. Its structure is a sequence of (size, filename.extension, data) records with two magic uint32_t values delimiting the beginning and the end of the buffer. Of particular note, the workshop directly reads and writes this buffer to transfer Python scripts and as such needs to understand the file-system format to work.
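
To make the layout described above concrete, here is a minimal sketch in C of walking such a buffer. It rests on assumptions not stated in this issue: the magic value is hypothetical, the size field is taken to cover the whole record, and a zero size is taken to mark the end of the records. The function name count_records is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic value; the real constant is not given in this issue. */
#define STORAGE_MAGIC 0xEE0BDDBAu

/* Counts the records in a buffer assumed to be laid out as:
 *   [magic][size | "name.ext\0" | data]...[size == 0]...[magic]
 * where size is a uint16_t covering the whole record (assumption).
 * Returns -1 if the buffer is malformed. */
int count_records(const uint8_t *buf, size_t len) {
  assert(buf != NULL);
  uint32_t magic;
  if (len < 2 * sizeof magic + sizeof(uint16_t)) return -1;
  memcpy(&magic, buf, sizeof magic);
  if (magic != STORAGE_MAGIC) return -1;
  memcpy(&magic, buf + len - sizeof magic, sizeof magic);
  if (magic != STORAGE_MAGIC) return -1;

  size_t end = len - sizeof magic;
  size_t pos = sizeof magic;
  int count = 0;
  while (pos + sizeof(uint16_t) <= end) {
    uint16_t size;
    /* Records are only byte-aligned, so copy the field out instead of
     * dereferencing a possibly misaligned uint16_t pointer. */
    memcpy(&size, buf + pos, sizeof size);
    if (size == 0) return count; /* assumed end-of-records marker */
    if (size < sizeof size + 1 || pos + size > end) return -1;
    pos += size;
    count++;
  }
  return count;
}
```

The memcpy() calls are there because of the byte alignment: reading the fields through a cast pointer would be an unaligned access on some targets.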

This file-system as it currently exists has several shortcomings:

  • It has a capacity of only 32 KiB.
  • Records are limited to just under 64 KiB in size (65,535 bytes) due to the datatype used (uint16_t).
  • Records are byte-aligned but contain POD structures that require stricter alignment, requiring tons of Emscripten-specific glue to work around.
  • The file system layout makes no accommodation for Flash storage constraints (sector erasing to all ones, writing to zeroes) as the code assumes it can memmove() things around without issues.
  • Python scripts are prefixed with a single byte denoting their auto-importation status, which makes them binary files and could potentially complicate third-party uses of these.
  • The buffer permanently uses 1/8th of the MCU's total RAM that could be used for other purposes.

Describe the solution you'd like

The N0110 model incorporates an external 8 MiB Flash chip that is currently used for storing the firmware and the exam mode status. It is not used to its fullest potential as nearly 7 MiB of it is unused. Since it is unlikely that the firmware will expand to consume all that space anytime soon, it is proposed to move storage into it.

A new file system is required to make use of it, of which these are the requirements:

  • Accommodate flash storage constraints (4 KiB block erase to ones, bits can then be set to zeroes, wear is a concern).
  • Alignment to at least 4 bytes for aligned memory access.
  • Greater than 64 KiB file size support.
  • Contiguous storage of file data.
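
The Flash constraints in the first two requirements can be modeled in a few lines of C. This is only a simulation in RAM under stated assumptions: the array flash, the function names, and the tiny 16 KiB capacity are all hypothetical and stand in for the real chip and its driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 4096u
#define FLASH_SIZE (4u * SECTOR_SIZE) /* tiny simulated chip, not the real 8 MiB */

static uint8_t flash[FLASH_SIZE];

/* Erasing works on whole sectors and sets every bit to one. */
void flash_erase_sector(size_t sector) {
  assert(sector < FLASH_SIZE / SECTOR_SIZE);
  memset(flash + sector * SECTOR_SIZE, 0xFF, SECTOR_SIZE);
}

/* Programming can only turn ones into zeroes, hence the bitwise AND. */
void flash_program(size_t addr, const void *src, size_t len) {
  assert(addr <= FLASH_SIZE && len <= FLASH_SIZE - addr);
  const uint8_t *bytes = src;
  for (size_t i = 0; i < len; i++) flash[addr + i] &= bytes[i];
}

/* Rounds a size up to the next multiple of 4 for aligned access. */
size_t align4(size_t n) { return (n + 3u) & ~(size_t)3u; }
```

Programming 0xF0 and then 0x0F at the same address leaves 0x00 behind, which is why in-place edits with memmove() cannot work: restoring a one bit requires erasing the whole 4 KiB sector.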

Having a new file system with a capacity of several megabytes instead of 32 KiB allows storing things that are currently impossible, such as images, screenshots, rich text documents, large datasets and so on. This in turn allows new usages for end-users (background images for graphs, document viewer, blitting sprites and tilesets in Python...) and also for third-party forks.

Describe alternatives you've considered

The N0100 model has only 1 MiB of Flash available with less than 120 KiB free at the moment. Furthermore, most sectors are 128 KiB long, with only four 16 KiB sectors (the first one contains the reset vector and thus has to be erased with caution, the second one contains the exam mode status) and one 64 KiB sector. It is doubtful storage can be successfully migrated to Flash in this model, so a decision is required for it (keep old file system, use new file system in existing RAM buffer, wait until N0100 support is discontinued due to stuffed Flash...).

Additional context

This is an improvement request originating from third-party forks. The external apps community support currently uses a tar file located at 0x90200000 to store data files and programs. This is an ad-hoc solution unsatisfactory on multiple levels and it'd be really nice if this could be integrated to share the native epsilon file system altogether. However, this cannot be done without cooperation with upstream as the workshop needs to understand the file system format to transfer scripts. Therefore, it is preferable that a solution satisfying all involved parties be reached.

Tasks identified

  • Get NumWorks team on-board with the idea (since this is the whole point of this issue)
  • Clean up Ion::Storage API and usages
    • Remove inline edition of Python scripts
  • Define new file system format
  • Define new, third-party-friendly Python script storage format
  • Implement new file system format
  • Make use of increased storage space (to be split into separate issues)
    • Images
      • Pick an uncompressed file format to support (BMP, TGA, PBM/PAM...)
      • Add functions to kandinsky module for blitting/saving pictures
      • Add ability to screenshot
    • Python
      • Implement file object for I/O
      • Implement os Python module for directory manipulation
      • Add Python bindings to new kandinsky functions
    • Function/Sequences/Regression app
      • Add ability to put background images in graphs
    • Storage app
      • Create app for managing storage
    • ...
@debrouxl
Contributor

Lack of a persistent filesystem has always been a limitation of Epsilon. Fixing this limitation has always been a good idea... and in the wake of TI removing access to native code on the TI-eZ80 series by an OS upgrade on 2020/05/20, with users fuming over the move, a persistent filesystem for Epsilon has become an even better idea of even higher desirability for users of NumWorks calculators :)

The low free Flash space of the N0100 is indeed an issue, we can't expect users to solder an additional Flash chip on N0100 calculators, though it's not too hard to do it. The memory layout of the N0100 OS could be shuffled to be able to use that 64 KB sector for filesystem storage purposes... but then the Flash memory would become unable to store improved OS versions even sooner, which is bad for users.

For over two decades, users of TI-Z80, TI-68k and TI-eZ80 calculators have put up with variables whose maximum size is 64 KB minus several control bytes. On the TI-68k series: in Flash memory, AMS 2.xx and 3.xx use variable headers of 2 bytes for a bit field containing the status of the logical block in Flash memory (active, to be garbage collected later, etc.) + 8 bytes for the folder name + 8 bytes for the variable name.
uint32_t-sized variables would indeed be a nice improvement over that, since it would enable variables whose size is exactly 64 KB and thereby yield speed boosts in e.g. emulators (whose usage is usually not school-oriented, I know), or even larger variables which remove the need for splitting data across files. The flip side of the coin is a more difficult garbage collection of larger variables.

For multiple reasons, USB DFU isn't the best protocol for dealing with Flash-based filesystems.
Needless to say, NumWorks' competitors have provided the ability to transfer individual files whose size is precise to a single byte for over two decades, mostly using proprietary protocols (TI: DBUS, CARS, NavNet, NNSE; Casio: e.g. "p7", I don't know whether that's an official name) which support functionality beyond file transfer, e.g. but not limited to remote screenshots / configuration / keypresses and other control / status query. Newer Casio models use standard USB MSD for transferring files, which is great, but doesn't offer the additional functionality in a single protocol...

@boricj
Contributor Author

boricj commented May 23, 2020

At this point, the N0100 only has 150 KiB of Flash unused and its structure is really heterogeneous (4x16 KiB, 1x64 KiB and 7x128 KiB sectors). I'm not going to bother migrating its storage to Flash because quite frankly it's not worth the headache.
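
As a quick check of that layout, the sector sizes can be tabulated in C. The sizes come from the figures above; the address ordering (16 KiB sectors first, then the 64 KiB one, then the 128 KiB ones) and the function names are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define KIB 1024u

/* Sector sizes in address order; the ordering is an assumption. */
static const size_t sector_sizes[] = {
    16 * KIB,  16 * KIB,  16 * KIB,  16 * KIB,  64 * KIB,  128 * KIB,
    128 * KIB, 128 * KIB, 128 * KIB, 128 * KIB, 128 * KIB, 128 * KIB,
};
#define SECTOR_COUNT (sizeof sector_sizes / sizeof sector_sizes[0])

/* Sums all sectors; 4x16 + 1x64 + 7x128 KiB should give 1 MiB. */
size_t total_flash_size(void) {
  assert(SECTOR_COUNT == 12);
  size_t total = 0;
  for (size_t i = 0; i < SECTOR_COUNT; i++) total += sector_sizes[i];
  return total;
}

/* Returns the index of the sector containing offset, or -1 if out of range. */
int sector_of(size_t offset) {
  size_t start = 0;
  for (size_t i = 0; i < SECTOR_COUNT; i++) {
    if (offset < start + sector_sizes[i]) return (int)i;
    start += sector_sizes[i];
  }
  return -1;
}
```

With sectors this large, reclaiming the space of even a small deleted record means erasing up to 128 KiB at once, which is a large part of the headache.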

I do not think NumWorks will support another USB mechanism to exchange data because DFU is doing the job just fine for their needs. There are no compelling reasons for them to change it. While the community could add Media Transfer Protocol to epsilon, that's a lot of work and I doubt it will happen anytime soon.

Right now I'm assessing the state of storage in epsilon and there are several things to clean up before attempting this. Among other things, Python script edition is done directly inside the storage buffer and that will obviously not work with Flash.

@Ecco
Contributor

Ecco commented May 25, 2020

Hi @boricj ! Thanks for starting this discussion!

First of all, this is something we've been wanting to do for a while. It's a great feature and it would be a nice addition to Epsilon.

That being said, it's a feature targeted at power users that very few people have actually asked us to add — you might be the first, actually. As a result, we've kept pushing it further down our roadmap, and to be very honest it's still not one of our top priorities at the moment.

That being said, we'd love to add this at some point, and there are definitely a few decisions to be made. Off the top of my head:

  • We'll need to find a relevant FS. I think either littlefs or FAT could make sense. Possibly others. The upside with FAT is that it'd then become possible to use USB Mass Storage in the real world.
  • We need to find out what happens upon reset. Using the Flash memory as a read-only memory has the huge upside of ensuring a consistent state after reset. Maybe we could reformat the Flash on reset?

Last but not least, as you've noticed, we're indeed very often using the fact that "files" (well, records) are in RAM, and we edit them in place. Changing this behavior would probably require a non-trivial amount of work. Nothing impossible of course, but that assumption is made virtually everywhere records are used.

@boricj
Contributor Author

boricj commented May 26, 2020

@Ecco I'm concerned about using an off-the-shelf file system that does not fill the requirements I've laid out:

  • Code throughout epsilon expects records to have their data accessible through a contiguous byte array with value(). Most file systems only provide read()/write() calls, which would take a lot of refactoring to make it work with that paradigm.
  • Most of epsilon only accesses said contiguous byte array read-only. So far, the only place I've found where it is directly written to without using setValue() is the code app during script edition.
  • The code app can accommodate an edition buffer (of up to 7 KiB of RAM at no cost to the heap) to solve the in-place edition issue. This will be a net gain when taking into account that the 32 KiB of RAM currently assigned to storage will not be needed anymore, even assuming we'll keep a 4 KiB buffer around for compaction. I don't expect users to seriously intend to create or edit a Python script on-calc bigger than a dozen kilobytes anyway.
  • There are good reasons to insist on contiguous byte array storage. It allows direct access to the data without having to allocate and copy a RAM buffer. It will be very useful for external apps in forks since it allows for execute-in-place, which is impossible if the app is sliced on 4 KiB boundaries.
  • USB MSD is likely not to be a very good fit for the NumWorks calculator. We don't have the RAM to synthesize a file system known to a computer in the megabyte range, so we'd have to directly use FAT or UDF in Flash, which severely restricts the design and raises concerns about wear leveling. If this feature is needed, I expect USB MTP to be a much better option since that decouples the underlying storage mechanism from the computer.
  • Littlefs is designed with power loss resiliency in mind. Since the MCU is powered by a battery, this is not a failure mode we need to be robust against. There's still the odd chance of a crash or a user pressing the reset button while writing to the record system at exactly the wrong time, but I'm doubtful this is a real concern in our case.

It's too early to decide just yet, but unless someone can find an existing file system that meets the requirements I've identified, I believe that writing an ad-hoc record system is the least risky option. I'm not going to redesign the entire record subsystem of epsilon from scratch and deal with the snowballing consequences just to reuse a file system off the shelf.

> That being said, it's a feature targeted at power users that very few people have actually asked us to add — you might be the first, actually.

I disagree with reducing the use cases to only power users. It's like saying 32K is more memory than anyone will ever need on a calculator. I've listed a bunch of stuff that will be possible when storage is measured in megabytes and not a couple dozen kilobytes:

> Having a new file system with a capacity of several megabytes instead of 32 KiB allows storing things that are currently impossible, such as images, screenshots, rich text documents, large datasets and so on. This in turn allows new usages for end-users (background images for graphs, document viewer, blitting sprites and tilesets in Python...) and also for third-party forks.

I specifically recall analyzing a video of a thrown ball during a high-school assignment. I used software on a computer to get the coordinates of the ball across time, then performed regression in Excel to find a linear model on the x axis and a quadratic model on the y axis. All of that could conceivably be done on a NumWorks calculator without the need to book a computer room, if it had the storage for a dozen video frames or even just a single chronophotograph (I assume the data is on the calculator, but it's nothing a smartphone, tablet or portable computer can't handle).

It's not about having a bigger storage just to store more Python scripts. It's about having a bigger storage to enable new possibilities for users that they wouldn't even think of otherwise, because the current limitations of the calculator inhibit that kind of thinking outside the box.

In any case, I believe that the Get NumWorks team on-board with the idea item on the to-do list can be checked off 👍

@debrouxl
Contributor

+1 for USB MTP for its ability to decouple the underlying storage mechanism from the communicated filenames. I heard about a working implementation of MTP for the TI-eZ80 series, which has its own flat filesystem (no folders) based on contiguous files stored in sections of either RAM or Flash memory (the section of Flash memory which can store user variables is officially known as "archive memory").

@boricj
Contributor Author

boricj commented Jun 5, 2020

So I've given it some more thought:

Storage-wise, we should use something like a log-structured sequence of records:

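#include <stdint.h>

struct RecordHeader {
  char name[60];      // NUL-terminated; name[0] == '\0' marks a deleted record.
  uint32_t dataSize;  // Size of the data in bytes, excluding padding.
  char data[];        // dataSize bytes, padded to 4-byte alignment (C99 flexible array member).
};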

Given Flash constraints, deleting a record would be done by writing a null character at name[0] and creating a new record would be done by appending it at the end. When full, compaction with a statically allocated 4 KiB buffer would be required. To prevent degenerate behavior and excessive Flash wear when nearly full, the "maximum" advertised capacity to the user should be smaller than the real capacity.
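
A minimal sketch of the append and delete operations, written in C against a RAM array that stands in for erased Flash. Everything here is hypothetical (the log_* names, the 8 KiB area, the absence of compaction) and only illustrates the idea; it is not an implementation proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NAME_SIZE 60u
#define LOG_SIZE 8192u /* simulated storage area, hypothetical size */

static uint8_t log_area[LOG_SIZE]; /* stands in for Flash */
static size_t log_end;             /* offset of the first free byte */

static size_t align4(size_t n) { return (n + 3u) & ~(size_t)3u; }

void log_format(void) {
  memset(log_area, 0xFF, sizeof log_area); /* erased Flash reads all ones */
  log_end = 0;
}

/* Appends a record; returns its offset, or -1 if it does not fit
 * (which is where compaction would kick in). */
long log_append(const char *name, const void *data, uint32_t size) {
  size_t name_len = strlen(name);
  if (name_len == 0 || name_len >= NAME_SIZE || size > LOG_SIZE) return -1;
  size_t total = NAME_SIZE + sizeof size + align4(size);
  if (total > LOG_SIZE - log_end) return -1;
  uint8_t *rec = log_area + log_end;
  memcpy(rec, name, name_len + 1);
  memcpy(rec + NAME_SIZE, &size, sizeof size);
  memcpy(rec + NAME_SIZE + sizeof size, data, size);
  long offset = (long)log_end;
  log_end += total;
  return offset;
}

/* Deleting only clears bits: name[0] becomes NUL, no erase required. */
void log_delete(long offset) {
  assert(offset >= 0 && (size_t)offset < log_end);
  log_area[offset] &= 0x00;
}

/* Linear scan for a live record by name; returns its offset or -1. */
long log_find(const char *name) {
  size_t pos = 0;
  while (pos < log_end) {
    uint32_t size;
    memcpy(&size, log_area + pos + NAME_SIZE, sizeof size);
    if (log_area[pos] != '\0' &&
        strcmp((const char *)log_area + pos, name) == 0)
      return (long)pos;
    pos += NAME_SIZE + sizeof size + align4(size);
  }
  return -1;
}
```

Updating a record is then a log_delete() of the old copy followed by a log_append() of the new one, which is what makes the structure log-like.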

Since scanning this structure linearly could take a long time when near capacity, a statically-sized and sorted array of record header pointers (effectively their names) would be required for fast access and iteration. This proposal is mainly because such a structure is easily manipulated with C standard library bsearch()/qsort() functions available from OpenBSD and provides O(log n) lookups. This directory would be held in RAM and not be used/touched by the workshop, since it can be reconstructed through a scan.
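
The directory idea can be sketched with the standard qsort() and bsearch() functions. The names directory, directory_rebuild and directory_find, as well as the capacity of 64 entries, are made up for this example; the only thing taken from the proposal is that a pointer to a record header is also a pointer to its name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RECORDS 64 /* hypothetical static capacity of the directory */

/* Each entry points at a record header, which begins with its
 * NUL-terminated name, so comparing entries compares names. */
static const char *directory[MAX_RECORDS];
static size_t directory_count;

static int compare_entries(const void *a, const void *b) {
  const char *const *lhs = a;
  const char *const *rhs = b;
  return strcmp(*lhs, *rhs);
}

/* Rebuilds the directory from record pointers gathered by a scan. */
void directory_rebuild(const char *const *headers, size_t count) {
  assert(count <= MAX_RECORDS);
  memcpy(directory, headers, count * sizeof *headers);
  directory_count = count;
  qsort(directory, directory_count, sizeof directory[0], compare_entries);
}

/* O(log n) lookup by name; returns the header pointer or NULL. */
const char *directory_find(const char *name) {
  const char **hit = bsearch(&name, directory, directory_count,
                             sizeof directory[0], compare_entries);
  return hit ? *hit : NULL;
}
```

Because the array only holds pointers, it can be thrown away and rebuilt by a scan at any time, so it never needs to be written to Flash.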

An interesting new use-case I've found would be to store the snapshots of inactive apps in Flash. This should substantially decrease RAM usage since it would allow for extending the current system of run-time overlays for apps to their snapshots as well. To prevent out-of-memory allocation problems, part of the "maximum" capacity margin would be used for app snapshots. It does not need to be implemented in the short-term, but it'll be done at some point when RAM shortage or expansion of the MicroPython heap becomes an issue. If anything, this highlights the fact that the N0100 should not be a concern for the design (and that non-compatible work, if any, should probably be postponed until the N0100 is filled to the brim and cannot support new major releases anymore).

While I've made some minor cleanups so far, retro-fitting the existing epsilon record code with read-only direct buffer access is proving to be a lot harder than I expected, since I did not know how pervasive read-write direct buffer access was across the code-base. I expect that a community-led effort would yield a rather big diff in this area and would often bitrot against internal NumWorks developments...

3 participants