Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESP32: Simplify task.c to use handles as callbacks #3087

Closed
wants to merge 2 commits into from

Conversation

jpeletier
Copy link
Contributor

@jpeletier jpeletier commented May 3, 2020

  • This PR is for the dev-esp32 branch rather than for master.
  • This PR is compliant with the other contributing guidelines as well (if not, please describe why).
  • I have thoroughly tested my contribution.
  • The code changes are reflected in the documentation at docs/*.

This PR simplifies task.c code to avoid having to register "nodemcu tasks". These nodemcu tasks are actually a collection of registered event callbacks (rather than actually tasks) that can be posted to via a handle.

Instead of registering a callback and obtaining a handle (which is actually an index to a dynamic array of function pointers), this refactor uses the callback pointer directly, so it is not actually necessary to register and maintain this array.

In oder to make this a step by step refactor and make it easier to understand, the interface of tasks.c is left intact. Further changes will clean up code using tasks.c

This refactor results in using a little bit less of memory, no dynamic allocation and better performance since no pointer lookups are necessary when dispatching events.

@jpeletier jpeletier changed the title Simplify task.c to use handles as callbacks ESP32: Simplify task.c to use handles as callbacks May 3, 2020
@TerryE
Copy link
Collaborator

TerryE commented May 4, 2020

@jpeletier Javier, this one is an absolute no-no, IMO. Why? because our roadmap is to converge our two ESP variants back to a common codebase, at least as far as common components such as the Lua runtime and core NodeMCU add-ons. What this PR attempts to fundamentally diverge the two variants further. This should not be slipped in as a "simplfy" PR because this makes achieving our roadmap goals a lot harder.

If you feel that there is a strong case for changing the task interface, then raise this as an issue and we can have this debate across both the ESP8266 and ESP32 platforms.

@TerryE TerryE requested review from TerryE, devsaurus, jmattsson and nwf May 4, 2020 11:39
@jpeletier
Copy link
Contributor Author

@TerryE This PR does not change the interface in any way. As you can see, changes happen only in task.c and task.h, everything compiles the same. Code is backward compatible.

The change is done so we don't have to maintain the task_func function pointer array. The tasks interface does not even have a method to unregister.

With my solution the task handle is the pointer, so there is no need to register "tasks". I quote "tasks" because they don't look like tasks to me but look more like event handlers anyway.

This optimization can be backported to ESP8266, again without changing the interface. I can do another PR to the ESP8266 branch so these go in parallel, thus keeping codebases similar.

Simplify task.c to use handles as callbacks
@TerryE
Copy link
Collaborator

TerryE commented May 4, 2020

Javier, My comment still applies. If we make this sort of change to the esp32 branch then we should also do it to the esp8266 one as well. This belongs in an issue and we should only move to PR after we have consensus on the pros, cons and best solution.

PS: if you forced push, then I can't do inline review. OK, sometimes github is just too much of a dog to do other than a forced one, but do non-forced ones when you can please.

@jmattsson
Copy link
Member

I'm not sure I was ever fully across the rationale behind the implementation details of the NodeMCU task interface, so I'll defer to Terry on those technical bits. I carried across the esp8266 implementation as closely as I could, for better or worse, when I did the initial esp32 port. Terry is definitely correct in that we should avoid diverging the code bases further at this point. If the same change can be made on the 8266 side, I guess this would be a worthwhile change? Especially if we can factor out code that's genuinely the same on both platforms. I don't see this impacting the public API, so as long as we're not shooting ourselves in the foot with diverging code bases, I'm ok with it :)

@TerryE
Copy link
Collaborator

TerryE commented May 7, 2020

Javier, here are my considered comments.

The post and task functions (on the 8266 at least) are in boot ROM (at 0x40000e24 and 0x40000dd0 resp). IIRC from inspection of the BOOT ROM code, the task dispatcher maintains a set of (?) 32 task priority queues each of which has a dispatcher CB, a pointer to a vector of os_event_t records, a max and currently used count, plus a 32 bit priority pending mask work. The ROM dispatch uses a nsau instruction to find the first runnable priority then calls that dispatcher with the Q[priority][0] os_event_t record. The os_event_t vector is contiguous and so the dispatcher unshifts it to pop the task.

There are 3 of the 32 priorities available for user tasks, and task_init_handler() within the platform interface registers these three dispatchers which all use platform_task_dispatch().

The reason that I used an ascending allocator was partly for precedent reasons (all of the SDK examples of use used statically allocated task numbers 0..N) and partly for ease of debugging: having an explicit registration process allows you to track what tasks have been registered and watch their use through the debugger. The algo simple to use and has proved very effective. I copied this meme without looking at alternative such as just using an anonymous pointer to a block.

The amount of RAM used by this process is in practice tiny as most firmware builds will register maybe 4-6 task CBs, that is 16 or 32 bytes for the vector. The registration is a small one-time event at initialisation and so its costs can be ignored. The extra call overhead of extra indirection is a few H/W instructions, but these are indexing a RAM array and so this cost is sub uSec.

So my overall feeling is that there really isn't a compelling reason to change this code as there is no material performance or RAM benefit. Yes, this does mean that the library developer doesn't need to register task CBs at initialisation, and can do this lazily. However, is this really a benefit as you lose the ability to track how many CBs are registered and are in use?

@jpeletier
Copy link
Contributor Author

jpeletier commented May 10, 2020

Thank you @jmattsson and @TerryE for the detailed answer.

After reading your comment and reviewing the ESP8266 branch, I still think this move simplifies the code, as this PR in ESP32 removes a net of 42 lines of code and it arguably easier to understand, getting rid of the whole task handle moniker masking/unmasking/checking as well, which also saves some execution time and whose purpose wasn't obvious at first sight.

However, is this really a benefit as you lose the ability to track how many CBs are registered and are in use?

What code currently needs to track how many CBs are in use? Do we need this?

I proposed this PR as I ran across task.c when was writing the Sonar module (#3090). I was looking for the correct way to run code within the context of Lua. Thus, I was expecting to find only one public function with a signature along these lines:

run_code_in_lua_context(priority, parameter, function pointer)

Yet, I found I had to register it beforehand at module initialization time to obtain a handle (task_get_id), store it in RAM (a value that will never change...), and then use that handle to call task_post.

For me, this is not only slight RAM and execution time savings, it is also making it more intuitive for the developer: When you want to invoke some function in Lua context, it should suffice to invoke a function like the one I am proposing, stating the name of the target function directly.

If you would allow me, I'd like to give it a go and implement this idea for ESP8266 as well.

On another note, I would like to also change the signature of task_callback_t to include Lua state L as the first parameter, since all the callback implementations need to inmediately call lua_getstate() as the first step anyway (I checked, all of them do). We might as well do that as part of task_dispatch() and save the developer that step. This applies to both ESP32 and ESP8266 variants.

@TerryE
Copy link
Collaborator

TerryE commented May 11, 2020

@jpeletier Javier, we actively seek to engage other potential contributors, hence my wish to give you detailed feedback. However, when it comes to considering any change to the Lua and NodeMCU core code, we operate a compelling reason test. My assessment of your "I'd like to give it a go", is that this falls far short of a compelling reason, so by all means try your change on your own local development environment, but we wouldn't accept this change into NodeMCU.

Why? We have ~25 uses of the task registration in our ESP8266 core code, drivers and modules and comparable number in the ESP32 branch, plus a larger number of task posts. We have an unknown number of private non-shared modules that might also use this API. Your suggested API modification is a breaking change, so all of these uses would also need to be changed to support it, and these would all need retesting. This is all to shave off less than 1 µSec runtime per task post, and to remove one line of code from your luaopen_sonar() source.

Yes, we could recode the dispatcher logic to support your proposed format as an optional alternative to the existing API, so that we could leave existing code unmodified, but this would increase the overall runtime per task post. This just doesn't pass the sniff test. Sorry.

@jpeletier
Copy link
Contributor Author

Why? We have ~25 uses of the task registration in our ESP8266 core code, drivers and modules and comparable number in the ESP32 branch, plus a larger number of task posts. We have an unknown number of private non-shared modules that might also use this API. Your suggested API modification is a breaking change, so all of these uses would also need to be changed to support it, and these would all need retesting. This is all to shave off less than 1 µSec runtime per task post, and to remove one line of code from your luaopen_sonar() source.

The code in this PR is totally backward compatible and does not break the current interface. Can you please take a look and see it does not break anything? Code compiles without issue.

My main point is code cleanup and simplification. I find the current implementation convoluted, namely all the business with:

#define TASK_HANDLE_MONIKER 0x68680000
#define TASK_HANDLE_MASK    0xFFF80000
#define TASK_HANDLE_UNMASK  (~TASK_HANDLE_MASK)
#define TASK_HANDLE_ALLOCATION_BRICK 4   // must be a power of 2

plus dynamic memory to handle the function array, and so on... It is hard to figure out how this works for anyone new to this code. That's why I want to change it to something simpler with less code. As a bonus, it runs a tiny bit faster, but I agree that is not the main point.

Maybe I created confusion about suggesting an interface change. I apologize. We can leave that for a different discussion. This PR keeps the current interface.

we wouldn't accept this change into NodeMCU.

Who is "we"? I'd rather have others comment on this, as I understand everyone should speak for themselves in order to get more points of view and make this project better together. I can't accept such a strong "no" when the answers I am given don't address the actual code this PR is proposing to change. Interface is left intact.

My assessment of your "I'd like to give it a go", is that this falls far short of a compelling reason

With this I meant, implementing the change for ESP8266 for you to take a look. Here it is #3101. Let's forget about the discussion above, just look at code.

@TerryE
Copy link
Collaborator

TerryE commented May 11, 2020

"We" in this case is the set of allocated reviewers; 2 so far have abstained and the other has deferred to me. However, I will carefully weight their input.

Having gone through your Interface again, I agree that the architectural changes are encapsulated behind the interface, and I apologise for having missed this on my last review. However, I come back to my main point: where is the compelling reason for the change? What this boils down to is a sub µSec runtime improvement once per task post, which is in the quantisation noise.

I agree that using masks and monikers might seem like overkill, but the historic reason for this was that if you go back three years, then the whole ISR to Lua VM interface was a mess. There was no coherency in the implementation and the main challenge was to prevent Lua applications randomly cratering, so using an additional CB list and runtime moniker checking greatly helped debugging all of these task interfaces.

There are all sorts of tricks and wrinkles in the Lua kernel that are extremely difficult to grasp so what is different about this one? Have a look at some of the examples of the changes that I have introduced over this last 4 years or so to see what I mean about only changing the kernel VM and NodeMCU libraries for "compelling reason", then see the additions that I have done:

  • The addition of driver and language support for task posting
  • Space-efficient full error diagnostics
  • LFS which increases the available ROM+RAM by factors (up to 10× on ESP8266)
  • Macros to remove ~95% of the 'unaligned exception overhead' of accessing byte fields in flash based constant records.
  • Rewrite of the ROTAble implementation to give ~50× speedup on ROTable access
  • Lua 5.3 port and support
  • Decrease LValue size (the basic quantum of variable storage) from 16 to 12 bytes on Lua 5.1 and 16 to 8 bytes on Lua 5.3
  • Piping of stdin, stderr and stdout to allow resilient input/output redirection and overrun-free bulk past into stdin.
  • Robust error handling and reporting on user panics in CBs.

If you can think of something that adds a true additional feature to the Lua core, of say a 5% on overall performance then great, lets talk about that, but at the moment I am getting irritated and distracted from this mainstream work by debating how to save ~0.2 µSec runtime improvement on the per task post overhead.

@jpeletier
Copy link
Contributor Author

Hi @TerryE,

I was never in here to trigger a lengthy debate. I get all your points and I understand that this code is left in here for historical reasons; that you have done a ton of work to get NodeMCU to where it is today and I have a great deal of respect for you having achieved that together with the others.

I feel like I have touched a "forbidden" file and from then on this went into defensive mode, derailing from the main points: I never centered this PR around improving performance but around code simplification. (I did make the mistake of mixing in a proposed interface change, though, which created confusion). Yet, I get lengthy responses; this scares me off.

My only argument left standing is that the change I am proposing makes the code more readable and easier to maintain and evolve. The fact is that this PR together with #3101 removes more than 80 lines of code while keeping the interface. The current scheme is hard to read, and I understand it would have been useful 3 years ago. But hard to read code means only you can read it, meaning less potential help from everyone else. We can remove the wrinkles, one at a time. This code does not have to be difficult.

Technical discussion aside, having a bit of contribution in here should also help offload you or others from a bit of responsibility, since now one more person feels responsible, knows about how that works and will be able to make changes to fix or enhance in the future.

I ask you to reconsider.
Thanks,
Javier

@TerryE
Copy link
Collaborator

TerryE commented May 12, 2020

Let's just leave this one pending for now. We've got a lot other high priority stuff like the anti-panic error handling that I want to merge into dev as soon as Marcel has done the next cut to master.

@jmattsson
Copy link
Member

I will weigh in here and say that simplification is a strong argument in itself, so there's that. Timing could of course be discussed.

On that topic, are you guys now waiting on me to resume the merging of the esp8266 and esp32 branches? I haven't had a chance to keep up with NodeMCU in way too long, but I'm thinking possibly after the end of my current client engagement I might get some time again.

@TerryE
Copy link
Collaborator

TerryE commented May 12, 2020

@jmattsson

I will weigh in here and say that simplification is a strong argument in itself, so there's that.

IMO Javier's simplification is really to remove the runtime error check on task handles. The implicit thrust of this argument is that runtime error checking is redundant and so its removal is a "good" simplification.

Of course it is so long as programs and programmers are perfect, but in everyone's experience they aren't and mistakes are made. Runtime error checking policy is trade-off: too much can be an overhead; just right means that errors result in stable and diagnosable failures; too little means that we just get immediate (and often seemly random) and extremely hard to diagnose CPU exception reboots.

The current implementation uses a handle for the task id which includes a fingerprint and an ID within small range by priority so in practice only valid task ids can get dispatched. And yes, this adds few dozen source lines, and this field picking and testing adds 10-20 extra machine code instructions to each task dispatch. Javier's suggestion is that we can simply the implementation by removing this check, and in doing so avoid this tiny runtime overhead.

But let's take another example where we could apply this identical logic and remove a similar check: that a CB Userdata reference points to the expected userdata. Such a check is similarly redundant if the CB is always called with the correct UData (and since the CB is accessed through the UData metatable, then it is rather difficult to engineer a situation where it isn't called the right Udata). We have a couple of hundred such uses of luaL_checkudata() across the ESP8266 and ESP32 source base, they are called a lot more often and since these involve a keyed table access to the registry, they are a lot more expensive in runtime terms. So why not just

#define luaL_checkudata(l, n, s) lua_touserdata(l, n)

If there are no errors then this will maybe speed up execution by something noticeable, say 1%. But if there are errors then the firnware will reboot with extremely hard to diagnose hardware exceptions.

My belief that the Lua implementers and the Lua community have established a policy and set of practices that make a good balance of performance and stability. We should follow this wisdom and these practices. We certainly shouldn't introduce random bottom up changes to the Lua and platform core code which abandon them piecemeal for what is in practice an unmeasurable performance change.

I don't want to unilaterally close this, but having to give reasoned responses also means that I do need to take my limited time to go over this same ground time and again. I would much sooner point Javier at a bunch of development activities that would tangibly improve the firmware if we could resource them.

@stale
Copy link

stale bot commented Jun 16, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 16, 2021
@stale stale bot closed this Jul 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants