Not trying to make the SDK into a real OS. #999

Closed
TerryE opened this issue Feb 1, 2016 · 12 comments

Comments

@TerryE
Collaborator

TerryE commented Feb 1, 2016

I am a bit concerned about the issues raised in #993. Here we have an inexperienced (Lua firmware) developer trying to make nodeMCU Lua look more like a procedural implementation of the Lua language. It's not; it is layered on top of the non-OS SDK and basically has to operate within the constraints that the SDK imposes.

One of these constraints is that a task can only initiate one async espconn function, such as a send or a close. In 0.9x this wasn't enforced, so applications could queue up multiple requests within a task activation and would often crater as a result of resource exhaustion, which is why Espressif deprecated this.

I am really concerned about any attempts to make nodeMCU behave like a real process-oriented OS by adding layers of resource queuing etc. on top of the SDK. IMO, all this will do is partially obscure the true underlying nature of the SDK. As an implementation, #993 is also flawed. IMO, this type of modification to the nodeMCU core will destabilise it, but I don't want to close this without first airing the principle amongst the other committers.

@jmattsson
Member

Yes. No. Maybe. Software is hard, there's no way around that. I can totally appreciate people wanting to be able to send what they want to send, and have it sent. Sticking the head in the sand with regard to buffering is rarely a good thing though, especially on a resource constrained device, and it's not like string concatenation in Lua is hard. Besides, the SDK does have buffering support for several-kilobytes worth of a single-send() payload.

We have a lot of work to do just to get a whole bunch of espconn_disconnect() calls out of a recv callback into a proper task context. I'm not going near that until we have Terry's nice task abstraction stuff in though.

I would at the very least hold off on introducing more espconn stuff until we have gotten our house in order, unless the submission strictly adheres to the SDK constraints/directives.

@devyte

devyte commented Feb 4, 2016

Hi,

I currently own 24 ESP-12s, which I'm developing to automate my home in different ways. I'm not a firmware collaborator (yet), but I have wrestled quite a bit on the lua side (please see my nodemcu-platform repo). I'm also a hardcore R&D software engineer by profession (C/C++/embedded TCL) with a long trajectory and even a few software patents, and an electronics engineer by university studies. I mention all this only to explain that, while new to this repo, I'm not exactly a noobsauce on either hw or sw sides.

Before I explain my experience and make some suggestions on how to look at this issue, I have some questions:

  • What is the goal of this community effort? Is it to make a good and easy-to-use Lua API built on top of the SDK, or is it to make a thin 1:1 Lua API over the SDK, meant to maintain the SDK behaviour?
  • Who is the target user of this firmware? Is it a Lua developer who will try to make a Lua application on top of the nodemcu lua api, or is it someone who knows about the SDK details and will add to the nodemcu firmware?

@TerryE
Collaborator Author

TerryE commented Feb 4, 2016

@devyte, this is all really outside the scope of this issue but I'll answer your two main points:

  • Community Goals. The active contributors here are community members (as opposed to NodeMCU employees). Like you, we are experienced embedded systems developers who want to use the ESP8266 / Lua platform for IoT embedded development. When the original NodeMCU contributors redirected their priorities, the wider community took the lead on continuing to move the project forward, with the then lead developers' agreement and full cooperation. The ESP architecture is an incredible piece of H/W, but it and the supporting S/W are closed source and poorly documented. We believe that the SDK was originally developed as the abstraction layer for the Espressif AT-style application, but was later extended to offer more general application hosting. Our Lua firmware is one such application. Our immediate aim is to make a stable and performant Lua implementation which offers full access to the chip's features through the SDK. To do this, we work within the SDK's constraints.
  • Contributors can broadly be divided into two groups: those such as myself and Johny Mattsson who are mainly focused on the Lua and NodeMCU core; and a wider range of contributors who are adding library support for a range of H/W devices.
  • Target Users. These are developers who want to develop real-life IoT applications based on the ESP8266 chipset, but who prefer using a filesystem-based high-level language (Lua) rather than working in C. Most such developers need to know little about the SDK other than what I explain in my FAQ.

We've had to jump through a few hoops to ensure that we have sufficient RAM and Flash resources left for the developer, and leaving this margin continues to be one of our greatest challenges.

Note that since the initial development of the firmware, Espressif have engaged in a separate community-led initiative to develop an RTOS-based framework as an alternative to the SDK. However, its footprint is larger, and on the ESP8266 at least, this makes its use with a Lua firmware variant impractical.

@nickandrew
Contributor

Reliability is my highest priority. There's little point in using the ESP8266 with NodeMCU if a properly written program which follows all the rules also crashes at random. That said, I also want it to be easy to program, because that will encourage more applications and more DIY electronics. So the rules should be as simple as possible consistent with reliability.

If NodeMCU can successfully abstract the tricky parts of the SDK, it will be easier to program. It's got to be balanced against resource use though, and it has got to be reliable.

@devyte

devyte commented Feb 18, 2016

Thank you for the explanation @TerryE . I've been tracking this project for quite a while, and I've done my homework with regard to reading up on the background. I'm very aware of the community efforts, and it is precisely this that has inspired me to step forward with my own contributions. I do hope to eventually contribute to the nodemcu firmware core.

This issue #999 is to discuss whether the platform should behave more like a true OS. My answer to that is a resounding no, because it's not a true OS, but rather a (glorified) embedded system, and the development perspective is therefore different from developing for a true OS. Anyone developing for this platform, whether on the C or Lua side, must stay aware of this.

However, @nickandrew has more or less hit the nail on the head as to the reason behind #993. For that particular issue, I would suggest implementing a different interface, or at least one that is more predictable.

Why? Most users want to do multiple conn:send()s in one way or another before returning execution control. Of course, this currently doesn't work, so users get frustrated, and an avalanche of related issues get opened up, which nobody wants.
Current workarounds? All users would need to either concatenate a whole bunch of strings before sending the results, or adopt a complex mechanism on the Lua side for queuing payloads.
Concatenation in Lua makes for lousy performance, and can't always be used in a straightforward manner, because the Lua code could be spread out among a whole bunch of functions and lua files. Just think about serving dynamic webpages, and how many concatenations would be required for the entire html.
So that leaves a complex queuing mechanism on the Lua side.

Let's think about this for a moment. A queued send interface on the C side, or a queued send interface on the Lua side.
Which would be easier to use? Which would require fewer resources? Which one would have better performance? Which would be harder to implement?

I don't think there's doubt about which would be easier to use. An implementation on the C side would allow Lua users to eliminate any queuing mechanism on their side, thereby reducing Lua code, which in turn reduces mem footprint, which is the second point: resources. So, is there any chance of a C implementation requiring more resources than a Lua implementation? I would expect the answer to that is no, but then again I haven't yet delved sufficiently into the ESP firmware or hw details, so please correct me if I'm wrong.
Then there is performance. Queuing and sending mostly means saving strings, concatenating, and sending over the socket. Lua is a thin layer over the C functions, but it is still a layer with an interpreter in the way. It may not be much, but I would expect that a C implementation would have slightly better performance.
And finally, difficulty to implement, which should be read: implement correctly. The implementation must be correct, and one meaning of that is that it must not destabilize the platform in any way, otherwise what's the point.
On the Lua side, I've seen 3 different approaches: queue in a Lua fifo and then unqueue with onSent() ( @TerryE 's example in #993 is a variation of this, I also implemented my own at some point, then dropped it), a Lua coroutine with send()-yield() pairs, and a coroutine with a buffered connection*.
The Lua fifo is not hard to use, but it doesn't fix the problem, because it can still destabilize the system if the fifo fills up and eats up all the heap. The other two suck with regard to usability in one way or another, mostly because they are hard to integrate into an application.
As for performance, the buffered connection seems to me to be the best, but it is also the most complex to integrate into an application.
On the C side, I can't see Hellsy22's code anymore, but I suspect it was a straight-up fifo, only within C. A simple fifo-based queue on the C side would crater the platform just like a straight-up fifo on the Lua side if enough payloads are queued without releasing control so that the data actually gets sent, because the heap could run out.
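
A minimal sketch of the Lua-side fifo approach mentioned above, assuming a socket with a `send(data)` method and an `on("sent", cb)` registration in the style of the net module. `MockSocket`, `QueuedSender` and the `max_bytes` cap are hypothetical names introduced only for illustration; the mock stands in for a real connection so the sketch runs in plain Lua. The cap shows one possible way to refuse payloads rather than let the queue eat the heap.

```lua
-- Hypothetical stand-in for a socket: accepts one send at a time and fires
-- the "sent" callback only when complete() is called (i.e. after Lua returns).
local MockSocket = {}
MockSocket.__index = MockSocket

function MockSocket.new()
  return setmetatable({ wire = {}, busy = false, callbacks = {} }, MockSocket)
end

function MockSocket:on(event, cb) self.callbacks[event] = cb end

function MockSocket:send(data)
  assert(not self.busy, "send() called before previous send() completed")
  self.busy = true
  self.wire[#self.wire + 1] = data
end

-- Simulates the SDK reporting that the previous send has finished.
function MockSocket:complete()
  if not self.busy then return false end
  self.busy = false
  local cb = self.callbacks.sent
  if cb then cb(self) end
  return true
end

-- Fifo wrapper: at most one send in flight, the rest wait in a bounded queue.
local QueuedSender = {}
QueuedSender.__index = QueuedSender

function QueuedSender.new(sock, max_bytes)
  local self = setmetatable({
    sock = sock, queue = {}, head = 1, tail = 0,
    queued_bytes = 0, max_bytes = max_bytes, in_flight = false,
  }, QueuedSender)
  sock:on("sent", function() self:_next() end)
  return self
end

-- Sends the oldest queued payload, or marks the sender idle if none is left.
function QueuedSender:_next()
  local data = self.queue[self.head]
  if data == nil then
    self.in_flight = false
    return
  end
  self.queue[self.head] = nil
  self.head = self.head + 1
  self.queued_bytes = self.queued_bytes - #data
  self.in_flight = true
  self.sock:send(data)
end

-- Returns true if the payload was accepted, false if the cap would be exceeded.
function QueuedSender:send(data)
  if self.queued_bytes + #data > self.max_bytes then return false end
  self.tail = self.tail + 1
  self.queue[self.tail] = data
  self.queued_bytes = self.queued_bytes + #data
  if not self.in_flight then self:_next() end
  return true
end

-- Usage: three sends in a row; the third is refused because of the cap.
local sock = MockSocket.new()
local q = QueuedSender.new(sock, 10)
local r1, r2, r3 = q:send("abc"), q:send("defgh"), q:send("ijklmnop")
while sock:complete() do end   -- let the simulated event loop drain the queue
```

Without the cap, this is exactly the pattern that can run the heap dry when many payloads are queued before control is returned.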

So, my point of view is:
  • Yes, there would be huge benefits for a queued send() interface implemented on the C side. However, I would think that it should be something like the buffered connection, only hidden on the C side, completely decoupled from lualand, and transparent to the Lua user.
  • No, I don't think Hellsy22's solution is the correct one (assuming I'm right that it was a fifo approach).
  • No, I don't think the correct solution would be easy to implement.
  • Yes, I think it would be well worth it. Usage would be straightforward, no hoops to jump through on the Lua side, no running out of heap, just conn:send() conn:send() conn:send()... best of all worlds.

* By buffered connection I mean the mechanism used here, where a buffered connection is used in conjunction with a coroutine to accumulate payloads up to a size threshold (less than one MTU), then concatenate and send. In this context, send means actually send the data and wait till done, as opposed to just calling sk:send(). As a result, memory doesn't fill up no matter how many times in a row you call buffconn:send() before returning from Lua. That's why the httpserver can serve html files of any size, even when the html comes from lua scripts generating the html on the fly.
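
A rough sketch of that buffered-connection idea in plain Lua, under stated assumptions: `MockSocket`, `BufferedConnection`, `serve` and the 8-byte threshold are hypothetical names and values chosen for illustration, not the httpserver's actual code. The handler runs inside a coroutine; each flush sends one chunk and yields until the "sent" callback resumes it, so at most one threshold's worth of data is buffered at a time.

```lua
-- Hypothetical stand-in socket: one send in flight, "sent" fired by complete().
local MockSocket = {}
MockSocket.__index = MockSocket

function MockSocket.new()
  return setmetatable({ wire = {}, busy = false, callbacks = {} }, MockSocket)
end

function MockSocket:on(event, cb) self.callbacks[event] = cb end

function MockSocket:send(data)
  assert(not self.busy, "send() called before previous send() completed")
  self.busy = true
  self.wire[#self.wire + 1] = data
end

function MockSocket:complete()
  if not self.busy then return false end
  self.busy = false
  local cb = self.callbacks.sent
  if cb then cb(self) end
  return true
end

local BufferedConnection = {}
BufferedConnection.__index = BufferedConnection

function BufferedConnection.new(sock, threshold)
  return setmetatable(
    { sock = sock, threshold = threshold, parts = {}, size = 0 },
    BufferedConnection)
end

-- Sends whatever is buffered, then suspends the coroutine until "sent" fires.
function BufferedConnection:flush()
  if self.size == 0 then return end
  local chunk = table.concat(self.parts)
  self.parts, self.size = {}, 0
  self.sock:send(chunk)
  coroutine.yield()
end

-- Accumulates data, flushing every time the buffer reaches the threshold.
function BufferedConnection:send(data)
  while #data > 0 do
    local room = self.threshold - self.size
    local piece = data:sub(1, room)
    self.parts[#self.parts + 1] = piece
    self.size = self.size + #piece
    data = data:sub(room + 1)
    if self.size >= self.threshold then self:flush() end
  end
end

-- Runs handler(conn) in a coroutine that is resumed by each "sent" event.
local function serve(sock, threshold, handler)
  local conn = BufferedConnection.new(sock, threshold)
  local co = coroutine.create(function()
    handler(conn)
    conn:flush()               -- push out the final partial chunk
  end)
  local function step()
    if coroutine.status(co) == "dead" then return end
    local ok, err = coroutine.resume(co)
    if not ok then error(err) end
  end
  sock:on("sent", step)
  step()
end

-- Usage: the handler calls send() repeatedly without caring about callbacks.
local sock = MockSocket.new()
serve(sock, 8, function(conn)
  conn:send("Hello, ")
  conn:send("world")
  conn:send("!")
end)
while sock:complete() do end   -- simulated event loop
```

The cost is that the handler must run inside the coroutine, which is the integration burden described earlier in this comment.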

@jmattsson
Member

Just tossing something into the hat here: what if sock.send() accepted an array? Would that be better/worse/ugly/crater-y?
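
To make the array idea concrete, here is a small sketch assuming a hypothetical helper `send_all` that joins the fragments with `table.concat` and issues a single send. The `sock` table is a stand-in that only records what it was given; nothing here is an existing net module API.

```lua
-- Hypothetical helper: joins an array of fragments and issues one send().
local function send_all(sock, fragments)
  local payload = table.concat(fragments)  -- the whole payload is held in memory
  sock:send(payload)
  return #payload
end

-- Minimal stand-in socket that records what was sent.
local sock = { sent = {} }
function sock:send(data) self.sent[#self.sent + 1] = data end

local n = send_all(sock, {
  "HTTP/1.0 200 OK\r\n",
  "Content-Type: text/plain\r\n\r\n",
  "hello",
})
```

Only one send is issued per call, but every fragment still has to fit in the heap at once.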

@devyte

devyte commented Feb 19, 2016

@jmattsson I don't think it would really help much. The instinctive usage is to call send() several times before returning control, and expect the payloads to get handled and sent under the hood. Arrays as arguments to send() would still accumulate and use up the heap before really sending anything. All newer users would attempt this, and then scratch their heads wondering why it doesn't work as expected. I had to investigate the reasons myself, and was looking for a solution to the downside of a lua-side queue similar to what @TerryE proposed, when I came across httpserver and the mechanism there.
I don't expect a solution within C-world anytime soon (I'd love to tackle the problem, but I consider it sensitive enough to not take it up as my first contribution project), so in parallel to this discussion, I'm discussing a pure-lua implementation of a generalized buffered threaded connection here. I think it's the next best thing we can get in a reasonable time frame, and once it comes to fruition, perhaps the idea can be implemented in C-world. It wouldn't be the same model, of course, no coroutines in C-world, but I think there should be other ways to accomplish the same thing.

@jmattsson
Member

In the interim, would it make sense to change the way the :send() reports errors, and rather than just return false, have it raise an exception saying "attempt to call send() before previous send() completed"? It would make it more of a pain to actually handle errors, but would at least point out the issue in an obvious way...

@devyte

devyte commented Feb 19, 2016

@jmattsson Absolutely.
I don't see how it would make handling errors more of a pain, though. Do you mean having to wrap with pcall or something for a graceful recovery from the exception? If so, that probably shouldn't be done. The current code base requires that a single send() be active at any time before the next onSent() callback. If more than one send() is attempted, it's really a programming error, and the developer needs to fix his code. It's like an assert(): there is an assumption that must be met for correct behavior, and the exception yells at you when it's not.
Doc should probably be updated to mention this.
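
A minimal sketch of that assert-like behaviour, written as a Lua wrapper rather than as a change to the C module. `make_guarded_sender`, `raw_send` and `on_sent` are hypothetical names; the pcall appears only to demonstrate the error, not as a suggested way to recover from it.

```lua
-- Hypothetical wrapper: raises an error if send() is called while the
-- previous send is still in flight.
local function make_guarded_sender(raw_send)
  local pending = false
  local sender = {}

  function sender.send(data)
    if pending then
      error("attempt to call send() before previous send() completed", 2)
    end
    pending = true
    raw_send(data)
  end

  -- Call this from the connection's "sent" callback.
  function sender.on_sent() pending = false end

  return sender
end

-- Usage with a stand-in transport that just records the payloads.
local wire = {}
local s = make_guarded_sender(function(data) wire[#wire + 1] = data end)

s.send("first")
local ok, err = pcall(s.send, "second")  -- programming error: no "sent" yet
s.on_sent()                              -- simulated "sent" event
s.send("third")                          -- allowed again
```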

@TerryE
Collaborator Author

TerryE commented Feb 19, 2016

The error handling in the net module is a mess. There are lots of redundant error checks, but there are also places where espconn system calls can return an error that is never checked, and the execution path simply assumes success.

@marcelstoer
Member

Terry, do you want to keep this open? The discussion turned (again) very quickly into pros and cons of a queuing conn:send(). Going in circles rather than addressing the broader issue you raised.

I believe it's hard to disagree with your

Not trying to make the SDK into a real OS.

statement. In specific cases though, people would argue whether or not a particular implementation would make NodeMCU more like a "real OS", thereby violating this policy.

@TerryE
Collaborator Author

TerryE commented Feb 26, 2016

Good point. I think that #1080 captures the main essence of how we address most of these points, plus other open issues such as #937.

@TerryE TerryE closed this as completed Feb 26, 2016