-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Concurrency design #247
Changes from 2 commits
c5cbe67
941b2c1
ec7bdb4
05f4ba9
5e0830f
72f995f
95a5840
ddb99b0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,143 @@ | ||
# Proposal: Concurrency on Nash | ||
|
||
There has been some discussion on how to provide concurrency to nash. | ||
There is a [discussion here](https://github.com/NeowayLabs/nash/issues/224) | ||
on how concurrency could be added as a set of built-in functions. | ||
|
||
As we progressed discussing it seemed desirable to have a concurrency | ||
that enforced no sharing between concurrent functions. It eliminates | ||
races and forces all communication to happen explicitly, and the | ||
performance overhead would not be a problem to a high level language | ||
as nash. | ||
|
||
## Lightweight Processes | ||
|
||
This idea is inspired on Erlang concurrency model. Since Nash does | ||
not aspire to do everything that Erlang does (like distributed programming) | ||
so this is not a copy, we just take some things as inspiration. | ||
|
||
Why call this a process ? On the [Erlang docs](http://erlang.org/doc/getting_started/conc_prog.html) | ||
there is a interesting definition of process: | ||
|
||
``` | ||
the term "process" is usually used when the threads of execution share no | ||
data with each other and the term "thread" when they share data in some way. | ||
Threads of execution in Erlang share no data, | ||
that is why they are called processes | ||
``` | ||
|
||
In this context the process word is used to mean a concurrent thread of | ||
execution that does not share any data. The only means of communication | ||
are through message passing. Since these processes are lightweight | ||
creating a lot of them will be cheap (at least must cheaper than | ||
OS processes). | ||
|
||
Instead of using channel instances in this model you send messages | ||
to processes (actor model), it works pretty much like a networking | ||
model using UDP datagrams. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Something must be written about queuing of data. Should Every process have a queue of incoming data? What happens if data is being sent to some process but it never reads (never invokes receives)? Is it buffered? discarded? and so on. I think it should be buffered and we must document a maximum buffer size. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, now I saw the TODO section :-) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think that writing code that depends on queues being infinite is a good idea, but when we where talking about implementing more higher level languages the idea of a natural number that gets as big as it can get did not sound like a problem =P, but to be fair the queue exausting memory will be easier indeed. For me, or it is always infinite (as erlang) and you should never just send trillions of messages without waiting for some answer. Or on spawn we can pass the queue size. An internal fixed magic chosen values does not seem like a good choice to me x_x. |
||
|
||
The idea is to leverage this as a syntactic construction of the language | ||
to make it as explicit and easy as possible to use. | ||
|
||
This idea introduces 4 new concepts, 3 built-in functions and one | ||
new keyword. | ||
|
||
The keyword **spawn** is used to spawn a function as a new process. | ||
The function **send** is used to send messages to a process. | ||
The function **receive** is used to receive messages from a process. | ||
The function **self** returns the pid of the process calling it. | ||
|
||
An example of a simple ping/pong: | ||
|
||
``` | ||
pid <= spawn fn () { | ||
ping, senderpid <= receive() | ||
echo $ping | ||
send($senderpid, "pong") | ||
}() | ||
|
||
send($pid, "ping", self()) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I feel this is always going to be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually I don't. This concurrency model is highly decoupled, I'm not sure if it is a good idea to infer that a child process will always talk with the parent. The parent could actually send the pid of another process that will want the answer. Like starting a job source and N workers, maybe the answers will go to the job source so he knows if more jobs need to be sent. But I feel your feelings that may make the parent->child relationship a little more verbose There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps we could go to a more specific model that always impose a parent/child relationship, it would make some things easier...but I was liking the idea of the function spawned having no pre-defined signature for arguments (full flexibility), but I may be biased. @tiago4orion what do you think ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe name this builtin fn send(pid, data) {
return sendto($pid, $data, self())
} There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good idea. But I'm rethinking being able to send multiple args and receiving them. On the send part it is ok since there is no timeout for sending, since it will not block. But receiving is a little more complicated. It seems useful to have a timeout when you run a receive, and you will need to be notified if the receive worked or if the timeout has expired, so receiving a dynamic number of args on the return makes it a little awkward to receive the error too (the return arity will always be the send arity + 1). Perhaps if receive was a syntatic construct instead of a function and we had something similar to pattern matching, but only for arity, like (WARNING, heavily improvised syntax =P):
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @tiago4orion @vitorarins What you think about the syntatic receive ? If we dont go for some syntatic support I think we need to always send/receive only one value (would not be that bad if we got maps soon). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I like the idea but the example is kind of confusing to me.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For now neither make me very happy...but if the idea is good we can search for something that looks nice =D |
||
pong <= receive() | ||
|
||
echo $pong | ||
``` | ||
|
||
TODO: | ||
|
||
* If send is never blocking, what if process queue gets too big ? just go on until memory exhausts ? | ||
* What happens when you send to a invalid pid ? (or a pid of a process that is not running anymore) | ||
* Example on how would fan-out/fan-in look with this idea | ||
|
||
## Extend rfork | ||
|
||
Converging to a no shared state between concurrent functions initiated | ||
the idea of using the current rfork built-in as a means to express | ||
concurrency on Nash. This would already be possible today, the idea | ||
is just to make it even easier, specially the communication between | ||
different concurrent processes. | ||
|
||
This idea enables an even greater amount of isolation between concurrent | ||
processes since rfork enables different namespaces isolation (besides memory), | ||
but it has the obvious fallback of not being very lightweight. | ||
|
||
Since the idea of nash is to write simple scripts this does not seem | ||
to be a problem. If it is on the future we can create lightweight concurrent | ||
processes (green threads) that works orthogonally with rfork. | ||
|
||
The prototype for the new rfork would be something like this: | ||
|
||
```sh | ||
chan <= rfork [ns_param1, ns_param2] (chan) { | ||
//some code | ||
} | ||
``` | ||
|
||
The code on the rfork block does not have access to the | ||
lexical outer scope but it receives as a parameter a channel | ||
instance. | ||
|
||
This channel instance can be used by the forked processes and | ||
by the creator of the process to communicate. We could use built-in functions: | ||
|
||
```sh | ||
chan <= rfork [ns_param1, ns_param2] (chan) { | ||
cwrite($chan, "hi") | ||
} | ||
|
||
a <= cread($chan) | ||
``` | ||
|
||
Or some syntactic extension: | ||
|
||
```sh | ||
chan <= rfork [ns_param1, ns_param2] (chan) { | ||
$chan <- "hi" | ||
} | ||
|
||
a <= <-$chan | ||
``` | ||
|
||
Since this channel is meant only to be used to communicate with | ||
the created process, it will be closed when the process exit: | ||
|
||
```sh | ||
chan <= rfork [ns_param1, ns_param2] (chan) { | ||
} | ||
|
||
# returns empty string when channel is closed | ||
<-$chan | ||
``` | ||
|
||
Fan out and fan in should be pretty trivial: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What about the blocking/nonblocking types of channel? If only blocking channels are supported, then some builtin function to 'select' between then will be required. What do you think? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Select would be great =D. Nothing against buffered channels either...just not thinking about it right now. Actually I'm supposing you are talking about the buffered channels, nonblocking channels do not exist on Go, when the channel is full it will also block, the only way to guarantee that you will never block is by using select + default. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
What I mean was blocking/non-blocking types of communication, sorry, but buffered channels and/or select are ways to achieve this in Go, but I don't know how much of Go semantics makes sense to export to nash. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ow I see. Erlang has only message delivery, like network datagrams, they are non blocking. The problem with non-blocking is that they infer some sort of queue on the receiver (on Erlang it is the process mailbox), because I can just send 10000 trillion messages without blocking. Go makes this queue explicit and will still block the sender...not sure what model is better right now. |
||
|
||
```sh | ||
chan1 <= rfork [ns_param1, ns_param2] (chan) { | ||
} | ||
|
||
chan2 <= rfork [ns_param1, ns_param2] (chan) { | ||
} | ||
|
||
# waiting for both to finish | ||
<-$chan1 | ||
<-$chan2 | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that really depends on the implementation (I think).
For example, the code below:
In the code above, the entire environment of the parent interpreter should be copied to every lightweight process? Then maybe it will not be lightweight anymore and we can have a big performance penalty at each process spawn. Note that worker function doesn't use
anotherhugelibrary
but it gets copied anyway.But if the parent environment is not copied, having each process to import their own libraries could be painful:
This way the processes are really lightweight, could bootstrap really fast, but the processes needs to be self-contained and import everything they need every time.
I don't know what's better. Maybe a mixed approach? Copy the stdlib to every process but leave other libraries to explicit import? I dont know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure either, I thought about that after I wrote this, my problem is not even with huge dependencies on the sense that this will make things not be lightweight, my problem is lack of isolation and sharing of state. If the same imported module is shared them the module state is also shared and it is a violation of the idea. I'm in doubt for two behaviors:
1 - Automatically reload the modules of the parent, but they will be freshly loaded modules (all module initialization is executed again). Perhaps that is already what you have in mind.
2 - Don't load anything, import again
Perhaps is a lot of cases the code executed concurrenly will not use the dependencies. It will depend a lot on the use case. az cli use cases will be a mess with any model since the login action will affect the user's home directory...but this is pretty much bad coding from microsoft...as usual. On this case the only one who garantees isolation is the rfork approach (we are going to have both anyway since they have pretty different use cases)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a huge fan o the hybrid =P