From c5cbe6741d08e4134e65f591ae31737485298ccf Mon Sep 17 00:00:00 2001 From: Tiago Katcipis Date: Fri, 10 Nov 2017 18:55:59 -0200 Subject: [PATCH 1/7] Add first notes on rfork --- proposal/2-concurrency.md | 83 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 83 insertions(+) create mode 100644 proposal/2-concurrency.md diff --git a/proposal/2-concurrency.md b/proposal/2-concurrency.md new file mode 100644 index 00000000..f7a6cf9d --- /dev/null +++ b/proposal/2-concurrency.md @@ -0,0 +1,83 @@ +# Proposal: Concurrency on Nash + +There has been some discussion on how to provide concurrency to nash. +There is a [discussion here](https://github.com/NeowayLabs/nash/issues/224) +on how concurrency could be added as a set of built-in functions. + +As we progressed discussing it seemed desirable to have a concurrency +that enforced no sharing between concurrent functions. It eliminates +races and forces all communication to happen explicitly, and the +performance overhead would not be a problem to a high level language +as nash. + +Converging to a no shared state between concurrent functions initiated +the idea of using the current rfork built-in as a means to express +concurrency on Nash. This would already be possible today, the idea +is just to make it even easier, specially the communication between +different concurrent processes. + +This idea enables an even greater amount of isolation between concurrent +processes since rfork enables different namespaces isolation (besides memory), +but it has the obvious fallback of not being very lightweight. + +Since the idea of nash is to write simple scripts this does not seem +to be a problem. If it is on the future we can create lightweight concurrent +processes (green threads) that works orthogonally with rfork. + +The prototype for the new rfork would be something like this: + +```sh +chan <= rfork [ns_param1, ns_param2] (chan) { + //some code +} +``` + +The code on the rfork block does not have access to the +lexical outer scope but it receives as a parameter a channel +instance. + +This channel instance can be used by the forked processes and +by the creator of the process to communicate. We could use built-in functions: + +```sh +chan <= rfork [ns_param1, ns_param2] (chan) { + cwrite($chan, "hi") +} + +a <= cread($chan) +``` + +Or some syntactic extension: + +```sh +chan <= rfork [ns_param1, ns_param2] (chan) { + $chan <- "hi" +} + +a <= <-$chan +``` + +Since this channel is meant only to be used to communicate with +the created process, it will be closed when the process exit: + +```sh +chan <= rfork [ns_param1, ns_param2] (chan) { +} + +# returns empty string when channel is closed +<-$chan +``` + +Fan out and fan in should be pretty trivial: + +```sh +chan1 <= rfork [ns_param1, ns_param2] (chan) { +} + +chan2 <= rfork [ns_param1, ns_param2] (chan) { +} + +# waiting for both to finish +<-$chan1 +<-$chan2 +``` From 941b2c1beb262d84dfa158f2e312ca61973f132a Mon Sep 17 00:00:00 2001 From: Tiago Katcipis Date: Thu, 18 Jan 2018 01:10:58 -0200 Subject: [PATCH 2/7] Add initial draft of concurrency with actor model --- proposal/2-concurrency.md | 60 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+) diff --git a/proposal/2-concurrency.md b/proposal/2-concurrency.md index f7a6cf9d..feca1bf8 100644 --- a/proposal/2-concurrency.md +++ b/proposal/2-concurrency.md @@ -10,6 +10,66 @@ races and forces all communication to happen explicitly, and the performance overhead would not be a problem to a high level language as nash. +## Lightweight Processes + +This idea is inspired on Erlang concurrency model. Since Nash does +not aspire to do everything that Erlang does (like distributed programming) +so this is not a copy, we just take some things as inspiration. + +Why call this a process ? On the [Erlang docs](http://erlang.org/doc/getting_started/conc_prog.html) +there is a interesting definition of process: + +``` +the term "process" is usually used when the threads of execution share no +data with each other and the term "thread" when they share data in some way. +Threads of execution in Erlang share no data, +that is why they are called processes +``` + +In this context the process word is used to mean a concurrent thread of +execution that does not share any data. The only means of communication +are through message passing. Since these processes are lightweight +creating a lot of them will be cheap (at least must cheaper than +OS processes). + +Instead of using channel instances in this model you send messages +to processes (actor model), it works pretty much like a networking +model using UDP datagrams. + +The idea is to leverage this as a syntactic construction of the language +to make it as explicit and easy as possible to use. + +This idea introduces 4 new concepts, 3 built-in functions and one +new keyword. + +The keyword **spawn** is used to spawn a function as a new process. +The function **send** is used to send messages to a process. +The function **receive** is used to receive messages from a process. +The function **self** returns the pid of the process calling it. + +An example of a simple ping/pong: + +``` +pid <= spawn fn () { + ping, senderpid <= receive() + echo $ping + send($senderpid, "pong") +}() + +send($pid, "ping", self()) +pong <= receive() + +echo $pong +``` + +TODO: + +* If send is never blocking, what if process queue gets too big ? just go on until memory exhausts ? +* What happens when you send to a invalid pid ? (or a pid of a process that is not running anymore) +* Example on how would fan-out/fan-in look with this idea + +## Extend rfork + Converging to a no shared state between concurrent functions initiated the idea of using the current rfork built-in as a means to express concurrency on Nash. This would already be possible today, the idea From ec7bdb4b40389bec81653d220d4c445fac430d33 Mon Sep 17 00:00:00 2001 From: Tiago Katcipis Date: Thu, 18 Jan 2018 16:38:29 -0200 Subject: [PATCH 3/7] Add fan-out fan-in example --- proposal/2-concurrency.md | 57 ++++++++++++++++++++++++++++++++++++--- 1 file changed, 53 insertions(+), 4 deletions(-) diff --git a/proposal/2-concurrency.md b/proposal/2-concurrency.md index feca1bf8..f7708baa 100644 --- a/proposal/2-concurrency.md +++ b/proposal/2-concurrency.md @@ -62,11 +62,60 @@ pong <= receive() echo $pong ``` -TODO: +Spawned functions can also receive parameters (always deep copies): + +``` +pid <= spawn fn (answerpid) { + send($answerpid, "pong") +}(self()) + +pong <= receive() +echo $pong +``` + +A simple fan-out/fan-in implementation: + +``` +jobs = ("1" "2" "3" "4" "5") + +for job in $jobs { + spawn fn (job, answerpid) { + import io + + io_println("job[%s] done", $job) + send($answerpid, format("result [%s]", $job)) + }($job, self()) +} + +for job in $jobs { + result <= receive() + echo $result +} +``` + +### Error Handling + +Error handling on this concurrency model is very similar to +how we do it on a distributed system. If a remote service fails and +just dies and you are using UDP you will never be informed of it, +the behavior will be to timeout the request and try again (possibly +to another service instance through a load balancer). + + +### TODO + +Spawned functions should have access to imported modules ? +(seems like no, but some usages of this may seem odd) + +If send is never blocking, what if process queue gets too big ? +just go on until memory exhausts ? + +Not sure if passing parameters in spawn will not make things +inconsistent with function calls + +What happens when you send to a invalid pid ? +(or a pid of a process that is not running anymore). -* If send is never blocking, what if process queue gets too big ? just go on until memory exhausts ? -* What happens when you send to a invalid pid ? (or a pid of a process that is not running anymore) -* Example on how would fan-out/fan-in look with this idea ## Extend rfork From 05f4ba9546780289a14cc2f595b6324646af42f5 Mon Sep 17 00:00:00 2001 From: Tiago Katcipis Date: Thu, 18 Jan 2018 16:50:02 -0200 Subject: [PATCH 4/7] Add some ideas on error handling --- proposal/2-concurrency.md | 44 ++++++++++++++++++++++++++++++++++++--- 1 file changed, 41 insertions(+), 3 deletions(-) diff --git a/proposal/2-concurrency.md b/proposal/2-concurrency.md index f7708baa..43fc5714 100644 --- a/proposal/2-concurrency.md +++ b/proposal/2-concurrency.md @@ -101,6 +101,47 @@ just dies and you are using UDP you will never be informed of it, the behavior will be to timeout the request and try again (possibly to another service instance through a load balancer). +To implement this idea we can add a timeout to the receive an add +a new parameter, a boolean, indicating if there is a message or if a +timeout has occurred. + +Example: + +``` +msg, ok <= receive(timeout) +if !ok { + echo "oops timeout" +} +``` + +The timeout can be omitted if you wish to just wait forever. + +For send operations we need to add just one boolean return value indicating +if the process pid exists and the message has been delivered: + +``` +if !send($pid, $msg) { + echo "oops message cant be sent" +} +``` + +Since the processes are always local there is no need for a more +detailed error message (the message would always be the same), the +error will always involve a pid that has no owner (the process never +existed or already exited). + +We could add a more specific error message if we decide that +the process message queue can get too big and we start to +drop messages. The error would help to differentiate +from a dead process or a overloaded process. + +An error indicating a overloaded process could help +to implement back pressure logic (try again later). +But if we are sticking with local concurrency only this +may be unnecessary complexity. You can avoid this by +always sending N messages and waiting for N responses +before sending more messages. + ### TODO @@ -113,9 +154,6 @@ just go on until memory exhausts ? Not sure if passing parameters in spawn will not make things inconsistent with function calls -What happens when you send to a invalid pid ? -(or a pid of a process that is not running anymore). - ## Extend rfork From 5e0830fb7e7999948f64c5028f59822bc04a3c9b Mon Sep 17 00:00:00 2001 From: Tiago Katcipis Date: Thu, 18 Jan 2018 19:03:58 -0200 Subject: [PATCH 5/7] Add more TODO --- proposal/2-concurrency.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/proposal/2-concurrency.md b/proposal/2-concurrency.md index 43fc5714..6d7de2ca 100644 --- a/proposal/2-concurrency.md +++ b/proposal/2-concurrency.md @@ -154,6 +154,9 @@ just go on until memory exhausts ? Not sure if passing parameters in spawn will not make things inconsistent with function calls +What happens when something is written on the stdout of a spawned +process ? redirect to parent shell ? + ## Extend rfork From 95a58401e87f905319b33e97082fd6d8a27e0abf Mon Sep 17 00:00:00 2001 From: Tiago Katcipis Date: Wed, 24 Jan 2018 18:56:57 -0200 Subject: [PATCH 6/7] Add more elaborated example --- proposal/2-concurrency.md | 82 +++++++++++++++++++++++++++++++++------ 1 file changed, 71 insertions(+), 11 deletions(-) diff --git a/proposal/2-concurrency.md b/proposal/2-concurrency.md index 6d7de2ca..9230a5e1 100644 --- a/proposal/2-concurrency.md +++ b/proposal/2-concurrency.md @@ -16,7 +16,8 @@ This idea is inspired on Erlang concurrency model. Since Nash does not aspire to do everything that Erlang does (like distributed programming) so this is not a copy, we just take some things as inspiration. -Why call this a process ? On the [Erlang docs](http://erlang.org/doc/getting_started/conc_prog.html) +Why call this a process ? +On the [Erlang docs](http://erlang.org/doc/getting_started/conc_prog.html) there is a interesting definition of process: ``` @@ -73,7 +74,7 @@ pong <= receive() echo $pong ``` -A simple fan-out/fan-in implementation: +A simple fan-out/fan-in implementation (N jobs <-> N processes): ``` jobs = ("1" "2" "3" "4" "5") @@ -93,6 +94,68 @@ for job in $jobs { } ``` +### Advanced Fan-out Fan-in + +Here is an example of a more elaborated fan-out/fan-in. +On this case we have much more jobs to execute than +workers, so it requires more coordination than the previous example. + +For brevity this example does not handle timeouts. + +Lets suppose an script that tries different passwords on a host: + +``` +var passwords_feed <= spawn fn() { + + fn sendpassword(password) { + var worker <= receive() + if !send($worker, $password) { + sendpassword($password) + } + } + + for password in generate_passwords() { + sendpassword($password) + } +} + +fn login(output, passwords_feed, done) { + + for send($passwords_feed, self()) { + var password = receive() + var result <= login "someuser" $password + send($output, $result) + } + + send($done, "done") +} + +fn outputhandler() { + for { + var result = receive() + if $result == "0" { + echo "success" + } + } +} + +var workers = 10 + +var feed <= spawn passwords_feed() +var outputhandler <= spawn outputhandler() + +for i in range(0, $workers) { + spawn login($outputhandler, $feed, self()) +} + +for i in range(0, $workers) { + msg <= receive() + if $msg != "done" { + echo "dafuck ?" + } +} +``` + ### Error Handling Error handling on this concurrency model is very similar to @@ -116,8 +179,9 @@ if !ok { The timeout can be omitted if you wish to just wait forever. -For send operations we need to add just one boolean return value indicating -if the process pid exists and the message has been delivered: +For send operations we need to add just one boolean return +value indicating if the process pid exists and the message +has been delivered: ``` if !send($pid, $msg) { @@ -142,7 +206,6 @@ may be unnecessary complexity. You can avoid this by always sending N messages and waiting for N responses before sending more messages. - ### TODO Spawned functions should have access to imported modules ? @@ -151,12 +214,9 @@ Spawned functions should have access to imported modules ? If send is never blocking, what if process queue gets too big ? just go on until memory exhausts ? -Not sure if passing parameters in spawn will not make things -inconsistent with function calls - -What happens when something is written on the stdout of a spawned -process ? redirect to parent shell ? - +Should send be synchronous how we are going to differentiate +between a timeout or a invalid pid error ? On the other hand +synchronous send solves the queueing problem. ## Extend rfork From ddb99b06897cadc946ac14c24ed0e343084768a7 Mon Sep 17 00:00:00 2001 From: Tiago Katcipis Date: Wed, 24 Jan 2018 19:00:34 -0200 Subject: [PATCH 7/7] Add info about how stdout is handled --- proposal/2-concurrency.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/proposal/2-concurrency.md b/proposal/2-concurrency.md index 9230a5e1..c84093cf 100644 --- a/proposal/2-concurrency.md +++ b/proposal/2-concurrency.md @@ -94,6 +94,10 @@ for job in $jobs { } ``` +All output (stdout and stderr) of processes go to their +parent until the root (main) process, so printing inside +a child process will print on the stdout of the main process. + ### Advanced Fan-out Fan-in Here is an example of a more elaborated fan-out/fan-in.