WIP: Concurrency design #247
Conversation
Codecov Report

```
@@           Coverage Diff            @@
##             master     #247   +/-  ##
=========================================
  Coverage          ?   56.12%
=========================================
  Files             ?       26
  Lines             ?     4269
  Branches          ?        0
=========================================
  Hits              ?     2396
  Misses            ?     1646
  Partials          ?      227
=========================================
```

Continue to review the full report at Codecov.
```
<-$chan
```

Fan out and fan in should be pretty trivial:
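For reference, the fan-out/fan-in pattern mentioned here can be sketched outside nash. A minimal Python stand-in (`queue.Queue` playing the role of a channel; all names are hypothetical, not nash semantics):

```python
import queue
import threading

def fan_out_fan_in(inputs, nworkers):
    """Fan out: distribute jobs to nworkers threads. Fan in: collect results."""
    jobs = queue.Queue()     # plays the role of the channel workers receive from
    results = queue.Queue()  # plays the role of the channel workers send to

    def worker():
        while True:
            n = jobs.get()
            if n is None:    # sentinel: no more work for this worker
                return
            results.put(n * n)

    threads = [threading.Thread(target=worker) for _ in range(nworkers)]
    for t in threads:
        t.start()
    for n in inputs:         # fan out
        jobs.put(n)
    for _ in threads:        # one sentinel per worker
        jobs.put(None)
    for t in threads:
        t.join()
    return sorted(results.get() for _ in inputs)  # fan in

print(fan_out_fan_in([1, 2, 3, 4], nworkers=2))  # [1, 4, 9, 16]
```

The sentinel-per-worker shutdown is one conventional way to close the "channel"; a real nash design would need its own story for that.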
What about the blocking/nonblocking types of channel?
If only blocking channels are supported, then some builtin function to 'select' between them will be required. What do you think?
Select would be great =D. Nothing against buffered channels either... just not thinking about them right now. Actually I suppose you are talking about buffered channels; non-blocking channels do not exist in Go. When a channel is full it will also block; the only way to guarantee that you will never block is by using select + default.
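The select + default guarantee described above can be illustrated with a bounded queue: a plain send blocks when the buffer is full, while a non-blocking attempt reports failure instead. A Python sketch of that distinction (`queue.Queue` as a stand-in for a buffered channel; `try_send` is a hypothetical helper, not a nash builtin):

```python
import queue

ch = queue.Queue(maxsize=1)  # buffered "channel" with capacity 1

ch.put("first")              # succeeds: the buffer has room

# Analogue of Go's `select { case ch <- v: ... default: ... }`:
# attempt the send without ever blocking, and report whether it worked.
def try_send(ch, value):
    try:
        ch.put_nowait(value)
        return True
    except queue.Full:
        return False

print(try_send(ch, "second"))  # False: buffer full, but we did not block
print(ch.get())                # first
print(try_send(ch, "second"))  # True: there is room again
```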
> What about the blocking/nonblocking types of channel?

What I meant was blocking/non-blocking types of communication, sorry. Buffered channels and/or select are ways to achieve this in Go, but I don't know how much of Go's semantics makes sense to export to nash.
Oh, I see. Erlang has only message delivery, like network datagrams; it is non-blocking. The problem with non-blocking sends is that they imply some sort of queue on the receiver (in Erlang it is the process mailbox), because I can just send 10000 trillion messages without blocking.
Go makes this queue explicit and will still block the sender... not sure which model is better right now.
❌ Build nash 1.0.0.87 failed (commit NeowayLabs@7504b19db1 by @katcipis)
```
    send($senderpid, "pong")
}()

send($pid, "ping", self())
```
I feel this is always going to be `..., self())`. Do you agree? Maybe we can remove this third argument.
Actually I don't. This concurrency model is highly decoupled; I'm not sure it is a good idea to assume that a child process will always talk with the parent. The parent could actually send the pid of another process that wants the answer, like starting a job source and N workers, where maybe the answers go to the job source so it knows whether more jobs need to be sent.
But I share your feeling that this may make the parent->child relationship a little more verbose.
Perhaps we could go to a more specific model that always imposes a parent/child relationship; it would make some things easier... but I was liking the idea of the spawned function having no pre-defined signature for arguments (full flexibility). I may be biased, though. @tiago4orion what do you think?
Maybe name this builtin `sendto` and add `send` to the stdlib like:

```
fn send(pid, data) {
    return sendto($pid, $data, self())
}
```
Good idea. But I'm rethinking being able to send multiple args and receive them. On the send side it is OK, since there is no timeout for sending (it will not block). But receiving is a little more complicated: it seems useful to have a timeout when you run a receive, and you need to be notified whether the receive worked or the timeout expired, so receiving a dynamic number of args on the return makes it a little awkward to receive the error too (the return arity will always be the send arity + 1).
Perhaps receive could be a syntactic construct instead of a function, with something similar to pattern matching but only for arity, like (WARNING, heavily improvised syntax =P):
```
receive timeout {
    onearg <= {
        # code that uses onearg here
    }
    two1, two2 <= {
        # code that uses two args here
    }
    timeout <= {
        # code that handles timeout here
    }
}
```
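Since the syntax above is improvised, here is a sketch of just the intended semantics — dispatch on message arity, with a timeout branch — modeled in Python (`receive`, the tuple-message convention, and all names here are hypothetical stand-ins, not a proposed implementation):

```python
import queue

def receive(mailbox, handlers, on_timeout, timeout):
    """Wait up to `timeout` seconds for a message (a tuple of args)
    and dispatch on its arity, mirroring the improvised construct."""
    try:
        msg = mailbox.get(timeout=timeout)
    except queue.Empty:
        return on_timeout()
    return handlers[len(msg)](*msg)

mb = queue.Queue()
mb.put(("ping", 42))  # a two-argument message

print(receive(
    mb,
    handlers={
        1: lambda a: f"one: {a}",
        2: lambda a, b: f"two: {a} {b}",
    },
    on_timeout=lambda: "timed out",
    timeout=0.1,
))  # two: ping 42

# With an empty mailbox the timeout branch runs instead.
print(receive(mb, {}, lambda: "timed out", 0.05))  # timed out
```

Note how the timeout case is just another branch, so the caller never has to untangle an error value from a variable-arity return.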
@tiago4orion @vitorarins what do you think about the syntactic receive?
If we don't go for some syntactic support, I think we need to always send/receive only one value (which would not be that bad if we get maps soon).
I like the idea but the example is kind of confusing to me.
I would be more comfortable with something like:

```
receive timeout {
    fn(onearg) {
        # code that uses onearg here
    }
    fn(two1, two2) {
        # code that uses two args here
    }
    fn() {
        # code that handles timeout here
    }
}
```
For now neither makes me very happy... but if the idea is good we can search for something that looks nice =D
❌ Build nash 1.0.0.88 failed (commit NeowayLabs@c2c55499a9 by @katcipis)
❌ Build nash 1.0.0.89 failed (commit NeowayLabs@c300d5f4f5 by @katcipis)
❌ Build nash 1.0.0.90 failed (commit NeowayLabs@0d5812d379 by @katcipis)
❌ Build nash 1.0.0.92 failed (commit NeowayLabs@de0afe1ad0 by @katcipis)
In this context the process word is used to mean a concurrent thread of execution that does not share any data. The only means of communication are through message passing. Since these processes are lightweight, creating a lot of them will be cheap (at least much cheaper than
That really depends on the implementation (I think).
For example, the code below:

```
import io
import fmt
import hugelibrary
import anotherhugelibrary

fn worker() {
    # uses io, fmt and hugelibrary
}

spawn worker()
```

In the code above, should the entire environment of the parent interpreter be copied to every lightweight process? Then maybe it will not be lightweight anymore and we can have a big performance penalty at each process spawn. Note that the worker function doesn't use anotherhugelibrary but it gets copied anyway.
But if the parent environment is not copied, having each process import its own libraries could be painful:

```
fn worker1() {
    import io
    import fmt
    import hugelibrary
}

fn worker2() {
    import io
    import fmt
    import anotherhugelibrary
    # some code
}

spawn worker1()
spawn worker2()
```
This way the processes are really lightweight and could bootstrap really fast, but they need to be self-contained and import everything they need every time.
I don't know what's better. Maybe a mixed approach? Copy the stdlib to every process but leave other libraries to explicit imports? I don't know.
I'm not sure either; I thought about that after I wrote this. My problem is not even with huge dependencies in the sense that this will make things not lightweight; my problem is lack of isolation and sharing of state. If the same imported module is shared, then the module state is also shared, and that is a violation of the idea. I'm in doubt between two behaviors:
1 - Automatically reload the modules of the parent, but as freshly loaded modules (all module initialization is executed again). Perhaps that is already what you have in mind.
2 - Don't load anything; import again.
Perhaps in a lot of cases the code executed concurrently will not use the dependencies. It will depend a lot on the use case. az cli use cases will be a mess with any model, since the login action will affect the user's home directory... but this is pretty much bad coding from Microsoft... as usual. In this case the only one that guarantees isolation is the rfork approach (we are going to have both anyway, since they have pretty different use cases).
Not a huge fan of the hybrid =P
the behavior will be to timeout the request and try again (possibly to another service instance through a load balancer).

To implement this idea we can add a timeout to the receive an add
typo: 'and add'
Instead of using channel instances, in this model you send messages to processes (actor model); it works pretty much like a networking model using UDP datagrams.
Something must be written about queuing of data. Should every process have a queue of incoming data? What happens if data is being sent to some process but it never reads it (never invokes receive)? Is it buffered? Discarded? And so on. I think it should be buffered and we must document a maximum buffer size.
What do you guys think?
Sorry, now I saw the TODO section :-)
I don't think that writing code that depends on queues being infinite is a good idea, but when we were talking about implementing more higher-level languages, the idea of a natural number that gets as big as it can get did not sound like a problem =P. To be fair, though, the queue exhausting memory will be easier indeed.
For me, either it is always infinite (as in Erlang) and you should never just send trillions of messages without waiting for some answer, or on spawn we can pass the queue size. An internal fixed magic value does not seem like a good choice to me x_x.
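The pass-the-queue-size-at-spawn option is straightforward to model: the bound becomes a per-process property rather than a runtime-wide magic number. A hypothetical Python sketch (`spawn` and the `mailbox_size` parameter are illustration only, with `queue.Queue` as the mailbox):

```python
import queue
import threading

def spawn(fn, mailbox_size=0):
    """Spawn a worker with its own mailbox. maxsize=0 means unbounded
    (Erlang-style); any other value bounds the queue at spawn time."""
    mailbox = queue.Queue(maxsize=mailbox_size)
    threading.Thread(target=fn, args=(mailbox,), daemon=True).start()
    return mailbox

collected = []
done = threading.Event()

def worker(mailbox):
    for _ in range(3):  # drain exactly three messages, then stop
        collected.append(mailbox.get())
    done.set()

mb = spawn(worker, mailbox_size=8)  # bound chosen per process, not globally
for i in range(3):
    mb.put(i)
done.wait(timeout=5)
print(collected)  # [0, 1, 2]
```

The default of "unbounded unless asked" mirrors the Erlang position, while still letting a caller opt into a bound where memory pressure matters.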
error will always involve a pid that has no owner (the process never existed or already exited).

We could add a more specific error message if we decide that
👍
proposal/2-concurrency.md (Outdated)
### TODO

Spawned functions should have access to imported modules ?
We need to think carefully about this; too many considerations. What about Erlang?
Unable to find anything:
http://erlang.org/doc/reference_manual/processes.html
There are a lot of interesting concepts, like linking and successful exit vs errored exit... but they are useful for building more robust/complex distributed systems where supervisors restart failed processes. For now I think we can live without it =D. Perhaps just providing a way to check if a process terminated would be interesting.
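A minimal "check if a process terminated" facility, as suggested, needs little more than a liveness query on the handle returned by spawn. A Python sketch using threads as a stand-in for nash processes (`spawn` is a hypothetical name):

```python
import threading
import time

def spawn(fn):
    t = threading.Thread(target=fn)
    t.start()
    return t  # the returned handle doubles as a liveness query

handle = spawn(lambda: time.sleep(0.2))
print(handle.is_alive())  # True: still running
handle.join()
print(handle.is_alive())  # False: terminated
```

This gives the "did it terminate?" check without committing to Erlang-style links, monitors, or exit reasons.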
(seems like no, but some usages of this may seem odd)

If send is never blocking, what if the process queue gets too big ? just go on until memory exhausts ?
I don't think so. I like the idea of limits in the runtime; maybe the user could change them with builtin functions or env vars?
In networking you usually get to know that some other process is unable to answer you because the answer never gets received. This can happen for a lot of reasons (even packets being dropped because the queue is too big). So it would be a severe error to just send a lot of messages without ever waiting for some kind of response (it seems odd to me, but perhaps there is some use case). I think that is why Erlang has no limit on the process mailbox. But I'm not against having a queue size either; I'm just not as against infinite mailboxes as I used to be =P. For them to generate problems you must already be doing something wrong (exhausting memory using messages requires really big messages and a lot of messages).
The only thing I'm certain of is that writing code that depends on infinite queue sizes is really a bad idea; perhaps a limit will help people avoid idiotic problems.
In this case send will not return a boolean "ok" anymore, since it is important to differentiate between a process that has its queue full and a process that is dead. Perhaps this is the kind of complexity that Erlang avoided with unlimited mailboxes.
Thinking about datagram networking with UDP, the only error handling that exists is the ICMP packet indicating that there is no program listening on that port... all other errors will be detected by never receiving an answer (if you care for one).
Perhaps we can still stick with the boolean and use a queue size as a parameter... like OSes do ?
Well, we can have infinite queues too, but then we'll need some API for queue monitoring. If the memory used is too high, I want a way to easily know how much data is pending to be processed, for debugging slow processes.
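The monitoring API suggested here can be as small as a pending-message count per mailbox. Python's `queue.Queue` exposes exactly that via `qsize()` (approximate under concurrency, but fine for debugging), which is enough for the use case described:

```python
import queue

mailbox = queue.Queue()
for i in range(5):
    mailbox.put(i)

# How much data is pending for a slow process?
print(mailbox.qsize())  # 5

mailbox.get()  # the process consumes one message
print(mailbox.qsize())  # 4
```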
inconsistent with function calls

What happens when something is written on the stdout of a spawned process ? redirect to parent shell ?
Yes, by default I think this is the right behaviour: redirect to the parent's stdout.
❌ Build nash 1.0.0.120 failed (commit NeowayLabs@48cdf6bf7b by @katcipis)