WIP: Concurrency design #247

Merged — 8 commits merged into master on May 4, 2018
Conversation

katcipis (Member)

No description provided.

codecov-io commented Nov 10, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@2d426d7).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master     #247   +/-   ##
=========================================
  Coverage          ?   56.12%
=========================================
  Files             ?       26
  Lines             ?     4269
  Branches          ?        0
=========================================
  Hits              ?     2396
  Misses            ?     1646
  Partials          ?      227

Last update 2d426d7...ddb99b0.

<-$chan
```

Fan out and fan in should be pretty trivial:
Collaborator:

What about the blocking/nonblocking types of channel?

If only blocking channels are supported, then some builtin function to 'select' between them will be required. What do you think?

Member Author:

Select would be great =D. Nothing against buffered channels either... I'm just not thinking about them right now. Actually I'm assuming you are talking about buffered channels: nonblocking channels do not exist in Go. When a buffered channel is full it also blocks; the only way to guarantee that you will never block is select + default.
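
A minimal Go sketch of the point above (plain Go, nothing nash-specific; the channel and messages are just illustrative): a buffered channel blocks once its buffer is full, and select + default is the only way to guarantee the sender never blocks.

```
package main

import "fmt"

func main() {
	// A buffered channel only avoids blocking while there is free space:
	// once the buffer is full, a plain send blocks just like an unbuffered one.
	ch := make(chan string, 1)
	ch <- "first" // fills the single buffer slot

	// select + default guarantees the sender never blocks.
	select {
	case ch <- "second":
		fmt.Println("sent")
	default:
		fmt.Println("buffer full, gave up instead of blocking")
	}
}
```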

Collaborator:

What about the blocking/nonblocking types of channel?

What I meant was blocking/non-blocking types of communication, sorry. Buffered channels and/or select are ways to achieve this in Go, but I don't know how much of Go's semantics makes sense to export to nash.

Member Author:

Oh, I see. Erlang has only message delivery, like network datagrams: sends are non-blocking. The problem with non-blocking sends is that they imply some sort of queue on the receiver (in Erlang it is the process mailbox), because I can just send 10000 trillion messages without blocking.

Go makes this queue explicit and will still block the sender... not sure which model is better right now.

i4ki (Collaborator) commented Jan 18, 2018

send($senderpid, "pong")
}()

send($pid, "ping", self())
Member:

I feel this is always going to be ..., self()). Do you agree? Maybe we can remove this third argument.

Member Author:

Actually I don't. This concurrency model is highly decoupled, and I'm not sure it is a good idea to assume that a child process will always talk to its parent. The parent could actually send the pid of another process that wants the answer, like starting a job source and N workers: the answers may go to the job source so it knows whether more jobs need to be sent.

But I share your feeling that this may make the parent->child relationship a little more verbose.
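
A rough Go analogy of the decoupled model described above, with channels standing in for pids (names are illustrative, this is not nash or its planned API): the workers reply to whatever destination they were handed, which does not have to be the parent.

```
package main

import "fmt"

// worker doubles each job and sends the result to whatever destination it
// was handed, which does not have to be the process that spawned it.
func worker(jobs <-chan int, replyTo chan<- int) {
	for j := range jobs {
		replyTo <- j * 2
	}
}

func main() {
	jobs := make(chan int)
	results := make(chan int) // the "job source" listens here, not the parent

	for i := 0; i < 3; i++ {
		go worker(jobs, results)
	}

	go func() {
		for j := 1; j <= 5; j++ {
			jobs <- j
		}
		close(jobs)
	}()

	// the job source collects the answers and could decide to send more jobs
	for i := 0; i < 5; i++ {
		fmt.Println(<-results)
	}
}
```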

Member Author:

Perhaps we could go to a more specific model that always imposes a parent/child relationship; it would make some things easier... but I was liking the idea of the spawned function having no pre-defined signature for its arguments (full flexibility), though I may be biased. @tiago4orion what do you think?

Collaborator:

Maybe name this builtin sendto and add send to the stdlib, like:

fn send(pid, data) {
    return sendto($pid, $data, self())
}

Member Author:

Good idea. But I'm rethinking the ability to send multiple args and receive them. The send part is ok, since there is no timeout for sending (it will not block). But receiving is a little more complicated. It seems useful to have a timeout when you run a receive, and you need to be notified whether the receive worked or the timeout expired, so receiving a dynamic number of args on the return makes it a little awkward to also receive the error (the return arity will always be the send arity + 1).

Perhaps if receive were a syntactic construct instead of a function and we had something similar to pattern matching, but only on arity, like (WARNING, heavily improvised syntax =P):

receive timeout {
    onearg <= {
        # code that uses onearg here
    }
    two1, two2 <= {
        # code that uses two args here
    }
    timeout <= {
        # code that handles timeout here
    }
}
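
On the implementation side, a receive with a timeout maps naturally onto Go's select with time.After. A minimal sketch, assuming each process gets its own mailbox channel (the receive helper and the message type are hypothetical):

```
package main

import (
	"fmt"
	"time"
)

// receive waits for one message on the mailbox or gives up after the given
// timeout, reporting ok=false on expiration. Hypothetical helper, only to
// show how the timeout branch maps onto Go's select.
func receive(mailbox <-chan interface{}, timeout time.Duration) (interface{}, bool) {
	select {
	case msg := <-mailbox:
		return msg, true
	case <-time.After(timeout):
		return nil, false
	}
}

func main() {
	mailbox := make(chan interface{}, 1)
	mailbox <- "ping"

	if msg, ok := receive(mailbox, time.Second); ok {
		fmt.Println("got:", msg)
	}
	if _, ok := receive(mailbox, 100*time.Millisecond); !ok {
		fmt.Println("timed out")
	}
}
```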

Member Author:

@tiago4orion @vitorarins What do you think about the syntactic receive?

If we don't go for some syntactic support I think we need to always send/receive only one value (which would not be that bad if we get maps soon).

Member:

I like the idea but the example is kind of confusing to me.
I would be more comfortable with something like:

receive timeout {
    fn(onearg) {
        # code that uses onearg here
    }
    fn(two1, two2) {
        # code that uses two args here
    }
    fn() {
        # code that handles timeout here
    }
}

Member Author:

For now neither makes me very happy... but if the idea is good we can search for something that looks nice =D

i4ki (Collaborator) commented Jan 19, 2018

In this context the process word is used to mean a concurrent thread of
execution that does not share any data. The only means of communication
are through message passing. Since these processes are lightweight
creating a lot of them will be cheap (at least much cheaper than
Collaborator:

That really depends on the implementation (I think).
For example, take the code below:

import io
import fmt
import hugelibrary
import anotherhugelibrary

fn worker() {
    # uses io, fmt and hugelibrary
}

spawn worker()

In the code above, should the entire environment of the parent interpreter be copied to every lightweight process? Then maybe it would not be lightweight anymore and we could have a big performance penalty at each process spawn. Note that the worker function doesn't use anotherhugelibrary, but it gets copied anyway.

But if the parent environment is not copied, having each process import its own libraries could be painful:

fn worker1() {
    import io
    import fmt
    import hugelibrary
}

fn worker2() {
    import io
    import fmt
    import anotherhugelibrary
    # some code
}

spawn worker1()
spawn worker2()

This way the processes are really lightweight and could bootstrap really fast, but they need to be self-contained and import everything they need every time.

I don't know what's better. Maybe a mixed approach? Copy the stdlib to every process but leave other libraries to explicit imports? I don't know.
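
A tiny Go sketch of the trade-off being discussed, using a hypothetical Env type rather than the real interpreter structures: sharing the parent's module map is cheap but shares module state between processes, while re-initializing the modules for each child isolates them at a cost proportional to the imports.

```
package main

import "fmt"

// Env is a hypothetical interpreter environment, only to illustrate the
// difference between sharing the parent's module map and re-initializing it.
type Env struct {
	modules map[string]string // module name -> module state
}

// spawnShared reuses the parent's modules: cheap, but module state is shared.
func spawnShared(parent *Env) *Env {
	return &Env{modules: parent.modules}
}

// spawnIsolated re-initializes every imported module for the child:
// isolated, but the cost grows with the number and size of the imports.
func spawnIsolated(parent *Env) *Env {
	child := &Env{modules: make(map[string]string, len(parent.modules))}
	for name := range parent.modules {
		child.modules[name] = "freshly initialized"
	}
	return child
}

func main() {
	parent := &Env{modules: map[string]string{"io": "loaded", "fmt": "loaded"}}

	shared := spawnShared(parent)
	shared.modules["io"] = "mutated by child" // the parent sees this too

	isolated := spawnIsolated(parent)
	isolated.modules["fmt"] = "mutated by child" // the parent is unaffected

	fmt.Println(parent.modules["io"])  // "mutated by child"
	fmt.Println(parent.modules["fmt"]) // "loaded"
}
```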

Member Author:

I'm not sure either; I thought about that after I wrote this. My problem is not even with huge dependencies in the sense that they would make things not lightweight, my problem is the lack of isolation and the sharing of state. If the same imported module is shared then the module state is also shared, and that is a violation of the idea. I'm in doubt between two behaviors:

1 - Automatically reload the modules of the parent, but as freshly loaded modules (all module initialization is executed again). Perhaps that is already what you have in mind.

2 - Don't load anything, import again.

Perhaps in a lot of cases the code executed concurrently will not use the dependencies; it will depend a lot on the use case. az cli use cases will be a mess with any model, since the login action affects the user's home directory... but this is pretty much bad coding from Microsoft... as usual. In this case the only thing that guarantees isolation is the rfork approach (we are going to have both anyway, since they have pretty different use cases).

Member Author:

Not a huge fan of the hybrid =P

the behavior will be to timeout the request and try again (possibly
to another service instance through a load balancer).

To implement this idea we can add a timeout to the receive an add
Collaborator:

typo: 'and add'


Instead of using channel instances in this model you send messages
to processes (actor model), it works pretty much like a networking
model using UDP datagrams.
Collaborator:

Something must be written about the queuing of data. Should every process have a queue of incoming data? What happens if data is sent to some process but it never reads it (never invokes receive)? Is it buffered? Discarded? And so on. I think it should be buffered and we must document a maximum buffer size.
What do you guys think?

Collaborator:

Sorry, now I saw the TODO section :-)

Member Author:

I don't think that writing code that depends on queues being infinite is a good idea, but when we were talking about implementing more high-level languages the idea of a natural number that gets as big as it can get did not sound like a problem =P. To be fair, though, exhausting memory through the queue will indeed be easier.

For me, either it is always infinite (as in Erlang) and you should never just send trillions of messages without waiting for some answer, or on spawn we can pass the queue size. An internal fixed magic value does not seem like a good choice to me x_x.
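
A sketch of the "pass the queue size on spawn" option in Go, using a buffered channel as the mailbox (spawn and trySend are hypothetical helpers, not nash builtins): the sender never blocks and simply learns that the queue is full.

```
package main

import "fmt"

// spawn starts a process whose mailbox size is chosen by the caller, as in
// the "pass the queue size on spawn" option. Hypothetical sketch, not nash.
func spawn(queueSize int, body func(mb <-chan string)) chan<- string {
	mailbox := make(chan string, queueSize)
	go body(mailbox)
	return mailbox
}

// trySend never blocks: it reports false when the mailbox is full.
func trySend(mailbox chan<- string, msg string) bool {
	select {
	case mailbox <- msg:
		return true
	default:
		return false
	}
}

func main() {
	// a slow process that never gets around to receiving
	mailbox := spawn(2, func(mb <-chan string) {
		select {}
	})

	for i := 0; i < 5; i++ {
		fmt.Println("sent:", trySend(mailbox, "job")) // true, true, then false
	}
}
```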

error will always involve a pid that has no owner (the process never
existed or already exited).

We could add a more specific error message if we decide that
Collaborator:

👍


### TODO

Spawned functions should have access to imported modules ?
Collaborator:

We need to think carefully about this; there are too many considerations. What about Erlang?

Member Author:

Unable to find anything:

http://erlang.org/doc/reference_manual/processes.html

There are a lot of interesting concepts like linking and successful exit vs. errored exit... but they are useful for building more robust/complex distributed systems where supervisors restart failed processes. For now I think we can live without them =D. Perhaps just providing a way to check if a process terminated would be interesting.

(seems like no, but some usages of this may seem odd)

If send is never blocking, what if process queue gets too big ?
just go on until memory exhausts ?
Collaborator:

I don't think so. I like the idea of limits in the runtime; maybe the user could change them with builtin functions or env vars?

Member Author:

In networking you usually get to know that some other process is unable to answer you because the answer never gets received. This can happen for a lot of reasons (even packets being dropped because the queue is too big). So it would be a severe error to just send a lot of messages without ever waiting for some kind of response (it seems odd to me, but perhaps there is some use case). I think that is why Erlang has no limit on the process mailbox. But I'm not against having a queue size either; I'm just not as against infinite mailboxes as I used to be =P. For them to generate problems you must already be doing something wrong (exhausting memory through messages requires really big messages and a lot of them).

The only thing I'm certain of is that writing code that depends on infinite queue sizes is really a bad idea; perhaps we will help people avoid idiotic problems.

In this case send will not return a boolean "ok" anymore, since it is important to differentiate between a process whose queue is full and a process that is dead. Perhaps this is the kind of complexity that Erlang avoided with unlimited mailboxes.

Member Author:

Thinking about datagram networking with UDP, the only error handling that exists is the ICMP packet indicating that there is no program listening on that port... all other errors are detected by never receiving an answer (if you care for one).

Perhaps we can still stick with the boolean and use a queue size as a parameter... like OSes do?

Collaborator:

Well, we can also have infinite queues, but then we'll need some API for queue monitoring. If memory usage is too high, I want an easy way to know how much data is pending to be processed, for debugging slow processes.
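
If the mailbox ends up backed by a Go channel, the monitoring hook could be as simple as exposing len() on it. A minimal sketch with illustrative names, not an existing API:

```
package main

import "fmt"

// pending reports how many messages are sitting unread in a mailbox; with a
// Go channel as the mailbox this is just len(). A sketch of the monitoring
// hook discussed above, not an existing nash API.
func pending(mailbox chan string) int {
	return len(mailbox)
}

func main() {
	mailbox := make(chan string, 100)
	mailbox <- "a"
	mailbox <- "b"
	fmt.Println("messages waiting:", pending(mailbox)) // 2
}
```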

inconsistent with function calls

What happens when something is written on the stdout of a spawned
process ? redirect to parent shell ?
Collaborator:

Yes, by default I think this is the right behaviour: redirect to the parent's stdout.

i4ki (Collaborator) commented Jan 24, 2018

@katcipis katcipis merged commit 4a37709 into master May 4, 2018
@katcipis katcipis deleted the addConcurrencyDesign branch May 4, 2018 03:55