-
Notifications
You must be signed in to change notification settings - Fork 129
Add SupportsParallel
property to CmdletBinding
#206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment superceded by this review
@alx9r No problem, the comments you made here already made sense to me for the most part, but I'll appreciate your inline review when you can type it in again. Just thinking out loud on the first point, it makes me wonder if there should a On point 2, I saw comments about that, but haven't looked into it. I'll spend some more time thinking about it. Presumably that would have to be solved for this or #194 because script blocks are being defined in one session state and then used in another in either solution. For point 3, runspace pooling or a way to identify the required modules (perhaps via AST inspection of the Thank you for clearly identifying these three technical challenges. I have no doubt that my brain will be processing them whenever it's idle. 😉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the ergonomics that this RFC proposes over those in #194. I'd really like for this to become a reality. The implementation of this RFC seems to rely on the following though:
- A way of unifying the SessionStates of the "Invoke-Command" call site and the invocation site (in a different runspace) of the user scriptblock. This is required in order to achieve the variable resolution behavior that is familiar from single-threaded PowerShell.
- A definition and implementation of the behavior of a scriptblock bound to the SessionState of one runspace when that scriptblocks is invoked in a different SessionState. Currently doing so crashes powershell, remains an open issue in Concurrently invoking a ScriptBlock in two runspaces causes InvalidOperationException and pwsh.exe termination PowerShell#7626, and does not seem to have progressed toward a resolution.
- Transparent performant management of module loading into multiple runspaces in parallel.
(see also my comment on #194 for more background on some of these challenges)
AFAICT these issues are the barrier to good ergonomics for any in-process PowerShell parallelization implementation. I'm fairly confident that the remainder of the implementation is straightforward mainly because I and several others have working solutions up to these issues. Accordingly, I've spent quite a bit of time experimenting with and investigating these very things. Based on that experience I estimate that the challenge of satisfactorily resolving these three issues is probably an order of magnitude larger than the remainder of all of the implementations described in this and the #194 RFCs.
I guess I'm suggesting that if this proposal is going to come to fruition, effort should probably be focused on overcoming these three issues.
# NEW! Unlike Start-Job commands where you must use the "using:" prefix | ||
# on variables to access variables that are defined outside of the | ||
# scope of the script block, all variables that are defined in the | ||
# current scope would be accessible when the command is invoked with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify what you mean by "the current scope" here? Do you mean the SessionState at the Invoke-Command call site? If so, how do you envision variables defined at the call site will be resolved from the perspective of process{}
? Would the SessionState to which the process{}
scriptblock is bound be a "child" of the SessionState from which Invoke-Command was invoked? (This would be similar to the way module SessionStates often have the "global" SessionState as a parent.)
(For readers unfamiliar with PowerShell's SessionState scoped variable search "rules", @lzybkr provides an explanation in #6139. The important thing to understand is that those "rules" result in variable resolution behavior that is substantially different from C#.)
# however, there are valid cases where script blocks must be | ||
# transportable into jobs, and this feature is one such case. Any | ||
# variables that contain script blocks (i.e. parameters, variables | ||
# defined in a begin block, etc.) must be made available to the thread |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on how you envision this working?
Currently scriptblocks are bound to either the "global" or a module's SessionState. Each scriptblock can read and alter the SessionState to which it is bound when it is invoked. This works fine in a single-threaded scenario because only one scriptblock at a time can alter a given SessionState. With a second thread, however, two threads could be accessing the same SessionState and currently results in a crash. That is the topic of PowerShell/PowerShell#7626 which does not seem to have an obvious resolution.
parallel execution in the `PSCmdlet` object. | ||
* update the common parameter assignment to add `-Parallel`, `-ThreadLimit`, | ||
and `-TimeoutSecs` parameters to parameter sets that support `SupportsParallel`. | ||
* add logic to the command processor that identifies the local variables and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add logic to the command processor that identifies the local variables...
Can you explain what you mean by "local variables". PowerShell doesn't really have the concept of "local variables". Rather, it employs (among other things) a multitude of nested scopes hosted in SessionStates that may or may not themselves be related. AFAICT each variable encountered during scriptblock invocation is resolved by following a search algorithm. In general, I don't think something like "identifying the local variables" is computable. For example, in
$g = Get-SomeScriptBlockGenerator
1..10 | Invoke-Command { . $g.GetNextScriptBlock() }
The SessionState (and therefore variables in scope) of the scriptblock returned by .GetNextScriptBlock() cannot be discovered prior to call the .GetNextScriptBlock().
There is another way though, I think. AFAICT the search "rules" for a runspace could be expanded to include the SessionState of the call site of Invoke-Command of the parent runspace.
|
||
### Runspace pooling (and other possible future parallel improvements) | ||
|
||
It may be desired to allow users to create a pool of runspaces, and then cycle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be desired to allow users to create a pool of runspaces...
I think this is an understatement. From the testing I have performed any parallelization without sophisticated runspace initialization and reuse is basically unusable save for a few contrived scenarios. Without such runspace management, for most scenarios so much time is spent waiting for modules to load that that wait far surpasses the gains of parallelization. I think this repro from PowerShell/PowerShell#7524 is the most direct demonstration of the problem.
Oh good. Thank you for the generosity.
If I understand what you're driving at, the resulting behavior of what you described can already be achieved by invoking the begin{} and end{} scriptblocks from Invoke-Command using SessionState.InvokeCommand.InvokeScript(). It's what would happen between the invocation of begin{} and end{} that, I think, requires our focus. Note: I think what follows might only make sense for variables. I haven't thought this through for invoking functions defined in the SessionState where Invoke-Command is called. Some definitions:
The challenge to overcome is this:
Suppose there were a hook on the runspace class that looked something like this: public class Runspace {
// handler() is invoked whenever the runspace fails to find a variable mentioned
// during invocation of a scriptblock
void RegisterGetVariableHandler(Func<string,PSVariable> handler) {
//...
}
//...
} That would allow the implementation of Invoke-Command to resolve, on the child runspace's behalf, a variable mentioned in the child runspace but missing from the child SessionState. The simplest implementation would be something like: public class InvokeCommandCommand : PSCmdlet {
//...
void BeginProcessing() {
//...
runspace.RegisterGetVariableHandler(name=>SessionState.PSVariable.Get(name));
//...
}
// ...
} Here's an annotated example: $p= 'parent'
1..4 | Invoke-Command -Parallel {
"$p$_"
<#
1. The child runspace attempts to resolve the variable named 'p' using the usual
resolution rules and fails. (this is the plain old behavior we familiar with)
2. (new behavior starts here) Then the child runspace invokes the handler() that was registered with
RegisterGetVariableHandler().
3. handler() calls SessionState.PSVariable.Get("p") on the instance of .SessionState
that is the member of the instance of Invoke-Command. That is the SessionState of
Invoke-Command's call site. The call to .Get("p") succeeds and returns a reference to
the PSVariable named "p".
4. The child runspace uses the PSVariable returned from handler() to interpolate the string.
#>
}
# parent1
# parent2
# parent3
# parent4 Perhaps the addition of RegisterGetVariableHandler() capability should be proposed as another RFC. |
I think it's fair to make an even broader statement as follows:
AFAICT tell that bug is latent anywhere related PowerShell is invoked on multiple threads within a process. |
I think declaring or discovering which modules to load is, relatively speaking, a minor issue. The major issue is the problem of opening runspaces in parallel: They can't. Or more specifically, currently the performance of importing modules into runspaces in parallel is, in general, worse than serializing the import of those modules. Here's an annotated example: Import-Module m # suppose this call takes 200ms
1..200 | Invoke-Command -Parallel -ThreadLimit 200 {
Import-Module m # all of these 200 calls will take approximately 200 x 200ms = 40s to complete
Invoke-SomeBlockingNetworkCommand -ComputerName "computer$_"
} In other words, it would take 40 seconds just for the first runspace to complete importing its module. PowerShell spends that 40 seconds with one processor pinned and does nothing useful for the user during that time. I think the above is a reasonable use-case but with unreasonable performance. The root cause AFAICT is contention somewhere in the implementation of PowerShell's module import machinery.
I agree. I'd say
FWIW, of the three major issues I mentioned, this is the one I have made the most progress overcoming. I have a working implementation of It would still take 40s for the 200th thread in the above example to call |
@PowerShell/powershell-committee believes that the issues and potential problems in providing a cmdletbinding extension far out weigh the benefits. PowerShell multi-threading is quite tricky to do. |
No description provided.