Add `SupportsParallel` property to `CmdletBinding` #206

KirkMunro · 2019-06-26T23:54:31Z

No description provided.

alx9r

comment superceded by this review

KirkMunro · 2019-06-27T18:25:11Z

@alx9r No problem, the comments you made here already made sense to me for the most part, but I'll appreciate your inline review when you can type it in again.

Just thinking out loud on the first point, it makes me wonder if there should a SessionStateDelta that could be created through the invocation of the command and it's begin block, be applied to the other thread/runspace during execution of the process block, and then be used during the creation of the end block as well. That's purely off the cuff thinking, and I haven't spent any time digging into this thought technically yet at all.

On point 2, I saw comments about that, but haven't looked into it. I'll spend some more time thinking about it. Presumably that would have to be solved for this or #194 because script blocks are being defined in one session state and then used in another in either solution.

For point 3, runspace pooling or a way to identify the required modules (perhaps via AST inspection of the process block) and pre-load runspaces in parallel with those modules ready to go are the first ideas that come to mind. This problem would also have to be solved for this or #194.

Thank you for clearly identifying these three technical challenges. I have no doubt that my brain will be processing them whenever it's idle. 😉

alx9r

I prefer the ergonomics that this RFC proposes over those in #194. I'd really like for this to become a reality. The implementation of this RFC seems to rely on the following though:

A way of unifying the SessionStates of the "Invoke-Command" call site and the invocation site (in a different runspace) of the user scriptblock. This is required in order to achieve the variable resolution behavior that is familiar from single-threaded PowerShell.
A definition and implementation of the behavior of a scriptblock bound to the SessionState of one runspace when that scriptblocks is invoked in a different SessionState. Currently doing so crashes powershell, remains an open issue in PowerShell/PowerShell#7626, and does not seem to have progressed toward a resolution.
Transparent performant management of module loading into multiple runspaces in parallel.

(see also my comment on #194 for more background on some of these challenges)

AFAICT these issues are the barrier to good ergonomics for any in-process PowerShell parallelization implementation. I'm fairly confident that the remainder of the implementation is straightforward mainly because I and several others have working solutions up to these issues. Accordingly, I've spent quite a bit of time experimenting with and investigating these very things. Based on that experience I estimate that the challenge of satisfactorily resolving these three issues is probably an order of magnitude larger than the remainder of all of the implementations described in this and the #194 RFCs.

I guess I'm suggesting that if this proposal is going to come to fruition, effort should probably be focused on overcoming these three issues.

alx9r · 2019-06-27T18:10:28Z

1-Draft/RFCNNNN-Supports-Parallel.md

+        # NEW! Unlike Start-Job commands where you must use the "using:" prefix
+        # on variables to access variables that are defined outside of the
+        # scope of the script block, all variables that are defined in the
+        # current scope would be accessible when the command is invoked with


Can you clarify what you mean by "the current scope" here? Do you mean the SessionState at the Invoke-Command call site? If so, how do you envision variables defined at the call site will be resolved from the perspective of process{}? Would the SessionState to which the process{} scriptblock is bound be a "child" of the SessionState from which Invoke-Command was invoked? (This would be similar to the way module SessionStates often have the "global" SessionState as a parent.)

(For readers unfamiliar with PowerShell's SessionState scoped variable search "rules", @lzybkr provides an explanation in #6139. The important thing to understand is that those "rules" result in variable resolution behavior that is substantially different from C#.)

alx9r · 2019-06-27T18:18:10Z

1-Draft/RFCNNNN-Supports-Parallel.md

+        # however, there are valid cases where script blocks must be
+        # transportable into jobs, and this feature is one such case. Any
+        # variables that contain script blocks (i.e. parameters, variables
+        # defined in a begin block, etc.) must be made available to the thread


Can you elaborate on how you envision this working?

Currently scriptblocks are bound to either the "global" or a module's SessionState. Each scriptblock can read and alter the SessionState to which it is bound when it is invoked. This works fine in a single-threaded scenario because only one scriptblock at a time can alter a given SessionState. With a second thread, however, two threads could be accessing the same SessionState and currently results in a crash. That is the topic of PowerShell/PowerShell#7626 which does not seem to have an obvious resolution.

alx9r · 2019-06-27T18:28:51Z

1-Draft/RFCNNNN-Supports-Parallel.md

+parallel execution in the `PSCmdlet` object.
+* update the common parameter assignment to add `-Parallel`, `-ThreadLimit`,
+and `-TimeoutSecs` parameters to parameter sets that support `SupportsParallel`.
+* add logic to the command processor that identifies the local variables and


add logic to the command processor that identifies the local variables...

Can you explain what you mean by "local variables". PowerShell doesn't really have the concept of "local variables". Rather, it employs (among other things) a multitude of nested scopes hosted in SessionStates that may or may not themselves be related. AFAICT each variable encountered during scriptblock invocation is resolved by following a search algorithm. In general, I don't think something like "identifying the local variables" is computable. For example, in

$g = Get-SomeScriptBlockGenerator 1..10 | Invoke-Command { . $g.GetNextScriptBlock() }

The SessionState (and therefore variables in scope) of the scriptblock returned by .GetNextScriptBlock() cannot be discovered prior to call the .GetNextScriptBlock().

There is another way though, I think. AFAICT the search "rules" for a runspace could be expanded to include the SessionState of the call site of Invoke-Command of the parent runspace.

alx9r · 2019-06-27T18:39:11Z

1-Draft/RFCNNNN-Supports-Parallel.md

+
+### Runspace pooling (and other possible future parallel improvements)
+
+It may be desired to allow users to create a pool of runspaces, and then cycle


It may be desired to allow users to create a pool of runspaces...

I think this is an understatement. From the testing I have performed any parallelization without sophisticated runspace initialization and reuse is basically unusable save for a few contrived scenarios. Without such runspace management, for most scenarios so much time is spent waiting for modules to load that that wait far surpasses the gains of parallelization. I think this repro from PowerShell/PowerShell#7524 is the most direct demonstration of the problem.

alx9r · 2019-06-27T21:09:00Z

@alx9r No problem, the comments you made here already made sense to me for the most part, but I'll appreciate your inline review when you can type it in again.

Oh good. Thank you for the generosity.

Just thinking out loud on the first point, it makes me wonder if there should a SessionStateDelta that could be created through the invocation of the command and it's begin block, be applied to the other thread/runspace during execution of the process block, and then be used during the creation of the end block as well. That's purely off the cuff thinking, and I haven't spent any time digging into this thought technically yet at all.

If I understand what you're driving at, the resulting behavior of what you described can already be achieved by invoking the begin{} and end{} scriptblocks from Invoke-Command using SessionState.InvokeCommand.InvokeScript(). It's what would happen between the invocation of begin{} and end{} that, I think, requires our focus.

Note: I think what follows might only make sense for variables. I haven't thought this through for invoking functions defined in the SessionState where Invoke-Command is called.

Some definitions:

"parent runspace" - the runspace where Invoke-Command is invoked
"parent SessionState" - the SessionState of the call site of Invoke-Command
"child runspace" - the runspace where the user scriptblock is invoked
"child SessionState" - the "global" SessionState of the child runspace

The challenge to overcome is this:

during invocation, a scriptblock invoked in the child runspace can encounter a variable that is defined only in the parent SessionState
currently variable resolution for such a variable fails because the parent SessionState is not searched

Suppose there were a hook on the runspace class that looked something like this:

public class Runspace {
    // handler() is invoked whenever the runspace fails to find a variable mentioned 
    // during invocation of a scriptblock
    void RegisterGetVariableHandler(Func<string,PSVariable> handler) {
       //...
    }
    //...
}

That would allow the implementation of Invoke-Command to resolve, on the child runspace's behalf, a variable mentioned in the child runspace but missing from the child SessionState. The simplest implementation would be something like:

public class InvokeCommandCommand : PSCmdlet {
   //...
   void BeginProcessing() {
       //...
       runspace.RegisterGetVariableHandler(name=>SessionState.PSVariable.Get(name));
      //...
   }
  // ...
}

Here's an annotated example:

$p= 'parent'

1..4 | Invoke-Command -Parallel {
    "$p$_"
    <#
    1. The child runspace attempts to resolve the variable named 'p' using the usual 
    resolution rules and fails.  (this is the plain old behavior we familiar with)
    2. (new behavior starts here) Then the child runspace invokes the handler() that was registered with
    RegisterGetVariableHandler().
    3. handler() calls SessionState.PSVariable.Get("p") on the instance of .SessionState 
   that is the member of the instance of Invoke-Command.  That is the SessionState of
   Invoke-Command's call site.  The call to .Get("p") succeeds and returns a reference to
   the PSVariable named "p".
    4. The child runspace uses the PSVariable returned from handler() to interpolate the string.
    #>
}

# parent1
# parent2
# parent3
# parent4

Perhaps the addition of RegisterGetVariableHandler() capability should be proposed as another RFC.

alx9r · 2019-06-27T21:17:57Z

On point 2, I saw comments about that, but haven't looked into it. I'll spend some more time thinking about it. Presumably that would have to be solved for this or #194 because script blocks are being defined in one session state and then used in another in either solution.

I think it's fair to make an even broader statement as follows:

PowerShell/PowerShell#7625 would have to be solved for any implementation of in-process multi-threaded PowerShell.

AFAICT tell that bug is latent anywhere related PowerShell is invoked on multiple threads within a process.

alx9r · 2019-06-27T21:41:45Z

For point 3, runspace pooling or a way to identify the required modules (perhaps via AST inspection of the process block) and pre-load runspaces in parallel with those modules ready to go are the first ideas that come to mind.

I think declaring or discovering which modules to load is, relatively speaking, a minor issue. The major issue is the problem of opening runspaces in parallel: They can't. Or more specifically, currently the performance of importing modules into runspaces in parallel is, in general, worse than serializing the import of those modules.

Here's an annotated example:

Import-Module m # suppose this call takes 200ms

1..200 | Invoke-Command -Parallel -ThreadLimit 200 {
    Import-Module m # all of these 200 calls will take approximately 200 x 200ms = 40s to complete
    Invoke-SomeBlockingNetworkCommand -ComputerName "computer$_"
}

In other words, it would take 40 seconds just for the first runspace to complete importing its module. PowerShell spends that 40 seconds with one processor pinned and does nothing useful for the user during that time.

I think the above is a reasonable use-case but with unreasonable performance. The root cause AFAICT is contention somewhere in the implementation of PowerShell's module import machinery.

This problem would also have to be solved for this or #194.

I agree. I'd say

PowerShell/PowerShell#7035 and/or #7524(comment) would have to be solved for any general-purpose implementation of in-process multi-threaded PowerShell.

FWIW, of the three major issues I mentioned, this is the one I have made the most progress overcoming. I have a working implementation of Invoke-Parallel that serially imports modules into runspaces. You might be able to get a feel for those ergonomics from this screen capture (click to see the gif):

It would still take 40s for the 200th thread in the above example to call Invoke-SomeBlockingNetworkCommand, but at least the first thread's call would occur after only 200ms.

JamesWTruher · 2022-06-27T22:36:45Z

@PowerShell/powershell-committee believes that the issues and potential problems in providing a cmdletbinding extension far out weigh the benefits. PowerShell multi-threading is quite tricky to do.

KirkMunro added 2 commits June 26, 2019 18:17

initial commit

c3f5af9

first draft

2ad3277

KirkMunro mentioned this pull request Jun 27, 2019

Submit draft of RFC for ForEach-Object -Parallel proposal #194

Merged

add note to example

65ef4e6

alx9r reviewed Jun 27, 2019

View reviewed changes

joeyaiello added the WG-DevEx-SDK hosting PowerShell as a runtime, PowerShell's APIs, PowerShell Standard, or development of modules label Jul 27, 2021

JamesWTruher added the Committee-Rejected The RFC was not accepted label Jun 23, 2022

JamesWTruher closed this Jun 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add `SupportsParallel` property to `CmdletBinding` #206

Add `SupportsParallel` property to `CmdletBinding` #206

Uh oh!

KirkMunro commented Jun 26, 2019

Uh oh!

alx9r left a comment •

edited

Loading

Uh oh!

KirkMunro commented Jun 27, 2019

Uh oh!

alx9r left a comment

Uh oh!

alx9r Jun 27, 2019 •

edited

Loading

Uh oh!

alx9r Jun 27, 2019

Uh oh!

alx9r Jun 27, 2019 •

edited

Loading

Uh oh!

alx9r Jun 27, 2019

Uh oh!

alx9r commented Jun 27, 2019

Uh oh!

alx9r commented Jun 27, 2019

Uh oh!

alx9r commented Jun 27, 2019 •

edited

Loading

Uh oh!

JamesWTruher commented Jun 27, 2022

Uh oh!

Uh oh!


		### Runspace pooling (and other possible future parallel improvements)

		It may be desired to allow users to create a pool of runspaces, and then cycle

Add SupportsParallel property to CmdletBinding #206

Add SupportsParallel property to CmdletBinding #206

Uh oh!

Conversation

KirkMunro commented Jun 26, 2019

Uh oh!

alx9r left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

KirkMunro commented Jun 27, 2019

Uh oh!

alx9r left a comment

Choose a reason for hiding this comment

Uh oh!

alx9r Jun 27, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alx9r Jun 27, 2019

Choose a reason for hiding this comment

Uh oh!

alx9r Jun 27, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alx9r Jun 27, 2019

Choose a reason for hiding this comment

Uh oh!

alx9r commented Jun 27, 2019

Uh oh!

alx9r commented Jun 27, 2019

Uh oh!

alx9r commented Jun 27, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JamesWTruher commented Jun 27, 2022

Uh oh!

Uh oh!

Add `SupportsParallel` property to `CmdletBinding` #206

Add `SupportsParallel` property to `CmdletBinding` #206

alx9r left a comment •

edited

Loading

alx9r Jun 27, 2019 •

edited

Loading

alx9r Jun 27, 2019 •

edited

Loading

alx9r commented Jun 27, 2019 •

edited

Loading