
Consider a method to allow sending a batch of commands #447

Open
jgraham opened this issue Jun 8, 2023 · 17 comments
@jgraham
Member

jgraham commented Jun 8, 2023

Supporting cases where the client has significant latency to the browser is an expressed goal of WebDriver BiDi.

Generally supporting this requires minimising network roundtrips, as each roundtrip incurs an overhead of twice the latency. This can be particularly important for timing-critical operations where inserting a roundtrip may affect the outcome of the test (e.g. because something changed in the page under test during the network delay).

Generally the solution here is to allow generating a set of commands upfront and sending them together, then getting a single result for the group of commands, rather than one result for each command.

There are many different possible ways to do this, with different tradeoffs. For example JSON-RPC has a simple batching mechanism that allows sending multiple commands and then getting the results of all the commands in a single response. However we intentionally didn't adopt JSON-RPC specifically because this feature didn't seem to hold its weight in terms of the functionality it enables compared to the cost.
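For reference, JSON-RPC 2.0 batching just wraps several request objects in an array; the server may process them in any order and returns a matching array of response objects, correlated by `id`. A batch with BiDi-flavoured placeholder method names (illustrative only, not real JSON-RPC methods) would look like:

```json
[
  {"jsonrpc": "2.0", "id": 1, "method": "session.subscribe",
   "params": {"events": ["log.entryAdded"]}},
  {"jsonrpc": "2.0", "id": 2, "method": "script.addPreloadScript",
   "params": {"functionDeclaration": "() => {}"}}
]
```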

This issue is to explore use cases and requirements for command batching, and explore possible designs.

Original comment: A common part of the workflow for starting a WebDriver BiDi session is performing initial configuration. This can involve various things, e.g.:

* Configuring global browser settings (e.g. proxy configuration, prefs, window size)
* Subscribing to certain events
* Installing preload scripts
* Adding network request intercepts
* Configuring BiDi settings (there isn't much of this so far, but in the future maybe e.g. setting log level filters)

Currently some of this is done in capabilities, and some of it is done in individual commands that all need to be run once the session is configured.

Ideally we wouldn't use capabilities for anything that happens after the browser is launched. They have this complex matching behaviour that doesn't really make sense for most configuration use cases. In particular we could aim to end up in a place where by the time you're connecting to an actual browser the capabilities have all been processed and can be ignored entirely (obviously a classic-style Driver is performing the role of a browser launcher, and so would consider configuration required at browser launch time).

(We also have "capabilities" which only exist in the return value; that's a rather different concern, although I note that we've failed to update the return value with new "optional" features).

Although removing existing capabilities might not be possible, I at least don't think we should add new post-launch configuration to capabilities.

So assuming we don't add all this configuration to capabilities directly it's currently up to the client to send all the configuration commands just after the session has started. That's OK, but it does add a bunch of roundtrip latency.

Early on there was some discussion about sending batches of commands. This seems like a clear case where that might be useful e.g. we give the session.new payload a field like commands which takes a list of commands. The session.new response is delayed until after all those commands have been dispatched and the response gets an extra field containing the return value of each command (which might be an error).

Alternatively we could have a more generic mechanism that doesn't special-case new session e.g. a session.sendCommands command whose payload would be a list of commands to run in-order and whose return value would be a list of responses to those commands.

@OrKoN
Contributor

OrKoN commented Sep 7, 2023

Wanted to add a comment that, to minimize the roundtrip latency somewhat, the client can schedule all requests at once and then wait for all responses. This works if the next command does not depend on the output of the previous command.
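This client-side pipelining needs no protocol change: send every command before awaiting any response, so N independent commands cost roughly one roundtrip instead of N. A minimal sketch, where `makeTransport` is a stand-in for a real WebSocket connection:

```javascript
// Simulated transport: resolves each command after a fake roundtrip.
function makeTransport(latencyMs = 0) {
  return {
    send(cmd) {
      return new Promise((resolve) =>
        setTimeout(() => resolve({ id: cmd.id, result: {} }), latencyMs));
    },
  };
}

// Fire all sends first, only then wait. Works only when no command
// depends on an earlier command's result.
async function pipeline(transport, commands) {
  const pending = commands.map((cmd) => transport.send(cmd));
  return Promise.all(pending);
}
```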

@css-meeting-bot
Member

The Browser Testing and Tools Working Group just discussed Provide support for batching commands, either only for new session, or for the general case.

The full IRC log of that discussion <AutomatedTester> Topic: Provide support for batching commands, either only for new session, or for the general case
<AutomatedTester> github: https://github.com//issues/447
<orkon> q+
<AutomatedTester> ack next
<AutomatedTester> qq+ orkon
<AutomatedTester> jgraham: one of the goals for bidi is that it should work well with relatively high latency connections
<shs96c> q+
<AutomatedTester> ... one of the problems is that the more commands we make to the protocol, the harder it becomes for high latency connections
<AutomatedTester> ... so it makes sense to do batching
<AutomatedTester> ... like we have in actions. There was a basic proposal for locateNode to batch up searches for elements
<AutomatedTester> ... we could batch up things in the new session like "create session, insert these preload scripts and set something else up"
<AutomatedTester> ... the use cases are very interesting but only work in certain contexts like new session, find element...
<AutomatedTester> ... so do we want to build something that is generic and do we want to handle the case where is data flow between commands?
<AutomatedTester> ack next
<Zakim> orkon, you wanted to react to orkon
<AutomatedTester> orkon: I wanted to mention that with high latency the client can batch out commands in parallel
<AutomatedTester> ... If things need to run sequentially then we can have an aggregate so that we can do e.g. scroll into view then do an interaction
<AutomatedTester> ... I wonder how many cases would have dependencies between commands
<AutomatedTester> q?
<AutomatedTester> ack next
<AutomatedTester> shs96c: not just latency but bandwidth is something we need to care about
<AutomatedTester> ... a chatty protocol will slow things down too
<AutomatedTester> ... a lot of this is not relevant for localhost
<AutomatedTester> ... this is a problem when we introduce the internet and we can see about zipping up data
<AutomatedTester> ... and batching be another
<AutomatedTester> ... a number of people have looked to do this in the past
<jgraham> q+
<AutomatedTester> ... I dont think we need to worry about in the spec. I think we can see about putting a note in the spec saying this could be made more efficient so you can do it
<sadym> q+
<AutomatedTester> ... I dont think we need to solve it but be cognizant. we dont want to be too chatty
<AutomatedTester> ack next
<AutomatedTester> jgraham: stuff like compression can be done transparently and i am not worried about it
<AutomatedTester> ... for batching I am worried since we are already doing kinda batching I dont think we can't just do it
<shs96c> q+
<orkon> q+
<AutomatedTester> ... <explains use case with actions and executeScript which can't be done in 1 action>
<AutomatedTester> ... and then there is the locateNode that goes from element to shadow to another node to ...
<AutomatedTester> ... we are building batching in different ways and that might be the right way to do it but we could also see if this should be generic
<AutomatedTester> q?
<AutomatedTester> ack next
<AutomatedTester> sadym (IRC): are talking about batching or setting dependencies on commands?
<AutomatedTester> jgraham: yes
<AutomatedTester> jgraham: if we are talking about commands that have sequence then we need to figure out a way to share data
<AutomatedTester> shs96c: in locateNode there could be ways to do this to build out an AST how you want it to figure out the data across
<AutomatedTester> sadym (IRC): if we want dependencies that we can do that. We might need to expose to it the realm and you do that
<AutomatedTester> jgraham: we could have done this all in JavaScript but that is executing things
<AutomatedTester> ... so people could execute the bidi commands from a javascript sent across the wire
<AutomatedTester> q?
<AutomatedTester> ack next
<AutomatedTester> shs96c: I think sadym (IRC) covered my point
<AutomatedTester> q?
<AutomatedTester> ack next
<AutomatedTester> orkon: I think if we have batching of sequential execution then it would be beneficial
<AutomatedTester> ... I wonder if there are other specs that show how to describe workflows
<AutomatedTester> ... but I don;t think we want to have that
<AutomatedTester> ... it will then be very hard for intermediaries to be able to understand and execute across platforms
<AutomatedTester> q?
<AutomatedTester> jgraham: I think there is an example <something proto> that allows you to describe a way to batch workflows
<AutomatedTester> ... with intermediaries I think its better to not have a scripting language that does this
<AutomatedTester> ... they wouldn't be able to capture and manipulate the payload
<shs96c> CapnProto time travel RPC: https://capnproto.org/rpc.html
<AutomatedTester> s/<something proto>/CapnProto time travel RPC
<AutomatedTester> jgraham: if people are excited by this it might be good for someone to create a prototype and then we can go from there. We dont fully understand all the tradeoffs
<orkon> q+
<AutomatedTester> ack next
<AutomatedTester> orkon: I think we have a proxy format that is like JS
<AutomatedTester> ... it is hard to parse and compile
<AutomatedTester> ... and it is probably too many steps to handle
<AutomatedTester> q?
<jgraham> PAC (Proxy AutoConfiguration)
<AutomatedTester> s/proxy format/PAC (Proxy AutoConfiguration) proxy format

@jimevans
Collaborator

After the discussion at TPAC, having a command batch facility, with the ability to share response data between commands in the batch would unlock a huge number of scenarios that would accomplish a lot in the way of respecting high-latency connections and reducing round-trips between the local and remote ends. I would most assuredly like to see such a thing created.

@jgraham
Member Author

jgraham commented Sep 19, 2023

OK, I'm going to update this issue to be about the more generic option.

I think it would be useful to document known use cases.

Use Cases

Initial Setup

At session start time, a client wants to configure multiple aspects of the session e.g. subscribe to events, install preload scripts, setup network intercepts.

Requirements

Commands run in parallel without dependencies. Result is combined result of all commands.

Alternatives

Can send commands one at a time, and get N separate responses. In the worst case this adds N roundtrips, so we get 2N * latency extra delay. This is reduced if we're processing command 1 whilst sending command 2, but many setup commands are relatively fast running (e.g. just adding some configuration data), so we may be dominated by network overhead.
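A back-of-envelope illustration of that cost (the numbers are made up for the example): with a one-way latency of 50 ms and 10 independent setup commands, sequential dispatch pays one roundtrip per command while a batch pays one roundtrip total.

```javascript
const latencyMs = 50; // one-way latency (illustrative)
const commands = 10;  // number of setup commands

const sequentialMs = 2 * latencyMs * commands; // 2N * latency
const batchedMs = 2 * latencyMs;               // single roundtrip

console.log(sequentialMs, batchedMs); // 1000 100
```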

Locators

When locating elements we want to be able to select a set of elements, and then use that set of elements as the context for selecting further elements e.g. to select all heading elements which contain the substring "example". We may also want to extract specific properties of the elements directly e.g. if we want to get the innerText of all such elements, rather than the elements themselves.

In addition some clients support ways to combine selectors with combinators like and or or.

Requirements

It must be possible to run commands in sequence, feeding part of the output of previous commands into subsequent commands. The final result must be the result of the final command, or the first error if one occurs.

Combinators may look like operations on the results e.g. running multiple locators in parallel and then either combining the results, or taking one or the other result (although note that if or is supposed to short-circuit that may require more explicit conditional logic).

Alternatives

Many compound selectors can be expressed directly in either CSS or XPath. However these can get complex (especially XPath) and don't allow the extensibility offered by the ability to interleave script (e.g. XPath supports substring matching, but not the use of innerText).

We could special-case locators and express these requirements specifically as kinds of locator strategy. However that would not generalise to other commands.

Implementations could compile locators down to JS e.g. the above example could be compiled to

(root) => {
  let headings = root.querySelectorAll("h1,h2,h3,h4,h5,h6");
  let filtered = Array.from(headings).filter(x => x.innerText.includes("example"));
  return filtered.map(x => x.innerText);
}

However that requires all clients to implement a compiler from locator chains to JS. It also doesn't work well for traversing into shadow trees, since closed shadow roots aren't available even to JS running in a sandbox.

Input

We currently model input as a custom batching, where we initially set up a set of input devices, and then run an action (maybe a noop) for each device in parallel, wait for all of them to complete (and the event loop to spin enough for events generated to be dispatched) and then move on to the next action for each device. However we sometimes get requests to inline other behaviour into the actions e.g. running some script to scrollIntoView a particular element, or similar. That would currently require duplicating parts of the script module as a specific action.

Requirements

Run initial setup (setting up devices, etc.). Run actions for each "tick" in parallel and wait for all to complete. Then wait for the event loop to reach a marker. Then run another set of events in parallel. No return value required.

Alternatives

Add script as an input source type and allow running scripts using that input source type.

Do nothing, and require clients to roundtrip when they want to interleave actions with other commands.

Prior Art

Cap'n Proto promise pipelines

@jgraham jgraham changed the title Consider a method to allow sending initial commands as part of session creation Consider a method to allow sending a batch of commands Sep 19, 2023
@jimevans
Collaborator

jimevans commented Sep 19, 2023

@jgraham Thanks for writing this up. It's articulate in a way I could not be.

It sounds like to support all of the cases enumerated above, we would want a data structure that enables a chain of commands. Think a list or array, like:

[
  {
    "method": "first.command",
    "params": {
      "command1": "parameters"
    }
  },
  {
    "method": "second.command",
    "params": {
      "command2": "parameters"
    }
  }
]

with some mechanism to pass the results of commands around. For the sake of discussion, let's call this a CommandStream.

At the same time, to support some of the use cases (Inputs and Initial Setup), we need a data structure that allows parallel CommandStreams, with some form of sync points between parallel streams. This latter is especially important for the Input use case, and is currently filled by the pause action type in the current inputs serialization. And as a preemptive argument rebuttal: yes, it's perfectly possible to do parallel streams from the client side, by firing off multiple streams and not immediately awaiting the return of each one, handling those returns as they complete. However, that makes the general solution unsuitable for the Input case, which requires synchronization between parallel streams.

So we'd need an "array of arrays", like:

[
  [
    {
      "method": "first.command",
      "params": {
        "command1": "parameters"
      }
    },
    {
      "method": "second.command",
      "params": {
        "command2": "parameters"
      }
    }
  ],
  [
    {
      "method": "third.command",
      "params": {
        "command3": "parameters"
      }
    },
    {
      "method": "fourth.command",
      "params": {
        "command4": "parameters"
      }
    }
  ]
]

Looking back at the above, I realize this is deeply similar to what we've already got in the Input module, and that seems like a reasonable model to embrace as it's been shown to be useful and resilient, but it needs extensions.

I'm not sure what the sync point concept looks like for cross-CommandStream synchronization. Also, it brings up the question whether one would want to share data between parallel CommandStreams. I'm going to assume that latter adds additional complexity for marginal benefit, and we would reject that as a goal for this solution.

I also don't know if it makes sense to implement a retry mechanism, where a CommandStream can be iteratively run until some condition (handy for pushing polling-type waits to the remote end), or if we want to guarantee increased wire traffic by forcing clients to implement this on the local end (which is the WebDriver Classic solution). If we chose to implement polling, we would without a doubt need to have some timeout mechanism for long-running or never-ending commands. Again, I suspect the appetite for adding the complexity of a polling or retry facility to be low, because (1) commands may not be idempotent, and (2) we would need some way to evaluate conditions, and can't necessarily rely on script for that, but I thought I'd bring it up as a potential enhancement.

As an aside, clients should already be implementing some sort of timeout for when a command doesn't complete before some interval, but we have no general mechanism in the protocol for what happens when a client abandons waiting for an incomplete long-running command, so I'm less concerned about that.

I hasten to point out that there's another design here where we fire a CommandStream or a parallel set of them, immediately return some sort of execution ID, and have an event that is raised when the stream (or stream set) completes. That would be more closely aligned with a Promise-like behavior, but again, that seems possibly overly complex to me.

So next steps in designing the data structure for the payload would be: (1) what do the sync points between parallel streams look like, and (2) what does the structure look like for passing results to subsequent commands in a serial stream, and consuming properties of the results in those subsequent commands. I don't have good ideas on what that looks like yet. Please help.

@jimevans
Collaborator

jimevans commented Sep 19, 2023

One more comment: It might make sense to leave the input batching alone as encapsulated in a single command. I could see one wanting to do something like:

[
  {
    "method": "browsingContext.locateNodes"
  },
  {
    "method": "input.performActions"
  }
]

where the input method performs actions based on the node located. If the Input case is deconstructed so that parallel CommandStreams are used instead of the current single set of parameters, that would require the desired node to be passed to each of the parallel streams, which is suboptimal. Again, yes, a client could do this with two calls, but that defeats the purpose. Of course, the problem could be solved by allowing a CommandStream to contain another CommandStream, which seems like an elegant, if complex, solution.

@jgraham
Member Author

jgraham commented Sep 19, 2023

In terms of solutions, I think you can get a long way there by treating each command as a promise and providing analogues of the usual promise combinators. So running multiple commands in parallel would be like Promise.all (if you need them all to complete) or Promise.race (if you don't). Running multiple commands in series would be like normal promise chaining. There's a bit of detail here on error handling (Promises have any in place of race and allSettled in place of all for cases where you want to handle errors).
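The analogy can be sketched directly, with `send` as a mock stand-in for a client method that dispatches one BiDi command, and `runAll`/`runChain` standing in for the hypothetical `command.all` and `command.chain`:

```javascript
// Mock dispatcher: echoes the command back as its "response".
const send = (method, params) => Promise.resolve({ method, params });

// command.all ~ Promise.all: run in parallel, fail fast on the first
// error. (Promise.allSettled would be the error-collecting variant.)
function runAll(commands) {
  return Promise.all(commands.map(([method, params]) => send(method, params)));
}

// command.chain ~ promise chaining: run in series; each step may build
// its parameters from the previous result (the "$0" of the proposal).
async function runChain(steps) {
  let prev = null;
  for (const step of steps) {
    const [method, params] = step(prev);
    prev = await send(method, params);
  }
  return prev; // the chain's response is the final command's result
}
```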

Then the question is about passing data between commands. An initial cut might use $0 to refer to the result of the previous command.

So a use case like chaining locators (using a kind of pseudocode rather than writing out the payloads in full) would be:

command.chain(commands=[
  browsingContext.locateNodes(type="css", context=[someNodeReference], selector="h1,h2,h3,h4,h5,h6"),
  browsingContext.locateNodes(type="text", context=$0.nodes, selector="Heading"),
  script.callFunction(functionDeclaration=elems => elems.map(elem => elem.innerText), args=[$0.nodes], sandbox="client-code")
]);
// Response is just the result of the final command

Running commands in parallel would be

command.all(commands=[
  session.subscribe(events=["log.entryAdded", "network.beforeRequestSent"]),
  script.addPreloadScript(functionDeclaration=() => {/*some long script*/}, sandbox="client-code"),
  network.addIntercept(phases=["beforeRequestSent"])
  ])
// Response is an array with the results of all commands

And decomposed actions would be a combination of both (using really made-up commands):

command.chain([
  // Set up the input devices
  command.all([
    actions.addInput(type="pointer", name="pointer-1"),
    actions.addInput(type="key", name="key-1")
  ]),
  // Now for some reason we want to run some script
  script.callFunction(functionDeclaration=elem => scrollIntoView(elem), args=[elem]),
  // Now do the first tick
  actions.pointerMove(input="pointer-1", origin="viewport", x=10, y=10, duration=100),
  // Probably here we need actions.waitForEvents or something, but for now move on to the second tick
  command.all([
    actions.pointerDown(input="pointer-1"),
    actions.keyDown(input="key-1", key=ctrl)
  ]),
  // Tick 3
  locator.locateNode(type="css", selector="#target"),
  actions.pointerMove(input="pointer-1", origin=$0.nodes[0], x=20, y=20),
  // Tick 4
  command.all([
    actions.pointerUp(input="pointer-1"),
    actions.keyUp(input="key-1", key=ctrl)
  ])
])

That has a couple of limitations. In particular in the first instance it doesn't allow storing values other than the previous output. So one couldn't do browsingContext.locateNode at the start of an action chain and then refer to the returned node later as the origin; you'd have to do it right beforehand. One can certainly imagine solving that using some kind of storage. Also for node locators in particular, one could model "and" with script, as follows:

command.chain([
  command.all([
    browsingContext.locateNodes(type="css", selector=".foo"),
    browsingContext.locateNodes(type="text", selector="example")
  ]),
  script.callFunction((nodes1, nodes2) => Array.from(nodes1).filter(elem => Array.from(nodes2).includes(elem)), args=[$0.results[0].nodes, $0.results[1].nodes])
]);

But you can't short-circuit because there's no conditional evaluation i.e. you can't easily do "find all the nodes matching one selector, or if there aren't any all the nodes matching another selector" without always running both selectors.

@jimevans
Collaborator

@jgraham As an aside for locators and proposing and and or, it's probably better to think of them as "intersection" and "union" in concept, rather than short-circuiting logical operators.

@jgraham
Member Author

jgraham commented Sep 19, 2023

Right, maybe there's no use case for short-circuiting :) I did indeed think about it because of the term or before realising that it was really union :) But I wanted to be clear about the limitations of the proposed model.

@jgraham
Member Author

jgraham commented Sep 19, 2023

Oh, one wrinkle I just thought of is what happens if you have an array of nodes and want to get a property that's in the serialized form, but not available through the DOM object. If $0.nodes as a locator is passed in as args to a script.callFunction presumably all the node references in the array are going to be deserialized automatically so you can't access the value.shadowRoot property which you need for selector chaining.

@jimevans
Collaborator

Oh, one wrinkle I just thought of is what happens if you have an array of nodes and want to get a property that's in the serialized form, but not available through the DOM object. If $0.nodes as a locator is passed in as args to a script.callFunction presumably all the node references in the array are going to be deserialized automatically so you can't access the value.shadowRoot property which you need for selector chaining.

Which was the motivation for the ShadowLocator type in the original locateNodes strawman, to be able to access the shadowRoot property and return its RemoteNodeValue, but that was roundly criticized as "why do we need this."

@jgraham
Member Author

jgraham commented Sep 20, 2023

Yes, the use case for that was clear :) Of course one can build a specific solution for transforming specific objects into specific other objects (e.g. list of Node→ list of shadowRoot), but part of the point of this issue is to think about what the general case looks like. I hadn't immediately understood that the general case is more like a map — or indeed a filter-map — operation, because I was thinking in the context of individual objects rather than collections of objects.

So the general requirement here is something like:

Given a command that returns an array of objects, we need to be able to extract properties from each item in the array, discarding any for which the property doesn't exist, or is null, and pass it on as the input to a subsequent command.

So there's more or less two options I can think of.

Option 1 is that one defines something at the data flow layer to implement filter_map like:

command.chain(commands=[
  browsingContext.locateNodes(type="css", selector=".shadowContainer"),
  browsingContext.locateNodes(type="css", context=filter_map($0.nodes, item => item.value.shadowRoot), selector=".targetInShadowTree")
]);

This has the semantics that if the result is null, or any accessed property doesn't exist the item is skipped.
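Those semantics can be sketched as a plain function (illustrative only; the node shapes below mimic serialized remote values, not the exact protocol format):

```javascript
// filter_map: extract a value from each item, skipping any item where an
// accessed property doesn't exist or the extracted value is null.
function filterMap(items, extract) {
  const results = [];
  for (const item of items) {
    let value;
    try {
      value = extract(item); // throws if a property chain is missing
    } catch {
      continue; // "doesn't exist": skip the item
    }
    if (value !== null && value !== undefined) results.push(value);
  }
  return results;
}

const nodes = [
  { value: { shadowRoot: { sharedId: "shadow-1" } } },
  { value: {} },   // no shadow root: skipped
  { value: null }, // accessing .shadowRoot throws: skipped
];
const roots = filterMap(nodes, (item) => item.value.shadowRoot);
```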

Option 2 is that we have an explicit way to convert an argument to/from raw protocol values and use script for manipulation e.g. something like:

command.chain(commands=[
  browsingContext.locateNodes(type="css", selector=".shadowContainer"),
  script.callFunction(functionDeclaration=rawNodes => rawNodes.map(node => node.value?.shadowRoot).filter(node => node).map(node => deserialize(node)), args=[serialize($0.nodes)]),
  browsingContext.locateNodes(type="css", context=$0, selector=".targetInShadowTree")
]);

Where the serialize function (which in the example is assumed to be applied outside the function) prevents the protocol data being deserialized when the function is called, and the deserialize function (which is assumed to somehow be available inside the function) causes the return values to be converted back into actual nodes rather than treated as JS objects.

I don't love either of these options, but at least it's a starting point for discussions :)

@jimevans
Collaborator

I'm not sure I fully understand why we'd need deserialize for option 2. Wouldn't it already be deserialized by the callFunction deserialization algorithm? I'm sure I'm missing something seriously simple, but that's just me having a hard time wrapping my head around it.

What would happen if we introduced a flavor of callFunction that was intended for use only within the context of a command chain? Maybe call it script.transformPayload or something (note: it probably doesn't belong in the script module, and I despise that name, so let's not actually call it that, but it's a placeholder). Anyway, the idea would be that it would not do the deserialization of whatever args are passed in, just use them as passed. Likewise, perhaps it doesn't serialize the values on return.

This is probably an idea that has zero merit, but I'm trying to get my head around how to make this all work in the general case.

@jgraham
Member Author

jgraham commented Sep 20, 2023

deserialize is needed because let's say we pass in a node and want to get its shadowRoot. But we apply the don't-deserialize-on-input hack. So in this case the input to the function is a JS object like:

{
  "type": "node",
  "sharedId": "node id",
  "value": {
    "shadowRoot": {
      "type": "node",
      "sharedId": "shadow root id"
      // Other shadow root properties
    }
  }
  // Other node properties
}

Then we extract the shadow root and this gives us a JS object like:

{
  "type": "node",
  "sharedId": "shadow root id"
  // Other shadow root properties
}

The problem is on return we serialize the object and so get something like:

{
  "type": "object",
  "value": [
    ["type", {"type": "string", "value": "node"}],
    ["sharedId", {"type": "string", "value": "shadow root id"}]
  ]
}

(or something like that, I did it from memory so probably got the format slightly wrong).

I think you're right that a variant of callFunction that only worked in the context of batched commands, and simply bypassed the serialization and deserialization steps would be enough to avoid this problem. It's kind of low-level, but I think it's a more workable approach than I had.

@jimevans
Collaborator

It occurs to me that if we implement this, then sending a single command for a single response becomes a special case of the general batching case where chain.length === 1. We could define the entire command execution in terms of batches. In fact, we'd probably have to.

@jimevans
Collaborator

jimevans commented Oct 12, 2023

I've been doing some thinking about this, and I've come up with a design proposal that might work, at least for sequences or chains of commands. Having commands run in parallel is not something I've got a ton of interest in, unless there's some sort of synchronization mechanism for having multiple chains run and sync up their execution at some point (a la the pause command in action sequences). So let me write a little pseudo-CDDL that would tie into existing definitions in the spec.

A couple of caveats: I'm writing this out so it's documented somewhere besides just in my head. I'm sure this design will be deficient in some ineffable way, and unacceptable to at least some who'd be implementing it. Also, before I put this potential design out there, let me state once again, and in no uncertain terms, that I am not going to author the PR that attempts to implement this feature. I want the feature very, very badly, but I don't have the emotional energy to fight tooth and nail for every single detail. Here is what I had in mind:

Types

The session.CommandChainDataStorageValue Type

session.CommandChainDataStorageValue {
  storageIndex: js-uint
}

The session.CommandChainDataTransformation Type

session.CommandChainDataTransformation = {
  method: text,
  parametersFunctionDeclaration: text,
  ? arguments: [* (script.LocalValue / session.CommandChainDataStorageValue)]
}

The session.ExecuteCommandChain Command

Command Type

session.executeCommandChain = {
  method: "session.executeCommandChain",
  params: session.ExecuteCommandChainParameters
}
session.ExecuteCommandChainParameters = {
  commands: [+ CommandData / session.CommandChainDataTransformation]
}

Return Type

ResultData

The basic algorithm for processing this would be as follows:

  1. Let chain results be an empty list.

  2. For each command in commandParameters["commands"], perform the following steps:

    1. If command matches the production of a known CommandData, execute command and append its result to chain results. Note that this command execution should not serialize the result to JSON.
    2. Otherwise, if command matches the production of session.CommandChainDataTransformation, perform the following steps:
      1. Let function declaration be command["parametersFunctionDeclaration"].
      2. Let target command be command["method"].
      3. If command contains arguments, let arguments be command["arguments"]. Otherwise, let arguments be an empty list.
      4. Let parsed arguments be an empty list.
      5. For each argument in arguments, perform the following steps:
        1. If argument is a script.LocalValue, append argument to parsed arguments. Otherwise, perform the following steps:
          1. Assert: argument is a session.CommandChainDataStorageValue.
          2. Let argument value index be argument["storageIndex"].
          3. If argument value index is greater than or equal to the length of chain results, return error with error code invalid argument.
          4. Append the value at argument value index of chain results to parsed arguments.
      6. Let chain data transformation be the result of executing function declaration with arguments parsed arguments. Note that this execution should not serialize the result to JSON.
      7. If chain data transformation does not match the production of the parameters defined by target command, return error with error code invalid argument.
      8. Let chain result be the result of executing a command with "method" target command and "params" chain data transformation.
      9. Append chain result to chain results.

  3. Let last result be the entry in chain results with index chain results.length - 1.

  4. Return success with data last result.

There are some things to note here. First, there is admittedly a ton of error handling missing in the above. This is intended as a skeleton, not a fully fleshed-out spec proposal or PR. Second, I'm quite certain that the phrasing in the above algorithm isn't right for direct copy-paste into a spec PR. Third, yes, I'm quite aware of the limitations imposed by executing a function definition without specifying a realm and so on; I will leave that to the spec PR author to work out those details. Fourth, I realize I'm probably shouting at the rain for this feature, but I really do believe it would unlock a lot of scenarios by minimizing round trips from a local end to a remote end.
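The skeleton above can be sketched as a small interpreter. This is illustrative, not a spec implementation: `executeCommand` is a hypothetical hook that runs a single ordinary command, error handling is as minimal as in the skeleton, and using the `Function` constructor stands in for whatever realm/sandbox machinery a real remote end would use.

```javascript
// Minimal interpreter for the proposed command chain.
async function executeCommandChain(commands, executeCommand) {
  const chainResults = [];
  for (const command of commands) {
    let result;
    if (!("parametersFunctionDeclaration" in command)) {
      // Plain CommandData: execute directly, keep the unserialized result.
      result = await executeCommand(command.method, command.params);
    } else {
      // CommandChainDataTransformation: resolve storage references,
      // then build the target command's params from earlier results.
      const args = (command.arguments ?? []).map((arg) => {
        if (arg && typeof arg === "object" && "storageIndex" in arg) {
          if (arg.storageIndex >= chainResults.length) {
            throw new Error("invalid argument"); // index out of range
          }
          return chainResults[arg.storageIndex];
        }
        return arg; // a plain LocalValue passes through unchanged
      });
      const buildParams = new Function(
        "...args",
        `return (${command.parametersFunctionDeclaration})(...args);`);
      result = await executeCommand(command.method, buildParams(...args));
    }
    chainResults.push(result);
  }
  return chainResults[chainResults.length - 1]; // success with last result
}
```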

@kylepl

kylepl commented Dec 1, 2023

Not sure what the proper way is to provide feedback, but just noting as a user that is trying to reduce latency of their tests, I would like to:

  • FindElement
  • Input text
  • Click
  • Check assertions (e.g. does FindElements for some xpath return 1 element)

All in one batch. I assume everything except assertion checking would be planned, but this could avoid many back-and-forths and express essentially my entire test.
