Consider a method to allow sending a batch of commands #447
One note: to reduce roundtrip latency somewhat, the client can schedule all requests at once and then wait for all of the responses. This works as long as no command depends on the output of a previous command.
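The scheduling trick described above can be sketched in client code. The `makeRemoteEnd` helper below is a hypothetical stand-in for a real connection; the point is only that a pipelined client sends everything before awaiting anything, so the total wait is roughly one latency period instead of one per command.

```javascript
// Simulated remote end: every response arrives after `latencyMs`.
function makeRemoteEnd(latencyMs) {
  return (command) =>
    new Promise((resolve) =>
      setTimeout(() => resolve({ id: command.id, result: "ok" }), latencyMs));
}

// Sequential: await each response before sending the next (N roundtrips).
async function runSequential(send, commands) {
  const responses = [];
  for (const command of commands) {
    responses.push(await send(command));
  }
  return responses;
}

// Pipelined: send every command up front, then await all responses
// (roughly one roundtrip's worth of waiting in total).
function runPipelined(send, commands) {
  return Promise.all(commands.map((command) => send(command)));
}
```

With five commands and 50 ms of latency, the sequential version waits roughly 250 ms while the pipelined one waits roughly 50 ms; as noted above, this only works when no command needs the previous command's output.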
The Browser Testing and Tools Working Group just discussed this issue. The full IRC log of that discussion:

Topic: Provide support for batching commands, either only for new session, or for the general case
github: https://github.com//issues/447
jgraham: one of the goals for bidi is that it should work well with relatively high latency connections
jgraham: one of the problems is that as we add more commands to the protocol, the harder it becomes for high latency connections
jgraham: so it makes sense to do batching
jgraham: like we have in actions. There was a basic proposal for locateNode to batch up searches for elements
jgraham: we could batch up things in the new session like "create session, insert these preload scripts and set something else up"
jgraham: the use cases are very interesting but only work in certain contexts like new session, find element...
jgraham: so do we want to build something that is generic, and do we want to handle the case where there is data flow between commands?
orkon: I wanted to mention that with high latency you can batch out commands in parallel
orkon: if there are things that need to run sequentially then we can have an aggregate, so that we can do e.g. scroll into view then do an interaction
orkon: I wonder how many cases would have dependencies between commands
shs96c: not just latency but bandwidth is something we need to care about
shs96c: a chatty protocol will slow things down too
shs96c: a lot of this is not relevant for localhost
shs96c: this is a problem when we introduce the internet, and we can see about zipping up data
shs96c: and batching would be another option
shs96c: a number of people have looked to do this in the past
shs96c: I don't think we need to worry about it in the spec. I think we can see about putting a note in the spec saying this could be made more efficient so you can do it
shs96c: I don't think we need to solve it, but we should be cognizant; we don't want to be too chatty
jgraham: stuff like compression can be done transparently and I am not worried about it
jgraham: for batching I am worried, since we are already doing a kind of batching; I don't think we can just do it
jgraham: <explains use case with actions and executeScript which can't be done in 1 action>
jgraham: and then there is the locateNode that goes from element to shadow to another node to ...
jgraham: we are building batching in different ways, and that might be the right way to do it, but we could also see if this should be generic
sadym: are we talking about batching or about setting dependencies on commands?
jgraham: yes
jgraham: if we are talking about commands that have a sequence then we need to figure out a way to share data
shs96c: in locateNode there could be ways to do this, building out an AST however you want to route the data across
sadym: if we want dependencies then we can do that. We might need to expose the realm to it and then you can do that
jgraham: we could have done this all in JavaScript, but that is executing things
jgraham: so people could execute the bidi commands from javascript sent across the wire
shs96c: I think sadym covered my point
orkon: I think batching of sequential execution would be beneficial
orkon: I wonder if there are other specs that show how to describe workflows
orkon: but I don't think we want to have that
orkon: it will then be very hard for intermediaries to be able to understand and execute across platforms
jgraham: I think there is an example, CapnProto time travel RPC, that allows you to describe a way to batch workflows
jgraham: with intermediaries I think it's better to not have a scripting language that does this
jgraham: they wouldn't be able to capture and manipulate the payload
shs96c: CapnProto time travel RPC: https://capnproto.org/rpc.html
jgraham: if people are excited by this it might be good for someone to create a prototype and then we can go from there. We don't fully understand all the tradeoffs
orkon: I think we have the PAC (Proxy AutoConfiguration) proxy format that is like JS
orkon: it is hard to parse and compile
orkon: and it is probably too many steps to handle
After the discussion at TPAC: a command batch facility, with the ability to share response data between commands in the batch, would unlock a huge number of scenarios, go a long way toward respecting high-latency connections, and reduce round-trips between the local and remote ends. I would very much like to see such a thing created.
OK, I'm going to update this issue to be about the more generic option. I think it would be useful to document known use cases.

Use Cases

Initial Setup

At session start time, a client wants to configure multiple aspects of the session e.g. subscribe to events, install preload scripts, set up network intercepts.

Requirements

Commands run in parallel without dependencies. Result is the combined result of all commands.

Alternatives

Can send commands one at a time, and get N separate responses. In the worst case this adds N roundtrips, so we get 2N * latency extra delay. This is reduced if we're processing command 1 whilst sending command 2, but many setup commands are relatively fast running (e.g. just adding some configuration data), so we may be dominated by network overhead.

Locators

When locating elements we want to be able to select a set of elements, and then use that set of elements as the context for selecting further elements e.g. to select all heading elements which contain the substring "example". We may also want to extract specific properties of the elements directly e.g. the `innerText`. In addition some clients support ways to combine selectors with combinators.

Requirements

It must be possible to run commands in sequence, feeding part of the output of previous commands into subsequent commands. The final result must be the result of the final command, or the first error when it occurs. Combinators may look like operations on the results e.g. running multiple locators in parallel and then either combining the results, or taking one or the other result.

Alternatives

Many compound selectors can be expressed directly in either CSS or XPath. However these can get complex (especially XPath) and don't allow the extensibility offered by the ability to interleave script (e.g. XPath supports substring matching, but not the use of arbitrary script).

We could special-case locators and express these requirements specifically as kinds of locator strategy.
However that would not generalise to other commands. Implementations could compile locators down to JS e.g. the above example could be compiled to:

```js
(root) => {
  let headings = root.querySelectorAll("h1,h2,h3,h4,h5,h6");
  let filtered = Array.from(headings).filter(x => x.innerText.includes("heading"));
  return filtered.map(x => x.innerText);
}
```

However that requires all clients to implement a compiler from locator chains to JS. It also doesn't work well for traversing into shadow trees, since closed shadow roots aren't available even to JS running in a sandbox.

Input

We currently model input as a custom kind of batching, where we initially set up a set of input devices, then run an action (maybe a noop) for each device in parallel, wait for all of them to complete (and for the event loop to spin enough for generated events to be dispatched), and then move on to the next action for each device. However we sometimes get requests to inline other behaviour into the actions e.g. running some script between ticks.

Requirements

Run initial setup (setting up devices, etc.). Run actions for each "tick" in parallel and wait for all to complete. Then wait for the event loop to reach a marker. Then run another set of events in parallel. No return value required.

Alternatives

Do nothing, and require clients to roundtrip when they want to interleave actions with other commands.

Prior Art

Cap'n Proto promise pipelining (https://capnproto.org/rpc.html)
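The compile-to-JS idea can be sketched without a browser by standing in plain objects for DOM nodes. The chain format and step names below are invented for illustration; they are not real locator strategies.

```javascript
// Compile a chain of locator-ish steps into a single function over a node
// list. Step kinds and the node shape are hypothetical stand-ins.
function compileChain(steps) {
  return (nodes) =>
    steps.reduce((current, step) => {
      switch (step.type) {
        case "tag": // crude stand-in for a CSS tag selector
          return current.filter((n) => step.tags.includes(n.tag));
        case "text": // substring match on the rendered text
          return current.filter((n) => n.innerText.includes(step.substring));
        case "innerText": // extract a property instead of returning nodes
          return current.map((n) => n.innerText);
        default:
          throw new Error("unknown step: " + step.type);
      }
    }, nodes);
}
```

A client could emit such a function once per chain and ship it over the wire, which is exactly the compiler burden (and the closed-shadow-root limitation) discussed above.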
@jgraham Thanks for writing this up. It's articulate in a way I could not be. It sounds like to support all of the cases enumerated above, we would want a data structure that enables a chain of commands. Think a list or array, like:

```json
[
  {
    "method": "first.command",
    "params": {
      "command1": "parameters"
    }
  },
  {
    "method": "second.command",
    "params": {
      "command2": "parameters"
    }
  }
]
```

with some mechanism to pass the results of commands around. For the sake of discussion, let's call this a serial stream.

At the same time, to support some of the use cases (Inputs and Initial Setup), we need a data structure that allows parallel streams. So we'd need an "array of arrays", like:

```json
[
  [
    {
      "method": "first.command",
      "params": {
        "command1": "parameters"
      }
    },
    {
      "method": "second.command",
      "params": {
        "command2": "parameters"
      }
    }
  ],
  [
    {
      "method": "third.command",
      "params": {
        "command3": "parameters"
      }
    },
    {
      "method": "fourth.command",
      "params": {
        "command4": "parameters"
      }
    }
  ]
]
```

Looking back at the above, I realize this is deeply similar to what we've already got in the actions API.

I'm not sure what the sync point concept looks like for cross-stream synchronization. I also don't know if it makes sense to implement a retry mechanism, where a failed command could be retried before the batch gives up. As an aside, clients should already be implementing some sort of timeout for when a command doesn't complete before some interval, but we have no general mechanism in the protocol for what happens when a client abandons waiting for an incomplete long-running command, so I'm less concerned about that.

I hasten to point out that there's another design here where we fire an event as each command in the batch completes. So the next steps in designing the data structure for the payload would be: (1) what do the sync points between parallel streams look like, and (2) what does the structure look like for passing results to subsequent commands in a serial stream, and for consuming properties of the results in those subsequent commands. I don't have good ideas on what that looks like yet. Please help.
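The "array of arrays" shape has a straightforward execution model: inner arrays run in parallel, outer entries run in sequence. A minimal sketch of such an executor, with `send` as an assumed per-command dispatch function:

```javascript
// Execute a batch: commands within a group run in parallel; groups run in
// order. Returns one array of results per group.
async function runBatch(send, batch) {
  const results = [];
  for (const group of batch) {
    // The next group doesn't start until every command in this group settles.
    results.push(await Promise.all(group.map((command) => send(command))));
  }
  return results;
}
```

The `await` between groups is exactly the "sync point" question raised above: here it is simply "all commands in the previous group completed", with no cross-stream coordination inside a group.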
One more comment: it might make sense to leave the input batching alone, encapsulated in a single command. I could see one wanting to do something like:

```json
[
  {
    "method": "browsingContext.locateNodes"
  },
  {
    "method": "input.performActions"
  }
]
```

where the input method performs actions based on the node located. If the Input case is deconstructed so that parallel streams drive each device, that encapsulation would be lost.
In terms of solutions, I think you can get a long way by treating each command as a promise and providing analogues of the usual promise combinators. So running multiple commands in parallel would be like `Promise.all`. Then the question is about passing data between commands. An initial cut might use `$0`-style references to the output of the previous command.

So a use case like chaining locators (using a kind of pseudocode rather than writing out the payloads in full) would be:

```js
command.chain(commands=[
  browsingContext.locateNodes(type="css", context=[someNodeReference], selector="h1,h2,h3,h4,h5,h6"),
  browsingContext.locateNodes(type="text", context=$0.nodes, selector="Heading"),
  script.callFunction(functionDeclaration=elems => elems.map(elem => elem.innerText), args=[$0.nodes], sandbox="client-code")
]);
// Response is just the result of the final command
```

Running commands in parallel would be:

```js
command.all(commands=[
  session.subscribe(events=["log.entryAdded", "network.beforeRequestSent"]),
  script.addPreloadScript(functionDeclaration=() => {/*some long script*/}, sandbox="client-code"),
  network.addIntercept(phases=["beforeRequestSent"])
])
// Response is an array with the results of all commands
```

And decomposed actions would be a combination of both (using really made-up commands):

```js
command.chain([
  // Set up the input devices
  command.all([
    actions.addInput(type="pointer", name="pointer-1"),
    actions.addInput(type="key", name="key-1")
  ]),
  // Now for some reason we want to run some script
  script.callFunction(functionDeclaration=elem => scrollIntoView(elem), args=[elem]),
  // Now do the first tick
  actions.pointerMove(input="pointer-1", origin="viewport", x=10, y=10, duration=100),
  // Probably here we need actions.waitForEvents or something, but for now move on to the second tick
  command.all([
    actions.pointerDown(input="pointer-1"),
    actions.keyDown(input="key-1", key=ctrl)
  ]),
  // Tick 3
  locator.locateNode(type="css", selector="#target"),
  actions.pointerMove(input="pointer-1", origin=$0.nodes[0], x=20, y=20),
  // Tick 4
  command.all([
    actions.pointerUp(input="pointer-1"),
    actions.keyUp(input="key-1", key=ctrl)
  ])
])
```

That has a couple of limitations. In particular, in the first instance it doesn't allow storing values other than the previous output. So one couldn't do:

```js
command.chain([
  command.all([
    browsingContext.locateNodes(type="css", selector=".foo"),
    browsingContext.locateNodes(type="text", selector="example")
  ]),
  script.callFunction((nodes1, nodes2) => Array.from(nodes1).filter(elem => Array.from(nodes2).includes(elem)), args=[$0.results[0].nodes, $0.results[1].nodes])
]);
```

And you can't short-circuit, because there's no conditional evaluation i.e. you can't easily do "find all the nodes matching one selector, or if there aren't any, all the nodes matching another selector" without always running both selectors.
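The `$0` references in the pseudocode above imply a small substitution step between commands. One way a remote end might resolve them, as a sketch (the `$0.` syntax is taken from the pseudocode; the resolver itself is an assumption):

```javascript
// Replace "$0.some.path" strings in a params object with values looked up in
// the previous command's result. Dotted paths only; array indexing and
// multi-result references ($0.results[1]...) are out of scope for this sketch.
function resolveRefs(params, previousResult) {
  const lookup = (path) =>
    path.split(".").reduce((value, key) => value[key], previousResult);
  const resolve = (value) => {
    if (typeof value === "string" && value.startsWith("$0.")) {
      return lookup(value.slice(3));
    }
    if (Array.isArray(value)) return value.map(resolve);
    if (value && typeof value === "object") {
      return Object.fromEntries(
        Object.entries(value).map(([k, v]) => [k, resolve(v)]));
    }
    return value;
  };
  return resolve(params);
}
```

Keeping the reference syntax this restrictive is what makes the payload inspectable by intermediaries, which was one of the objections to shipping arbitrary script over the wire.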
@jgraham As an aside for locators and proposing
Right, maybe there's no use case for short-circuiting :) I did indeed think about it because of the term
Oh, one wrinkle I just thought of is what happens if you have an array of nodes and want to get a property that's in the serialized form, but not available through the DOM object.
Which was the motivation for the
Yes, the use case for that was clear :) Of course one can build a specific solution for transforming specific objects into specific other objects (e.g. a list of nodes into a list of their shadow roots). So the general requirement here is something like: given a command that returns an array of objects, we need to be able to extract properties from each item in the array, discarding any for which the property doesn't exist or is null, and pass the result on as the input to a subsequent command.

There are more or less two options I can think of. Option 1 is that one defines something at the data flow layer to implement a filter-map operation:

```js
command.chain(commands=[
  browsingContext.locateNodes(type="css", selector=".shadowContainer"),
  browsingContext.locateNodes(type="css", context=filter_map($0.nodes, item => item.value.shadowRoot), selector=".targetInShadowTree")
]);
```

This has the semantics that if the result is null, or any accessed property doesn't exist, the item is skipped.

Option 2 is that we have an explicit way to convert an argument to/from raw protocol values and use script for manipulation e.g. something like:

```js
command.chain(commands=[
  browsingContext.locateNodes(type="css", selector=".shadowContainer"),
  script.callFunction(functionDeclaration=rawNodes => rawNodes.map(node => node.value?.shadowRoot).filter(node => node).map(node => deserialize(node)), args=[serialize($0.nodes)]),
  browsingContext.locateNodes(type="css", context=$0, selector=".targetInShadowTree")
]);
```

Where the `serialize`/`deserialize` operations convert between protocol values and the script-level representation. I don't love either of these options, but at least it's a starting point for discussions :)
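Option 1's filter-map semantics are easy to pin down in plain JS. This sketch follows the description above (skip an item when a property access fails or the result is null/undefined); the function name is taken from the pseudocode, not from any spec:

```javascript
// filter_map: apply fn to each item; drop items where any property access
// throws or where the result is null/undefined.
function filterMap(items, fn) {
  const out = [];
  for (const item of items) {
    let value;
    try {
      value = fn(item);
    } catch (_e) {
      continue; // an accessed property didn't exist: skip the item
    }
    if (value !== null && value !== undefined) out.push(value);
  }
  return out;
}
```

Pinning these semantics down at the data-flow layer is what lets intermediaries evaluate the expression without a full scripting engine.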
I'm not sure I fully understand why we'd need explicit `serialize`/`deserialize` steps. What would happen if we introduced a flavor of `script.callFunction` that operated on the raw protocol values directly? This is probably an idea that has zero merit, but I'm trying to get my head around how to make this all work in the general case.
```json
{
  "type": "node",
  "sharedId": "node id",
  "value": {
    "shadowRoot": {
      "type": "node",
      "sharedId": "shadow root id"
      // Other shadow root properties
    }
  }
  // Other node properties
}
```

Then we extract the shadow root, and this gives us a JS object like:

```json
{
  "type": "node",
  "sharedId": "shadow root id"
  // Other shadow root properties
}
```

The problem is that on return we serialize the object, and so get something like:

```json
{
  "type": "object",
  "value": [
    ["type", {"type": "string", "value": "node"}],
    ["sharedId", {"type": "string", "value": "shadow root id"}]
  ]
}
```
(or something like that, I did it from memory so probably got the format slightly wrong). I think you're right that a variant of `script.callFunction` could address this.
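The round-trip problem described here can be reproduced with a toy version of BiDi-style local-value serialization. This is heavily simplified (the real remote-value format has many more cases); the point is that a plain JS object comes back as a generic "object" value, losing the node semantics of fields like `sharedId`:

```javascript
// Toy serialization in the spirit of WebDriver BiDi local values
// (simplified): plain objects become {type: "object", value: [[k, v], ...]}.
function serializeValue(value) {
  if (typeof value === "string") return { type: "string", value };
  if (typeof value === "number") return { type: "number", value };
  if (Array.isArray(value)) {
    return { type: "array", value: value.map(serializeValue) };
  }
  if (value && typeof value === "object") {
    return {
      type: "object",
      value: Object.entries(value).map(([k, v]) => [k, serializeValue(v)]),
    };
  }
  throw new Error("unsupported value in this sketch");
}
```

Feeding in the extracted shadow-root descriptor yields exactly the generic "object" shape shown above, which is why a plain `script.callFunction` step can't hand a usable node reference to the next command.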
It occurs to me that if we implement this, then sending a single command for a single response becomes a special case of the general batching case, where the batch contains exactly one command.
I've been doing some thinking about this, and I've come up with a design proposal that might work, at least for sequences or chains of commands. Having commands run in parallel is not something I've got a ton of interest in, unless there's some sort of synchronization mechanism for having multiple chains run and sync up their execution at some point (a la the pause command in action sequences).

So let me write a little pseudo-CDDL that would tie into existing definitions in the spec. A couple of caveats: I'm writing this out so it's documented somewhere besides just in my head. I'm sure this design will be deficient in some ineffable way, and unacceptable to at least some who'd be implementing it. Also, before I put this potential design out there, let me state once again, and in no uncertain terms, that I am not going to author the PR that attempts to implement this feature. I want the feature very, very badly, but I don't have the emotional energy to fight tooth and nail for every single detail. Here is what I had in mind:

Types

The session.CommandChainDataStorageValue Type

The session.ChainedCommandExecution Type

The session.ExecuteCommandChain Command

Command Type

Return Type ResultData

The basic algorithm for processing this would be as follows:

There are some things to note here. First, there is admittedly a ton of error handling missing in the above. This is intended as a skeleton, not a fully fleshed-out spec proposal or PR. Second, I'm quite certain that the phrasing in the above algorithm isn't right for direct copy-paste into a spec PR. Third, yes, I'm quite aware of the limitations imposed by executing a function definition without specifying a realm and so on; I will leave it to the spec PR author to work out those details. Fourth, I realize I'm probably shouting at the rain for this feature, but I really do believe it would unlock a lot of scenarios by minimizing round trips from a local end to a remote end.
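One possible reading of the chained-execution algorithm, as a sketch: commands run in order, a command can store its result under a name for later commands to reference, and the first error aborts the chain. The field names (`storeResultAs`) and the `"$storage:name"` reference syntax here are my assumptions, not the proposal's:

```javascript
// Sketch of executing a command chain with named result storage.
// Field names and reference syntax are hypothetical.
async function executeCommandChain(send, chain) {
  const storage = new Map(); // named results available to later commands
  let last = null;
  for (const step of chain) {
    const params = substituteStored(step.params || {}, storage);
    last = await send({ method: step.method, params });
    if (last && last.error) return last; // first error ends the chain
    if (step.storeResultAs) storage.set(step.storeResultAs, last);
  }
  return last; // result of the final command
}

// Replace top-level "$storage:name" string values with stored results.
// (Nested substitution is omitted to keep the sketch short.)
function substituteStored(params, storage) {
  return Object.fromEntries(
    Object.entries(params).map(([key, value]) => [
      key,
      typeof value === "string" && value.startsWith("$storage:")
        ? storage.get(value.slice("$storage:".length))
        : value,
    ]));
}
```

Most of the missing error handling mentioned above would live around the `send` call: unknown storage names, type mismatches when substituting, and what to report for the commands that never ran after an error.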
Not sure what the proper way is to provide feedback, but just noting as a user who is trying to reduce the latency of their tests, I would like to:

All in one batch. I assume everything except assertion checking would be planned, but this could avoid many back-and-forths and express essentially my entire test.
Supporting cases where the client has significant latency to the browser is an expressed goal of WebDriver BiDi.
Generally supporting this requires minimising network roundtrips, as each roundtrip incurs an overhead of twice the latency. This can be particularly important for timing-critical operations, where inserting a roundtrip may affect the outcome of the test (e.g. because something changed in the page under test during the network delay).
Generally the solution here is to allow generating a set of commands upfront and sending them together, then getting a single result for the group of commands, rather than one result for each command.
There are many different possible ways to do this, with different tradeoffs. For example JSON-RPC has a simple batching mechanism that allows sending multiple commands and then getting the results of all the commands in a single response. However we intentionally didn't adopt JSON-RPC specifically because this feature didn't seem to hold its weight in terms of the functionality it enables compared to the cost.
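For reference, JSON-RPC 2.0 batching works by sending an array of request objects and receiving an array of response objects matched up by `id`. A minimal illustration (the method names are BiDi-flavoured placeholders, and the toy server below is not how a real remote end behaves):

```javascript
// A JSON-RPC 2.0 style batch: one array in, one array out, matched by id.
const batchRequest = [
  { jsonrpc: "2.0", id: 1, method: "session.subscribe",
    params: { events: ["log.entryAdded"] } },
  { jsonrpc: "2.0", id: 2, method: "network.addIntercept",
    params: { phases: ["beforeRequestSent"] } },
];

// Toy server: answer every request in the batch in a single response array.
function handleBatch(requests) {
  return requests.map((request) => ({
    jsonrpc: "2.0",
    id: request.id,
    result: { ok: true, method: request.method },
  }));
}
```

Note that this style of batching only collapses roundtrips; it has no way to feed one command's result into the next, which is the harder requirement discussed below.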
This issue is to explore use cases and requirements for command batching, and explore possible designs.
Original comment
A common part of the workflow for starting a WebDriver BiDi session is performing initial configuration. This can involve various things e.g.:

* Configuring global browser settings (e.g. proxy configuration, prefs, window size)
* Subscribing to certain events
* Installing preload scripts
* Adding network request intercepts
* Configuring BiDi settings (there isn't much of this so far, but in the future maybe e.g. setting log level filters)

Currently some of this is done in capabilities, and some of it is done in individual commands that all need to be run once the session is configured.
Ideally we wouldn't use capabilities for anything that happens after the browser is launched. They have a complex matching behaviour that doesn't really make sense for most configuration use cases. In particular we could aim to end up in a place where, by the time you're connecting to an actual browser, the capabilities have all been processed and can be ignored entirely (obviously a classic-style driver performs the role of a browser launcher, and so would still consider configuration required at browser launch time).
(We also have "capabilities" which only exist in the return value; that's a rather different concern, although I note that we've failed to update the return value with new "optional" features).
Although removing existing capabilities might not be possible, I at least don't think we should add new post-launch configuration to capabilities.
So assuming we don't add all this configuration to capabilities directly it's currently up to the client to send all the configuration commands just after the session has started. That's OK, but it does add a bunch of roundtrip latency.
Early on there was some discussion about sending batches of commands. This seems like a clear case where that might be useful e.g. we give the `session.new` payload a field like `commands` which takes a list of commands. The `session.new` response is delayed until after all those commands have been dispatched, and the response gets an extra field containing the return value of each command (which might be an error).

Alternatively we could have a more generic mechanism that doesn't special-case new session e.g. a `session.sendCommands` command whose payload would be a list of commands to run in-order and whose return value would be a list of responses to those commands.
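The second, generic option can be sketched concretely. The command name `session.sendCommands`, the payload shape, and the in-place error reporting below all come from the suggestion above, not from the spec; the toy dispatcher is invented for illustration:

```javascript
// Hypothetical session.sendCommands handler: run each command in order and
// collect one result (or error) per command.
function sendCommands(dispatch, commands) {
  return commands.map((cmd) => {
    try {
      return { result: dispatch(cmd.method, cmd.params) };
    } catch (e) {
      return { error: String(e.message || e) }; // errors reported in-place
    }
  });
}

// A toy dispatcher standing in for the remote end.
function toyDispatch(method, params) {
  if (method === "session.subscribe") return { subscribed: params.events };
  throw new Error("unknown command: " + method);
}
```

This keeps a failed command from hiding the results of the commands that ran before it, which is the main behavioural question such a design would need to settle.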