Add endpoint that receives a webdriver script for batch command execution #328

jlipps · 2019-06-04T02:30:38Z

The batch command proposal has been under discussion for some time. To summarize, the original proposal was for a new API that would essentially allow encoding of any Appium command in JSON format, including the use of magic values like {{<element}} to refer to an element found in a previous command. This original proposal had several drawbacks:

Quite a lot of code would need to be written on both client and server to support the feature. Most of the code would just be encoding/decoding commands from a JSON format.
This method interacts poorly with proxying, which is quite common in Appium tests. Say that a batched command set came in involving a context switch, such that subsequent commands in the batch are supposed to target chromedriver. The problem is that the batch command itself would be written on the driver, which can only awkwardly proxy it on.
Use of backreferences is just one example of all kinds of logic that users might wish to encode in batch commands, and be unable to. How would we handle conditionals, loops, and the like? We'd need to draw a line somewhere or invent a Turing complete mini language to support all the use cases!

This PR takes a much more radical approach. Basically, it asks that the user send in not a formatted list of commands, but rather a script (written in a supported library, in this case webdriverio). This script is run in a NodeJS VM with access to a driver object representing the current session. This proposal has a number of advantages:

Access to a full webdriver client
Access to a full programming language (JS) for use of logic
Very little code needing to be written on the client side (the Appium Java client would just assume someone will write a webdriverio script as a string. Or, the Java client could go crazy and implement a code transformation feature that outputs valid webdriverio code. Either way).
Very little code needing to be written on the server side. I think this PR is all that's necessary!

The main disadvantages of this new proposal are:

Theoretical security concerns running untrusted code. However, the code is sandboxed in a VM, and has no access to e.g. the require function. I'm having trouble thinking how a malicious user could abuse this feature, but it's possible I suppose.
We bring in a webdriver client (webdriverio) as a production dep. Thankfully it's not too big. In the future, this feature and its deps could be developed as a plugin/extension rather than part of Appium core.

I think this is pretty sweet, what do you all think?

lib/basedriver/commands/execute.js

test/basedriver/driver-e2e-tests.js

lib/basedriver/commands/execute.js

xcolwell · 2019-06-05T03:00:29Z

This is very nice direction for remote device testing and distributed device testing. Having a way to optimize performance critical sections like "find and act" while running a full script on the client has not been possible with Appium.

A couple questions:

how are remote exceptions returned to the client?
how are stdout/stderr returned to the client?
can the script return a json to the client?
is there a watchdog on the uploaded script for example max cpu time per script, max elapsed time per script?
do you see a standard set of available modules in the node vm? In other words is there a standard "header" file that gets appended to each batch script?
probably this would be enabled with a special flag? I think the semantics of --relaxed-security make sense here, but it would be most useful for us to enable just this batch feature without the other features guarded by this flag.

Also I think it would be useful to pass the script language with the script, so that cloud providers can choose to optionally support transpilation of other languages to javascript.

KazuCocoa · 2019-06-05T05:59:07Z

a note after reading https://github.com/appium/appium-base-driver/pull/328/files#diff-aa816d90f6389bb3ff7a3fc331643232R287 ...

an idea of how to pass valid JS commands to the server. (To reduce mistake/work load in client side)

like Selenium, we'll prepare some JS commands as atom commands. Each client gets, reads and sends it to server like https://github.com/SeleniumHQ/selenium/blob/2abd80f236d1a7459ef638e96af8c4efd86b4abd/rb/lib/selenium/webdriver/atoms.rb#L30 (we can fix the JS file to fix some issues instead o updating all clients. This is a behefit of shared code.)

If sending command is complicated, this method does not work. Users should write JS (or will we implement each languages->JS converter?).

jlipps · 2019-06-06T18:54:18Z

@KazuCocoa I'm not totally sure I'm following you. Are you saying that it's a bad idea to give clients the ability to write full-on javascript? And instead we implement client methods that simply generate correct javascript?

I think that's a good idea, but I don't see them as mutually exclusive. You could either write something like this:

await driver.executeDriverScript(`
    const el = await driver.$("~foo");
    await el.click();
`);

or, you could write this (if the client supports it, let's say we're using python:

with driver.batch() as batch:
    el = batch.find_element_by_accessibility_id("foo")
    el.click()

And then the python client would, as part of interpreting the context manager there, turn that into valid webdriverio code underneath the hood, and call driver.execute_script on its own?

Is that what you're proposing? If so I agree that'd be awesome, and then it's just up to the clients to implement this "safe" way of using executeScript. In practice though, I'm not sure how easy it would be to do this translation. What about this script?

with driver.batch() as batch:
    els = batch.find_elements_by_accessibility_id("foo")
    for el in els:
        el.click()

How do you translate the for loop into JS? You'd need to do the translation based on a Python AST or something, which is getting kinda crazy at this point.

jlipps · 2019-06-06T18:58:40Z

thanks for the questions, @xcolwell!

how are remote exceptions returned to the client?

Same as with any other error

how are stdout/stderr returned to the client?

Right now they are not (and running inside a pure JS VM, there's no such thing as stdout/stderr per se), but we could make some log object available to the script, which could then be used to put the output in the appium log as well as return it as part of the return value of the function.

can the script return a json to the client?

Yes, the script can return any simple javascript object at any level of complexity, and JSON is the transport method.

is there a watchdog on the uploaded script for example max cpu time per script, max elapsed time per script?

No, I'm not sure if it's possible to attach a watchdog to a VM, but this could presumably be done of Appium as a whole. I'm assuming actions in the VM get counted on a system resource level are counted as happening within the appium process

do you see a standard set of available modules in the node vm? In other words is there a standard "header" file that gets appended to each batch script?

There could be! We'd just need to decide what we want to make available to the user. In this PR it's just the driver object. But we mentioned above maybe a log object also makes sense. What else makes sense? Within the constraints of security.

probably this would be enabled with a special flag? I think the semantics of --relaxed-security make sense here, but it would be most useful for us to enable just this batch feature without the other features guarded by this flag.

I agree.

jlipps · 2019-06-10T03:10:18Z

@imurchie any objections to this approach?

lib/basedriver/driver.js

…also includes all appium commands

jlipps · 2019-06-19T21:28:57Z

ok @mykola-mokhnach isFeatureEnabled has been updated, and unit tests added. i think this code is now all good to go. i'll be happy to find uses of relaxedSecurityEnabled in the codebase and update to use this new method.

jlipps · 2019-06-20T03:24:38Z

Alright, I've made all the downstream changes so once we merge all of these, the whole codebase should be using the new security model.

lib/basedriver/commands/execute.js

mykola-mokhnach

This features needs a separate documentation page

KazuCocoa

🎉 promising

…f necessary

jlipps · 2019-06-20T20:06:07Z

will work on docs in another PR in the main repo

jlipps · 2019-06-21T01:40:42Z

OK, docs added in appium/appium#12807

lib/basedriver/commands/execute-child.js

mykola-mokhnach · 2019-06-21T05:25:26Z

lib/basedriver/commands/execute-child.js

+const W3C_ELEMENT_KEY = 'element-6066-11e4-a52e-4f735466cecf';
+const MJSONWP_ELEMENT_KEY = 'ELEMENT';
+
+async function runScript (driverOpts, script, timeout) {


should timeout have a default value?

it does, in the important place where the request from the user is handled (in execute-child.js) but i'll add a check to make sure it's included

jlipps · 2019-06-21T17:08:58Z

Also just realized that to support this feature, /execute_driver has to be added to NO_PROXY for specific drivers.

jlipps · 2019-06-21T18:25:53Z

I've run some functional tests with @KazuCocoa 's python client changes, and they work (though I ran into a bug with webdriverio sendkeys, seems like these lines are causing the problem--cc @christian-bromann )

christian-bromann · 2019-06-23T18:31:39Z

though I ran into a bug with webdriverio sendkeys

I will look into it.

jlipps · 2019-06-24T16:29:59Z

will wait a bit to see what @christian-bromann says before merging, in case a code change is required on this end. otherwise i think this is good to go, and will merge/publish all the dependent PRs once this one is merged.

jlipps · 2019-06-25T19:11:55Z

ok, with the new version of webdriverio, i've confirmed this works, so will merge now!

jlipps · 2019-06-25T19:14:51Z

published in 3.17.0

jlipps requested review from dpgraham, imurchie, KazuCocoa and mykola-mokhnach June 4, 2019 02:30