
Renamed pointerOrigin and enum values to reflect usage pattern, not device type. #342

Merged
3 commits merged into immersive-web:master on May 29, 2018

Conversation

leweaver
Contributor

@leweaver leweaver commented Apr 16, 2018

The main purpose of this PR is to clarify which input sources are returned from a call to XRSession.getInputSources(). These changes make it easier to expose tracked hands on AR devices (eg, the HoloLens).

I've renamed the pointerOrigin enum and members to targetRayMode: { 'gazing', 'pointing', 'tapping' } to better reflect the input intent rather than the physical representation. This was primarily to reduce confusion: a tracked hand actually uses the head gaze as its ray, so in the old nomenclature it would have to be a 'head', not a 'hand'.

The implication of this is that while it is still up to the user agent to determine whether to merge multiple 0DOF input sources (clicker, gamepad, etc.) into a single gazing input source or to expose them as multiple sources (eg, tracked hands that have a position but not an orientation, and are thus still considered gazing), I've removed the user agent's responsibility for choosing which XRInputSources are 'active'. All available input sources are returned.

I've also adopted the as-yet unfinalized raycast origin/target, which is under discussion in #339 and immersive-web/hit-test#14. This will likely need updating when those discussions are resolved.

Lastly, the controller rendering samples here still depend on the outcome of open issues #336 and #330.
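As a rough sketch of what the description above implies (plain objects stand in for the real XR interfaces, since this proposal is not yet implemented anywhere), the page itself filters the full list returned by `getInputSources()` rather than relying on the UA to pre-select "active" sources:

```javascript
// Hypothetical stand-in for an XRSession: under this PR, getInputSources()
// returns ALL available sources, including both tracked hands.
const session = {
  getInputSources() {
    return [
      { handedness: 'left',  targetRayMode: 'gazing'  }, // tracked hand (position only)
      { handedness: 'right', targetRayMode: 'gazing'  },
      { handedness: '',      targetRayMode: 'tapping' }  // touch on a magic-window canvas
    ];
  }
};

// The page, not the UA, decides which sources matter for its interaction model.
const gazingSources = session.getInputSources()
  .filter((source) => source.targetRayMode === 'gazing');
console.log(gazingSources.length); // 2
```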

Lewis Weaver added 2 commits April 16, 2018 14:57
…ction vectors.

pointerOrigin is now named 'targetRayMode' and can have the values { 'gazing', 'pointing', 'tapping' }.
pointerPose is now a ray consisting of two properties: 'targetRayOrigin' and 'targetRayDirection'.

This is in line with immersive-web/hit-test#8 until a final approach in immersive-web#339 is determined.
@leweaver
Contributor Author

Since the raycast discussion is as yet unresolved, I'm changing targetRay back to a matrix; the primary point of this change is not the semantics of raycasting, but the naming of pointerOrigin.

@leweaver leweaver changed the title Renamed pointerOrigin and enum values. Updated ray to use vectors Renamed pointerOrigin and enum values to reflect usage pattern, not device type. May 17, 2018
@@ -415,17 +415,21 @@ xrSession.addEventListener('inputsourceschange', (ev) => {
});
```

The properties of an XRInputSource object are immutable. If a device can be manipulated in such a way that these properties can change, the `XRInputSource` will be removed and recreated.
Member


I had to think about it for a while to decide if I'm OK with this, but I think I am. The most problematic scenario I can think of: on a system that is not explicitly handed (like the Vive wands), if the system decides you've switched hands in the middle of a grab-and-drag gesture, it would be common for the page implementation to drop the object as the "hand" holding it disappears.

The reality is, though, that I rarely see this happen outside of early system initialization and if it's a large concern then the system can silently re-map the physical devices to the previously created left/right hand input sources to hide the transition. Plus there's a significant number of devices for which the handedness is an explicit property of the input device or only changed manually, and therefore will never be surprising to the user.

So TL;DR: I'm good with it!

Contributor

@thetuvix thetuvix May 29, 2018


It is precisely since this is rare that we believe it is safer to remove and re-add the sources - since most apps will experience effectively unchanging sources, they'll likely assume they don't change and would probably miss such changes. Removing and re-adding the source gets us back onto the mainline path such apps will have tested.
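The remove-and-re-add behavior discussed above keeps apps on the one code path they already test. A minimal sketch (plain objects as stand-ins for `XRInputSource` and the `inputsourceschange` event, since the interfaces here are still a proposal):

```javascript
// Because XRInputSource properties are immutable, a handedness change
// surfaces as a remove followed by an add, never as a mutated source.
const perSourceState = new Map();

function onInputSourcesChange(ev) {
  for (const source of ev.removed) {
    perSourceState.delete(source);                 // drop grab/drag state tied to it
  }
  for (const source of ev.added) {
    perSourceState.set(source, { grabbing: false });
  }
}

// Simulated hand swap: the old 'left' source vanishes, a 'right' one appears.
const oldSource = { handedness: 'left' };
const newSource = { handedness: 'right' };
onInputSourcesChange({ added: [oldSource], removed: [] });
onInputSourcesChange({ added: [newSource], removed: [oldSource] });
console.log(perSourceState.has(oldSource), perSourceState.has(newSource)); // false true
```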

explainer.md Outdated
Each input source can queried a `XRInputPose` using the `getInputPose()` function of any `XRPresentationFrame`. Getting the pose requires passing in the `XRInputSource` you want the pose for, as well as the `XRCoordinateSystem` the pose values should be given in, just like `getDevicePose()`. Similar to `getDevicePose()` the requested pose may return `null` in cases where tracking has been lost.
Each input source can query a `XRInputPose` using the `getInputPose()` function of any `XRPresentationFrame`. Getting the pose requires passing in the `XRInputSource` you want the pose for, as well as the `XRCoordinateSystem` the pose values should be given in, just like `getDevicePose()`. `getInputPose()` may return `null` in cases where tracking has been lost (similar to `getDevicePose()`), or the given `XRInputSource` instance is no longer connected or available.

If an input source can be tracked the `XRInputPose`'s `gripMatrix` will indicate the device's position and orientation. If only position or orientation is trackable (not both), the `gripMatrix` will return a transform matrix applying only the trackable pose. An example of this case is for physical hands on some AR devices, that only have a tracked position. The `gripMatrix` will be `null` if the input source isn't trackable.
Member


"If only position or orientation is trackable (not both), the gripMatrix will return a transform matrix applying only the trackable pose"

The wording on this is a little weird to me. I assume you mean that, for example, if the input source can only track position the transform matrix will not contain any rotation component? Not sure how to state that more clearly though.

Also, does this imply that the transform matrix should not apply an arm model for 3DoF controllers? Because if so I'm pretty strongly opposed to that.


Your interpretation is correct, but I'll have a think about wording to make this a bit clearer. However, it was not the intention to disallow arm modeling for 3DoF controllers; would changing "will return" -> "may return" be sufficient?

Member


Will return -> may return is good enough for me, especially since this change does preserve the explicit mention of arm models later on.
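The nullability rules being settled here (pose may be null, and a non-null pose may still carry a null gripMatrix) can be sketched as follows. The helper name `drawController` and the mock frame are hypothetical, purely for illustration:

```javascript
// Per-frame pose lookup: getInputPose() may return null (tracking lost, or
// the source removed), and gripMatrix may itself be null for untrackable
// sources, so both need guarding.
function drawController(frame, inputSource, coordinateSystem) {
  const pose = frame.getInputPose(inputSource, coordinateSystem);
  if (!pose) return null;             // tracking lost or source gone
  if (!pose.gripMatrix) return null;  // input source isn't trackable at all
  return pose.gripMatrix;             // 4x4 column-major transform
}

// Position-only tracking (e.g. a tracked hand on some AR devices): the matrix
// may carry translation with an identity rotation block.
const positionOnlyPose = {
  gripMatrix: new Float32Array([1,0,0,0, 0,1,0,0, 0,0,1,0, 0.5,1.5,-0.25,1])
};
const frame = { getInputPose: () => positionOnlyPose };
console.log(drawController(frame, {}, {})[13]); // 1.5 (y translation)
```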

explainer.md Outdated

* `'head'` indicates the pointer ray will originate at the user's head and follow the direction they are looking. (This is commonly referred to as a "gaze input" device.) There should be at most one `'head'` input source per session.
Member


We lost the text indicating that there should be at most one head input source, which I think was originally requested by Microsoft? Is that intentional?

Contributor Author


This was intentional since we are now treating AR hands as gazing sources that have a gripMatrix (whereas in the initial prototypes we used the 'hand' pointerOrigin). Since there are multiple hands, there needs to be a distinct inputSource in the array for each.

* `'screen'` indicates that the input source was an interaction with the 2D canvas of a non-exclusive session, such as a mouse click or touch event. See [Magic Window Input](#magic_window_input) for more details.
* `'gazing'` indicates the target ray will originate at the user's head and follow the direction they are looking (this is commonly referred to as a "gaze input" device). While it may be possible for these devices to be tracked (and have a grip matrix), the head gaze is used for targeting. Example devices: 0DOF clicker, regular gamepad, voice command, tracked hands.
* `'pointing'` indicates that the target ray originates from a handheld device and represents that the user is using that device for pointing. The exact orientation of the ray relative to the device should follow platform-specific guidelines if there are any. In the absence of platform-specific guidance, the target ray should most likely point in the same direction as the user's index finger if it was outstretched while holding the controller.
* `'tapping'` indicates that the input source was an interaction with the 2D canvas of a non-exclusive session, such as a mouse click or touch event. See [Magic Window Input](#magic_window_input) for more details.
Member


Not sure if the term tapping feels accurate here. People typically associate it with touchscreens, and we want it to cover mouse and stylus input as well. Admittedly I don't have a better term in mind.

Contributor Author


This is a tough one to name; some other ideas:

  • projecting - as in, the pointing ray is projected into the scene from the camera as defined by a point on the near plane
  • touching - mouse, touch, stylus, etc could maybe be considered forms of touch?
  • pressing - same reasoning as touching

Member

@toji toji May 24, 2018


projecting is probably the most accurate, but it doesn't seem intuitive. Maybe screen-projection or something like that?

While it's not exactly more accurate than tapping, the term touching feels more comfortable for whatever reason. At the very least tapping sounds like an instantaneous event while touching implicitly includes drag or long press operations.

Contributor Author


screen-projection seems to be the clearest, but touching feels more in line with the other enum values and is my preference by a narrow margin. Thoughts from other vendors welcome!
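Whatever name the third value ends up with, consuming the enum proposed in this PR comes down to a simple branch per mode. A sketch (the helper `describeTargetRay` is hypothetical, not part of the explainer):

```javascript
// Choose a targeting strategy based on the proposed targetRayMode values.
function describeTargetRay(inputSource) {
  switch (inputSource.targetRayMode) {
    case 'gazing':
      // Ray follows the head: 0DOF clicker, gamepad, voice, tracked hands.
      return 'use the viewer (head) pose for targeting';
    case 'pointing':
      // Ray originates from the handheld device itself.
      return 'use the device pose, per platform pointing guidelines';
    case 'tapping':
      // 2D interaction with a non-exclusive session canvas (mouse/touch).
      return 'project a ray from the screen point into the scene';
    default:
      return 'unknown mode';
  }
}

console.log(describeTargetRay({ targetRayMode: 'pointing' }));
```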

explainer.md Outdated
}
```

### Complex input
Member


"Complex input" is maybe not the phrase I would use here. Maybe "Direct manipulation" or "Grabbing and dragging"?

@toji
Member

toji commented May 17, 2018

I've left a few comments, but in general I feel like this is a good clarification of the input system. Thanks for putting the time into it!

Updated section title and clarified gripMatrix wording
@toji
Member

toji commented May 24, 2018

I'm happy with this now, and willing to merge it. (Still wish I knew a better term than 'tapping', but I don't.) I'd like to get input from at least one other vendor prior to doing so, however. Anyone else want to chime in? (If we don't hear back prior to the next call I'll bring it up then and try to get consensus.)

@leweaver
Contributor Author

The code in the samples directory will also need to be modified to handle the new values. Considering that the Ray updates are potentially on the way, does it make sense to update the samples in one go as a separate PR?

@toji
Member

toji commented May 24, 2018

Yes, I think the samples can be updated in a more bundled fashion. We'll also want to add some simple, temporary shims to ensure that the new names map correctly to existing implementations (like Chrome) which will take a couple of browser releases to cycle to the new verbiage.
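The kind of temporary shim mentioned here might look like the following. This is a speculative sketch: the mapping table and function name are illustrative, and the old value 'hand' is inferred from the nomenclature discussed earlier in this thread, not quoted from any implementation.

```javascript
// Map the old pointerOrigin values onto the new targetRayMode values for
// UAs (like Chrome at the time) that haven't cycled to the new verbiage.
const POINTER_ORIGIN_TO_TARGET_RAY_MODE = {
  head: 'gazing',
  hand: 'pointing',
  screen: 'tapping'
};

function shimInputSource(inputSource) {
  // Only patch sources that predate the rename.
  if (!inputSource.targetRayMode && inputSource.pointerOrigin) {
    inputSource.targetRayMode =
      POINTER_ORIGIN_TO_TARGET_RAY_MODE[inputSource.pointerOrigin];
  }
  return inputSource;
}

console.log(shimInputSource({ pointerOrigin: 'head' }).targetRayMode); // gazing
```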

@toji toji merged commit ffc52ba into immersive-web:master May 29, 2018
aarongable pushed a commit to chromium/chromium that referenced this pull request Jul 5, 2018
The enums and some of the attributes involved in the WebXR input system
were changed recently[1].

[1] immersive-web/webxr#342

Bug: 854382
Change-Id: I56fe5909d7015461cb7314d23a12b194f148d483
Reviewed-on: https://chromium-review.googlesource.com/1112881
Commit-Queue: Byoungkwon Ko <codeimpl@gmail.com>
Reviewed-by: Brandon Jones <bajones@chromium.org>
Reviewed-by: Kinuko Yasuda <kinuko@chromium.org>
Cr-Commit-Position: refs/heads/master@{#572756}