
Renamed pointerOrigin and enum values to reflect usage pattern, not device type. #342

Merged
3 commits merged into immersive-web:master on May 29, 2018

Conversation

leweaver
Contributor

@leweaver leweaver commented Apr 16, 2018

The main purpose of this PR is to clarify which input sources are returned from a call to XRSession.getInputSources(). These changes make it easier to expose tracked hands on AR devices (eg, the HoloLens).

I've renamed the pointerOrigin enum and members to targetRayMode: { 'gazing', 'pointing', 'tapping' } to better reflect the input intent rather than the physical representation. This was primarily to reduce confusion: a tracked hand actually uses the head gaze as its ray, so in the old nomenclature it would have to be a 'head', not a 'hand'.

The implication of this is that while it is still up to the user agent to determine whether to merge multiple 0DOF input sources (clicker, gamepad, etc.) into a single gazing input source or to expose them as multiple sources (eg, tracked hands that have a position but not an orientation, and are thus still considered gazing), I've removed the user agent's responsibility for choosing which XRInputSources are 'active'. All available input sources are returned.

I've also adopted the as-yet unfinalized raycast origin/target, which is under discussion in #339 and immersive-web/hit-test#14. This will likely need updating when those discussions are resolved.

Lastly, the controller rendering samples here still depend on the outcome of open issues #336 and #330.
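As a rough sketch of what the description above implies (plain objects stand in for the real XR interfaces, since this proposal is not yet implemented anywhere), the page itself filters the full list returned by `getInputSources()` rather than relying on the UA to pre-select "active" sources:

```javascript
// Hypothetical stand-in for an XRSession: under this PR, getInputSources()
// returns ALL available sources, including both tracked hands.
const session = {
  getInputSources() {
    return [
      { handedness: 'left',  targetRayMode: 'gazing'  }, // tracked hand (position only)
      { handedness: 'right', targetRayMode: 'gazing'  },
      { handedness: '',      targetRayMode: 'tapping' }  // touch on a magic-window canvas
    ];
  }
};

// The page, not the UA, decides which sources matter for its interaction model.
const gazingSources = session.getInputSources()
  .filter((source) => source.targetRayMode === 'gazing');
console.log(gazingSources.length); // 2
```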

Lewis Weaver added 2 commits April 16, 2018 14:57
…ction vectors.

pointerOrigin is now named 'targetRayMode' and can have the values { 'gazing', 'pointing', 'tapping' }.
pointerPose is now a ray consisting of two properties: 'targetRayOrigin' and 'targetRayDirection'.

This is in line with immersive-web/hit-test#8 until a final approach in immersive-web#339 is determined.
@leweaver
Contributor Author

Since the raycast discussion is as yet unresolved, I'm changing targetRay back to a matrix; the primary point of this change is not the semantics of raycasting, but the naming of pointerOrigin.

@leweaver leweaver changed the title Renamed pointerOrigin and enum values. Updated ray to use vectors Renamed pointerOrigin and enum values to reflect usage pattern, not device type. May 17, 2018
@@ -415,17 +415,21 @@ xrSession.addEventListener('inputsourceschange', (ev) => {
});
```

The properties of an XRInputSource object are immutable. If a device can be manipulated in such a way that these properties can change, the `XRInputSource` will be removed and recreated.
Member


I had to think about it for a while to decide if I'm OK with this, but I think I am. The most problematic scenario I can think of: on a system that is not explicitly handed (like the Vive wands), if the system decides you've switched hands in the middle of a grab-and-drag gesture, it would be common for the page implementation to drop the object as the "hand" holding it disappears.

The reality is, though, that I rarely see this happen outside of early system initialization and if it's a large concern then the system can silently re-map the physical devices to the previously created left/right hand input sources to hide the transition. Plus there's a significant number of devices for which the handedness is an explicit property of the input device or only changed manually, and therefore will never be surprising to the user.

So TL;DR: I'm good with it!

Contributor

@thetuvix thetuvix May 29, 2018


It is precisely since this is rare that we believe it is safer to remove and re-add the sources - since most apps will experience effectively unchanging sources, they'll likely assume they don't change and would probably miss such changes. Removing and re-adding the source gets us back onto the mainline path such apps will have tested.
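The remove-and-re-add behavior discussed above keeps apps on the one code path they already test. A minimal sketch (plain objects as stand-ins for `XRInputSource` and the `inputsourceschange` event, since the interfaces here are still a proposal):

```javascript
// Because XRInputSource properties are immutable, a handedness change
// surfaces as a remove followed by an add, never as a mutated source.
const perSourceState = new Map();

function onInputSourcesChange(ev) {
  for (const source of ev.removed) {
    perSourceState.delete(source);                 // drop grab/drag state tied to it
  }
  for (const source of ev.added) {
    perSourceState.set(source, { grabbing: false });
  }
}

// Simulated hand swap: the old 'left' source vanishes, a 'right' one appears.
const oldSource = { handedness: 'left' };
const newSource = { handedness: 'right' };
onInputSourcesChange({ added: [oldSource], removed: [] });
onInputSourcesChange({ added: [newSource], removed: [oldSource] });
console.log(perSourceState.has(oldSource), perSourceState.has(newSource)); // false true
```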

explainer.md Outdated
Each input source can queried a `XRInputPose` using the `getInputPose()` function of any `XRPresentationFrame`. Getting the pose requires passing in the `XRInputSource` you want the pose for, as well as the `XRCoordinateSystem` the pose values should be given in, just like `getDevicePose()`. Similar to `getDevicePose()` the requested pose may return `null` in cases where tracking has been lost.
Each input source can query a `XRInputPose` using the `getInputPose()` function of any `XRPresentationFrame`. Getting the pose requires passing in the `XRInputSource` you want the pose for, as well as the `XRCoordinateSystem` the pose values should be given in, just like `getDevicePose()`. `getInputPose()` may return `null` in cases where tracking has been lost (similar to `getDevicePose()`), or the given `XRInputSource` instance is no longer connected or available.

If an input source can be tracked the `XRInputPose`'s `gripMatrix` will indicate the device's position and orientation. If only position or orientation is trackable (not both), the `gripMatrix` will return a transform matrix applying only the trackable pose. An example of this case is for physical hands on some AR devices, that only have a tracked position. The `gripMatrix` will be `null` if the input source isn't trackable.
Member


"If only position or orientation is trackable (not both), the gripMatrix will return a transform matrix applying only the trackable pose"

The wording on this is a little weird to me. I assume you mean that, for example, if the input source can only track position the transform matrix will not contain any rotation component? Not sure how to state that more clearly though.

Also, does this imply that the transform matrix should not apply an arm model for 3DoF controllers? Because if so I'm pretty strongly opposed to that.


Your interpretation is correct, but I'll have a think about wording to make this a bit clearer. However, it was not the intention to disallow arm modeling for 3DoF controllers; would changing "will return" -> "may return" be sufficient?

Member


Will return -> may return is good enough for me, especially since this change does preserve the explicit mention of arm models later on.
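The nullability rules being settled here (pose may be null, and a non-null pose may still carry a null gripMatrix) can be sketched as follows. The helper name `drawController` and the mock frame are hypothetical, purely for illustration:

```javascript
// Per-frame pose lookup: getInputPose() may return null (tracking lost, or
// the source removed), and gripMatrix may itself be null for untrackable
// sources, so both need guarding.
function drawController(frame, inputSource, coordinateSystem) {
  const pose = frame.getInputPose(inputSource, coordinateSystem);
  if (!pose) return null;             // tracking lost or source gone
  if (!pose.gripMatrix) return null;  // input source isn't trackable at all
  return pose.gripMatrix;             // 4x4 column-major transform
}

// Position-only tracking (e.g. a tracked hand on some AR devices): the matrix
// may carry translation with an identity rotation block.
const positionOnlyPose = {
  gripMatrix: new Float32Array([1,0,0,0, 0,1,0,0, 0,0,1,0, 0.5,1.5,-0.25,1])
};
const frame = { getInputPose: () => positionOnlyPose };
console.log(drawController(frame, {}, {})[13]); // 1.5 (y translation)
```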

explainer.md Outdated

* `'head'` indicates the pointer ray will originate at the user's head and follow the direction they are looking. (This is commonly referred to as a "gaze input" device.) There should be at most one `'head'` input source per session.
Member


We lost the text indicating that there should be at most one head input source, which I think was originally requested by Microsoft? Is that intentional?

Contributor Author


This was intentional since we are now treating AR hands as gazing sources that have a gripMatrix (whereas in the initial prototypes we used the 'hand' pointerOrigin). Since there are multiple hands, there needs to be a distinct inputSource in the array for each.

* `'screen'` indicates that the input source was an interaction with the 2D canvas of a non-exclusive session, such as a mouse click or touch event. See [Magic Window Input](#magic_window_input) for more details.
* `'gazing'` indicates the target ray will originate at the user's head and follow the direction they are looking (this is commonly referred to as a "gaze input" device). While it may be possible for these devices to be tracked (and have a grip matrix), the head gaze is used for targeting. Example devices: 0DOF clicker, regular gamepad, voice command, tracked hands.
* `'pointing'` indicates that the target ray originates from a handheld device and represents that the user is using that device for pointing. The exact orientation of the ray relative to the device should follow platform-specific guidelines if there are any. In the absence of platform-specific guidance, the target ray should most likely point in the same direction as the user's index finger if it was outstretched while holding the controller.
* `'tapping'` indicates that the input source was an interaction with the 2D canvas of a non-exclusive session, such as a mouse click or touch event. See [Magic Window Input](#magic_window_input) for more details.
Member


Not sure if the term tapping feels accurate here. People typically associate it with touchscreens, and we want it to cover mouse and stylus input as well. Admittedly I don't have a better term in mind.

Contributor Author


This is a tough one to name; some other ideas:

  • projecting - as in, the pointing ray is projected into the scene from the camera as defined by a point on the near plane
  • touching - mouse, touch, stylus, etc could maybe be considered forms of touch?
  • pressing - same reasoning as touching

Member

@toji toji May 24, 2018


projecting is probably the most accurate, but it doesn't seem intuitive. Maybe screen-projection or something like that?

While it's not exactly more accurate than tapping, the term touching feels more comfortable for whatever reason. At the very least tapping sounds like an instantaneous event while touching implicitly includes drag or long press operations.

Contributor Author


screen-projection seems to be the clearest, but touching feels more in line with the other enum values and is my preference by a narrow margin. Thoughts from other vendors welcome!
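Whatever name the third value ends up with, consuming the enum proposed in this PR comes down to a simple branch per mode. A sketch (the helper `describeTargetRay` is hypothetical, not part of the explainer):

```javascript
// Choose a targeting strategy based on the proposed targetRayMode values.
function describeTargetRay(inputSource) {
  switch (inputSource.targetRayMode) {
    case 'gazing':
      // Ray follows the head: 0DOF clicker, gamepad, voice, tracked hands.
      return 'use the viewer (head) pose for targeting';
    case 'pointing':
      // Ray originates from the handheld device itself.
      return 'use the device pose, per platform pointing guidelines';
    case 'tapping':
      // 2D interaction with a non-exclusive session canvas (mouse/touch).
      return 'project a ray from the screen point into the scene';
    default:
      return 'unknown mode';
  }
}

console.log(describeTargetRay({ targetRayMode: 'pointing' }));
```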

explainer.md Outdated
}
```

### Complex input
Member


"Complex input" is maybe not the phrase I would use here. Maybe "Direct manipulation" or "Grabbing and dragging"?

@toji
Member

toji commented May 17, 2018

I've left a few comments, but in general I feel like this is a good clarification of the input system. Thanks for putting the time into it!

Updated section title and clarified gripMatrix wording
@toji
Member

toji commented May 24, 2018

I'm happy with this now, and willing to merge it. (Still wish I knew a better term than 'tapping', but I don't.) I'd like to get input from at least one other vendor prior to doing so, however. Anyone else want to chime in? (If we don't hear back prior to the next call I'll bring it up then and try to get consensus.)

@leweaver
Contributor Author

The code in the samples directory will also need to be modified to handle the new values. Considering that the Ray updates are potentially on the way, does it make sense to update the samples in one go as a separate PR?

@toji
Member

toji commented May 24, 2018

Yes, I think the samples can be updated in a more bundled fashion. We'll also want to add some simple, temporary shims to ensure that the new names map correctly to existing implementations (like Chrome) which will take a couple of browser releases to cycle to the new verbiage.
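The kind of temporary shim mentioned here might look like the following. This is a speculative sketch: the mapping table and function name are illustrative, and the old value 'hand' is inferred from the nomenclature discussed earlier in this thread, not quoted from any implementation.

```javascript
// Map the old pointerOrigin values onto the new targetRayMode values for
// UAs (like Chrome at the time) that haven't cycled to the new verbiage.
const POINTER_ORIGIN_TO_TARGET_RAY_MODE = {
  head: 'gazing',
  hand: 'pointing',
  screen: 'tapping'
};

function shimInputSource(inputSource) {
  // Only patch sources that predate the rename.
  if (!inputSource.targetRayMode && inputSource.pointerOrigin) {
    inputSource.targetRayMode =
      POINTER_ORIGIN_TO_TARGET_RAY_MODE[inputSource.pointerOrigin];
  }
  return inputSource;
}

console.log(shimInputSource({ pointerOrigin: 'head' }).targetRayMode); // gazing
```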

@toji toji merged commit ffc52ba into immersive-web:master May 29, 2018
aarongable pushed a commit to chromium/chromium that referenced this pull request Jul 5, 2018
The enums and some of the attributes involved in the WebXR input system
were changed recently[1].

[1] immersive-web/webxr#342

Bug: 854382
Change-Id: I56fe5909d7015461cb7314d23a12b194f148d483
Reviewed-on: https://chromium-review.googlesource.com/1112881
Commit-Queue: Byoungkwon Ko <codeimpl@gmail.com>
Reviewed-by: Brandon Jones <bajones@chromium.org>
Reviewed-by: Kinuko Yasuda <kinuko@chromium.org>
Cr-Commit-Position: refs/heads/master@{#572756}