# Revisit session creation options and flow #330
I actually think these various use cases can be addressed much more simply. I'm also worried about the session creation being overburdened with too many options. I do, however, think that it is very important to distinguish between AR and VR sessions, and that this is something each app should specify at session creation time. My proposal is that we leave the existing session creation parameters as-is, and simply add one more:
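A minimal sketch of what that additional parameter might look like (the shape below is an assumption; only the "type" name comes from the explanation that follows):

```js
// Sketch: existing options unchanged, plus one new "type" option.
// The values "ar" and "vr" are assumptions based on the prose below.
xrDevice.requestSession({
  exclusive: true,  // existing option
  type: "ar"        // new: "ar" to augment reality, "vr" to create one
}).then((session) => {
  // proceed with the session as usual
});
```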
The rationale is that any XR app does one of two things at any point in time: create a virtual reality, or augment an existing reality. An app will request which one of these two things it wants to do with this session "type" parameter.

For world-aligned content, I think this can be a new frame of reference, or an option on the "stage" frame of reference, or a special anchor. If this requires special permissions, however, I can see how it might be more sensible to get it all in one go in the session creation parameters.

For spatial tracking (3DOF vs 6DOF), I think this is already handled adequately in the current spec. For example, devices without 6DOF will fail to provide an "eyeLevel" frame of reference. 3DOF is provided by the "headModel" frame of reference. I don't think this really matters at session creation time, as the application should do its best to use whatever capabilities are available for the session it's given.

For display modes, I think this should be made available on the XRSession (e.g., …)
Thanks for writing up your thoughts on this, Blair! I have buckets full of opinions on this subject, and am struggling to convey them in a way that's not just a massive braindump. Thankfully @speigg already hit on a couple of my initial thoughts (world alignment seems better handled at FrameOfReference/Anchor creation time, same for 3DoF vs 6DoF.) I guess a good place to start is to try and establish a concrete reason why we should prefer to initialize something at session creation time vs. later. In an ideal world, exposing AR and VR capabilities would be something that could happen entirely post-creation with zero overhead and perfect expressiveness. The Ultimate XR Device™ would be able to become completely opaque or transparent at a moment's notice, could switch tracking capabilities on and off at a whim, and would only track things like environmental geometry when explicitly asked. The idealized API for such a device would look something like:

```js
xrDevice.requestSession().then((xrSession) => {
  xrSession.transparent = true;
  xrSession.positionalTracking = true;
  xrSession.getPlanes().then((planes) => { /* ... */ });
});
```

So it's worth examining what prevents us from having that API. In my opinion it's three things: permissions, mutual capability exclusivity, and hardware limitations.

Permissions: Exposing some capabilities may require showing a permission prompt to the user. We don't ever want to automatically opt the developer into capabilities that will display a permissions prompt without the developer explicitly indicating that it was their intent to do so. (Even if a prompt is ultimately not produced.) This is probably the most flexible point, in that a great many items that might require permissions can still be handled post-create by utilizing a request->promise pattern that defers until a permission has been granted. There's something to be said for not spreading the permissions prompts over N different calls, though. Avoiding permission fatigue is a very real and worthwhile goal.

Mutual capability exclusivity: As an example, current Android devices can do phone AR or headset VR, but not both at once. This is enforced today by making the libraries that drive the capabilities separate, but even if it were not so, the realities of mobile performance would dictate that 6DoF tracking with a video passthrough prevents stereo rendering at a high enough framerate for headset use. This is technically a temporary issue, but the availability of low-powered devices is not likely to go away short term, and so this issue will persist. This means that some decisions must be made early in the XR app's lifetime about what set of capabilities need to be spun up.

Hardware limitations: Devices like HoloLens or Meta 2 have displays that are permanently transparent and cannot display opaque content. Thus the same capability that requires opt-in and permissions on mobile is unavoidable and permissionless on those devices. This one is funny, because it means that something like AR is both a feature you request and a limitation you have to code around, depending on your application's context.

In any case, I tend to feel that anything that directly affects one of the above categories is a candidate for handling at session creation time, and everything else should be deferred. Tying that back to Blair's list:
(Part 1 of N)
@speigg @toji thanks. Some comments.

One thing I should probably have taken more time to separate is "things we want to know about the session we've gotten", "things we would like to check if a device might support", and "things we're asking for when requesting a session." Following on the single example of "exclusive", I just lumped them all into "options for session query/creation" and "properties on the resulting session". I agree that we should have minimal options, and I agree with the categories of reasons you list, @toji.

I was thinking about permissions as one reason for having these as options ("we get one permission popup that asks for geo-orientation, permission to use the device's sensors, etc."). I was also thinking about device limitations: ARKit has "align the tracking coordinates with geospatial coordinates" as an initialization option, and this seems like the right place to put it, since it's a session-level thing. To expand on geo-alignment: what this is asking is whether the local coordinates can be geo-aligned, not for access to geolocation, and the alignment of the local coordinates seems like a "lifetime of the session" kind of choice (see the sketch after this comment). Now, it may be that we could change it over time (e.g., nobody is supposed to be using local coordinates directly, but rather everything should be anchored), but surely an app knows if it's using geo data or not? But geo-alignment also seems purely additive, so we can probably move this over to a "proposal" repo. I really want this discussed.

3DOF vs 6DOF may be something you want to ask about ("does this device support 6DOF?") and may be something we want to set as a property or otherwise be able to query about a session (since right now I see people doing hacks like "is the position always 0,0,0?"). But this may not be something we want folks to be able to explicitly request. I was actually imagining it more as a hint ("All I need is 3DOF, but 6DOF is fine"). In the end, it may be enough to just somehow notify the programmer that this is a property of the session they got, and not have it be an option.

Finally, w.r.t. AR/VR and display modes: I'd tend to agree that requesting AR vs VR might be reasonable, and then having properties on the session to help you understand what you got might also be reasonable. But, considering @toji's Android example: if I have ARCore and Daydream, how do I create a "VR" session? It could be ARCore tracking with a VR magic window, or HMD Daydream. Who is deciding, and how is this decision presented to the user? Perhaps we have two devices: a Daydream device and a Magic Window device? An app can do …
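To make the geo-alignment idea above concrete, here is a hedged sketch (the `worldAligned` option name is borrowed from later in this thread; treating it as a creation-time option is an assumption, as is the helper function):

```js
// Sketch: geo-alignment requested at session creation, mirroring ARKit's
// "align tracking coordinates with geospatial coordinates" init option.
xrDevice.requestSession({ worldAligned: true }).then((session) => {
  if (session.worldAligned) {
    // Local coordinates stay geo-aligned for the lifetime of the session.
    placeGeoReferencedContent(session); // app-specific, hypothetical
  } else {
    // Alignment unavailable: fall back to anchors relative to local features.
  }
});
```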
```js
xrDevice.requestSession().then((xrSession) => {
  xrSession.transparent = true;
  xrSession.positionalTracking = true;
  xrSession.getPlanes().then((planes) => { /* ... */ });
});
```

@toji I like the capabilities of your hypothetical Ultimate XR Device™ :), however I would suggest an API that requires apps to be reactive, rather than allowing apps to assume explicit control over the state of the XR device. The problem is that if we give apps explicit control, it becomes harder to backtrack in the future and give some of that control back to the user/user-agent in order to allow for more complex (experimental) use cases, such as (!) multiple simultaneous applications, which can conflict with one another if they each assume control over the device state. So, perhaps a minor difference, but I'd prefer to see a combination of hints and requests even if we had the Ultimate XR Device™:

```js
xrDevice.requestSession().then((xrSession) => {
  // request a transparent layer for AR or an opaque layer for VR
  xrSession.requestLayer("transparent" /* or "opaque" */).then((xrLayer) => { /* ... */ });
  xrSession.requestDisplayMode("handheld" /* or "headworn" */).then(() => { /* ... */ });
  xrSession.ondisplaymodechange = () => { /* ... */ };
  // etc.... plus corresponding properties on XRSession
  xrSession.hints.positionalTracking = true;
  xrSession.getPlanes().then((planes) => { /* ... */ });
});
```

Of course, since we don't have the Ultimate XR Device™, there are some API quirks here, as you pointed out, such as the fact that on some devices a "transparent" layer (video see-thru in this case) won't work while the display mode is "headworn" (stereo in this case) and positional tracking (6DOF) is in use. If it's request/promise based, doing these things could result in a rejected promise of course (on devices where such capabilities are mutually exclusive), but it's also possibly more difficult for a developer to understand why it might fail, or why positional tracking might suddenly stop working when the display mode changes to "headworn" (but only when the layer is "transparent"?!). Potentially very confusing. It would be nice if we could avoid the failure cases altogether. I personally don't think something like "requestDisplayMode" is even necessary (I think it's better if the UA controls the display mode exclusively), and leaving it out could simplify things.
@speigg I agree with the reactive comments, but the need is more mundane and closer afield. I would drop …

and instead say that the problem is that having explicit control of what kind of session, and assuming apps will always create UIs to control the session they want, is problematic for various reasons: …
Good points. Given that it's tied to session creation in the underlying API, I suppose there isn't really anywhere else it can go. Or perhaps we make the coordinate system geo-aligned automatically, whenever the underlying platform supports it? Then we can just have a …
I very much agree with this, and think we definitely want to encourage reactive development across the board as much as possible. But there's probably a line to be drawn here. For example, I'm the sort of gung-ho, pie-in-the-sky optimist who would like to just assume that the developer can say "give me whatever you've got," and we could alternately return a VR device with 6DoF input, or a phone with passthrough and tap-to-interact input only, OR a zSpace-like desktop, and the web app will happily feature-detect its way to a working state once the session is spun up. And in many cases I think you can actually do something reasonable across that entire spectrum.

But the reality of web development is that people will actually want to establish their own baselines for these things, with the all-important decision being "do I advertise this feature or not?" (Let's put aside the differentiation between a button the developer adds to the page and a button hosted in the UA. In both cases the developer will have to make a decision about whether or not the page should support XR content on the device in question.) Let's take an AR app for visualizing underground pipes so you don't dig into them during construction. This app has no value in VR. This app has no value (and in many ways negative value) if you can't properly align the visualization with the world. Upon visiting that page, even with an otherwise XR-capable device, if the developer finds that those capabilities are missing they probably want to show a "Your device is not compatible" message rather than a button which tries to spin up an XR session, then feature-detects, then kicks you back out of XR and says "Oops! Turns out your device just can't do it. Sorry!"

So this gets to Blair's "things we would like to check if a device might support", and I think we can all agree that there's a delicate balance to be struck. On the one hand, we want to support interesting use cases like the above scenario, so we can't make our "supports" calls too high-level or they're useless outside of the simplest cases. On the other hand, we don't want to encourage developer behavior of "If you don't have a 9DoF system with tactile simulation and neural interfacing then you simply don't deserve to see this content." I'll be the first to admit that I don't know where to draw that line, but I think that our platform being the web dictates that we start out conservative in what we expose, both because it's easier to add API surface to the web than to remove it, and because every bit we do add is fingerprintable.

I'd also suggest that with a minor modification the `supportsSession` call could handle this. For example:

```js
let sessionOptions = {
  type: "ar",
  outputContext: context,
};

xrDevice.supportsSession(sessionOptions).then((features) => {
  if (features.worldAligned) {
    // xrDevice supports the given sessionOptions, and the resulting session will provide
    // the required features. Advertise XR content.
    addButtonAndTellTheUAWeAreXRReady();
  } else {
    // xrDevice supports the given sessionOptions, but won't provide a required feature.
    // Don't advertise XR content.
  }
}).catch(() => {
  // xrDevice doesn't support the given sessionOptions at all.
  // Don't advertise XR content.
});
```
React vs Request is an unclear line, I totally agree. I think we need both. In Argon4/argon.js, we opted to be as reactive as possible, and to have developers express their preferences. You'd initialize, and eventually be handed a session, but we left the specific session up to the UA. This did make it a bit frustrating sometimes. But, in the end, I think we'll have to deal with that here to some degree, since you might not get what you request, based on what the UA does in response to user permission requests.

Following on your example, @toji, I like the idea of expanding …. I'm not sure if I prefer that we return a template of the session, or that we have a set of options that are usable in …. The key is that we decide on the major features to include in ….

The "very nice thing" about this approach is that it naturally extends to different UAs exposing platform-specific things. The WebXR version of Argon that @speigg is working on could include ….

At the same time, I would like to continue thinking about how we combine this with reactive elements. I'm less concerned about the overconstrained app that really, really only works in one specific case (I need world alignment AND custom computer vision on an AR display, and nothing else; I need 6DOF VR with a room area of at least 6' x 6'). I'm more concerned with underconstrained mass-market apps that want to try and do something everywhere. What will their UI be expected to look like? Do they need to query every …? Or do we want them to be able to check if at least ONE thing they support is active, create one button, and then, when the user presses the button, the permissions dialog (much like the camera dialog in WebRTC) lets them select the …? Or do we want an app to be able to tell WebXR what they support, and let the UA present the button itself? So, web pages could have the "Enter XR" button, OR they could signal "XR capable" and the UA could present the option? Obviously, we could also support both.
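As a thought experiment for that last option, a sketch where the page only signals what it supports and the UA owns the button (the `advertiseSession` API and helper below are invented purely for illustration; nothing like them exists in the spec):

```js
// Invented API: declare the session configurations the page can handle
// and let the UA decide whether and how to offer an "Enter XR" control.
navigator.xr.advertiseSession([
  { type: "ar", exclusive: true },
  { type: "vr", exclusive: true },
]);

// The UA delivers whichever session the user chose.
navigator.xr.addEventListener("session", (evt) => {
  configureAppFor(evt.session); // app-specific setup, hypothetical
});
```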
I'd be pretty strongly against making the options passed to …. That said, if there was a super strong need for something additional in …
This is definitely more in line with what I was thinking. And here I'm not sure if the returned features should be part of the eventual session or not. Like so?

```webidl
interface XRSessionFeatures {
  boolean worldAligned;
  XRDisplayType displayType; // Opaque, passthrough, transparent?
  // Etc.
};

partial interface XRSession {
  readonly attribute XRSessionFeatures features;
};

partial interface XRDevice {
  Promise<XRSessionFeatures> supportsSession(XRSessionOptions sessionOptions);
  Promise<XRSession> requestSession(XRSessionOptions sessionOptions);
};
```

I can see that being a nice pattern, but I can also see it being more restricting than we'd prefer.
I think I tend to agree. Just wanted to make this clear.

Do you have an example of a returned feature that you wouldn't see being part of the session object? I would think we'd want it to be, for the same developer-confusion reason, especially if they might not get a session option that is possible. Consider a display that can provide video/sensor data, or geospatial data. The features returned from …
Arg, meant to address this too! To start, we've been talking about how to expose these things a lot at Google, since we're going to be dealing with Daydream vs. ARCore. Do we expose them as two different devices, or as a single device where the backend that gets spun up is a function of the session options passed? It feels like the latter path is the better option for us, treating the physical phone as the singular ….

That aside, I feel like the pattern for how the page advertises its capabilities is a function of the content, and not something that we can or should do much to dictate. I imagine most content will broadly fit into buckets of "preferred method with fallbacks" and "multiple specializations."

For "preferred method with fallbacks", let's use the example of an interior design app. They probably prefer (in a world where such devices are ubiquitous) an AR HMD, letting you design the actual space you're in, in an immersive way. But hey! If that's not available, no worries! Handheld AR is pretty good at this scenario too. But if that's not available, then working in a VR blank-slate room that approximates the real one's dimensions isn't bad. And if all that fails, then a simple 2D app is probably fine. This whole spectrum requires one button, though they may want to change the label depending on the mode you'll launch.

The alternative is an app like A-Painter, where there's a clear set of requirements for the painting mode to be feasible, and if that's not available then showing a gallery mode is a good fallback. BUT! What if I have a fancy 6DoF setup but I want to view other people's creations anyway? In that case you can easily envision the page having "View Gallery" and "Create your own" buttons that are both available, but maybe disabled if ….

The only catch here is if you wanted a UA button that existed outside the page, and I kind of feel like if that's going to be a thing, then the developer needs to pick a single set of session options and say "That's the default." This would be important for cases where the UA is still initiating the session creation but there's no opportunity for UI to be shown. (Page-to-page navigation, inserting a phone into a headset, proximity-sensor triggered, etc.)
Not right off. Also, I think your point about permissions is a good one. Should we advertise a feature as being available even if the user has to grant access to it first and hasn't done so?
Given that most features are not guaranteed to be available at any given moment (world alignment may not be possible if GPS is unavailable due to bad weather or no clear view of the sky (indoors), or if the digital compass doesn't work due to magnetic interference; 6DOF may fail if there is not enough light, not enough visible features for tracking, the user moves too quickly, etc.), I agree that the semantics should be that certain features are advertised as being supported on a given session under ideal conditions, but not guaranteed to be available for any number of reasons, including permissions not being granted. This also implies that apps should be reactive to these features coming and going throughout the lifetime of a session. Some of this is already taken into consideration in the current spec (e.g., 6DOF -> 3DOF due to loss of tracking), but it seems unlikely that we'll be able to completely avoid the scenario where an app discovers (after starting the session) that one or more required features are not available.
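A sketch of what that reactivity could look like, using the hypothetical meshing feature discussed later in this thread (the `availabilitychange` event, `available` flag, and UI helpers are all invented for illustration):

```js
// Invented event: a feature that is supported and permitted, but whose
// data can come and go with tracking conditions.
const meshing = await xrSession.requestEnvironmentMeshing();

meshing.addEventListener("availabilitychange", () => {
  if (meshing.available) {
    hideTrackingWarning(); // app-specific UI helper, hypothetical
  } else {
    showTrackingWarning("Meshing paused: not enough light or features."); // hypothetical
  }
});
```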
> I think I tend to agree. Just wanted to make this clear.
>
> Do you have an example of a returned feature that you wouldn't see being part of the session object? I would think we'd want it to be, for the same developer-confusion reason, especially if they might not get a session option that is possible. Consider a display that can provide video/sensor data, or geospatial data. The features returned from …

@speigg I was specifically thinking about these as high-level feature IDs. So, "the session is capable of doing this and giving you this data/feature." But that may or may not mean the feature works perfectly/smoothly through its life. I agree we need to have the ability to turn some things on/off (e.g., if we expose "world knowledge" like meshes, etc., the UA should/could provide a button to turn it on/off over the life of the app). I suspect we need to deal with features that "change" over the life of the page on a per-feature basis, or with events (e.g., in the "world knowledge" case, the app would likely get notified when it stops getting this info and starts again). Also, geospatial is not a great example of this kind of coming-and-going (i.e., if there is geolocation and you have access, you will get SOMETHING, even if low accuracy, AND that's probably good enough for orientation alignment).
Doesn't returning a set of features from `supportsSession` …? Separately, I think we will want to specifically look at how deferred session requests (#256) would work. The reasonable options for such requests may be much more limited than what …
@ddorwin Thinking about this, and looking at #256, I guess one question is if we're willing to adopt a more asynchronous style of session creation like we did in Argon, inspired by how various desktop systems do window creation. Specifically, …
I would much prefer to see something like this happen, I just didn't think folks would go for it. It makes things much simpler for most pages, in my opinion (they simply express preferences and don't need to provide a UI for requesting if they don't want to), and gives the user much more complete control over things.

If a page requires some specific capability (only AR or VR, computer vision, geospatial, video-mixed or pure see-through, ...), it adopts a style where it checks the capabilities of any session it is given, and pops up a warning/explanation in that session: this handles all the various cases (user started on a bad device, user was already on a bad device and navigated to the page, etc.). If a page can't (or doesn't want to) deal with dynamic device or session changes, it can use the exact same approach/dialogs/warnings when it gets a new session while already running ("Sorry, you need to reload ...").
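A sketch of that check-and-warn style, reusing the `XRSessionFeatures` idea from earlier in the thread (the session-delivery event and warning helper are assumptions):

```js
// Assumed flow: the UA hands the page a session; the page inspects it
// and explains any missing hard requirement instead of failing silently.
navigator.xr.addEventListener("session", (evt) => {
  const session = evt.session;
  if (!session.features.worldAligned) {
    showInSessionWarning(session, "This experience needs world alignment."); // hypothetical helper
    return;
  }
  startApp(session); // app-specific
});
```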
@blairmacintyre funny, I was going to suggest something very similar, but I didn't want to rock the boat too much :) Here is what I was thinking: …
This might look like this in practice:

```js
function checkForXR() {
  navigator.xr.requestDevice().then(device => {
    onXRDevice(device);
  }).catch(err => { /* ... */ });
}

navigator.xr.addEventListener("devicechange", checkForXR);

function onXRDevice(device) {
  device.addEventListener("session", evt => onXRSession(evt.session));
  advertiseXRSupport(device);
}

function onXRSession(session) {
  if (session.type === "transparent") {
    // setup app for AR
  } else if (session.type === "opaque") {
    // setup app for VR
  }
}

function advertiseXRSupport(device) {
  let arSessionOptions = { type: "transparent", exclusive: true, outputContext: myOutputContext };
  device.supportsSession(arSessionOptions).then(features => {
    if (features.featureMyAppNeedsForAR) {
      arButton.style.display = "block";
      arButton.addEventListener("click", () => {
        device.requestSession(arSessionOptions);
      });
    }
  });

  let vrSessionOptions = { type: "opaque", exclusive: true, outputContext: myOutputContext };
  device.supportsSession(vrSessionOptions).then(features => {
    if (features.featureMyAppNeedsForVR) {
      vrButton.style.display = "block";
      vrButton.addEventListener("click", () => {
        device.requestSession(vrSessionOptions);
      });
    }
  });
}
```

A few issues: …
I like where @blairmacintyre and @speigg are heading with moving away from promises for session creation. I am definitely in favor of a single code path for developers to use for navigation/onload, in-page clicks, and other initiation request sources (donning a headset, a button in the browser frame itself, etc.). In particular, for the 'in browser frame button' use case, the thing that makes me worried about relying on a promise-based requestSession call in the page load event is that the developer will need to re-requestPresent whenever presentation ends; otherwise the button will only work once!

I've been toying around with lots of variations of how this flow could work to try and address some of the issues that @speigg identified, in particular the duplication of logic and the fork based on session type. One approach I'd like to put forward is the registration of sessionRequest listeners when a new device is found, rather than using events. The listener method takes two parameters: XRSessionFeatures (returned from `supportsSession`) and a callback that receives the session. The idea is that the page would register what kind of sessions it is interested in via `addSessionRequestListener`, and the UA will respond by calling an appropriate session listener callback (based on the options given). If no sessionRequestListeners match the required options, no callbacks are fired (perhaps we need an event here?). I also make the assumption that the outputContext is NOT provided in the session options. A modified version of your example...

```js
function checkForXR() {
  navigator.xr.requestDevice().then(device => {
    onXRDevice(device);
  }).catch(err => { /* ... */ });
}

navigator.xr.addEventListener("devicechange", checkForXR);

function onXRDevice(device) {
  let vrSessionOptions = { type: "opaque", exclusive: true };
  let arSessionOptions = { type: "transparent", exclusive: true };

  // Query for session types that this app supports.
  device.supportsSession(arSessionOptions).then(features => {
    if (features.featureMyAppNeedsForAR) {
      device.addSessionRequestListener(features, session => {
        // App specific function
        configureScene(session, {/* AR app specific configuration */});
      });
      arButton.style.display = "block";
      arButton.addEventListener("click", () => {
        device.requestSession(arSessionOptions);
      });
    }
  });

  device.supportsSession(vrSessionOptions).then(features => {
    if (features.featureMyAppNeedsForVR) {
      device.addSessionRequestListener(features, session => {
        // App specific function
        configureScene(session, {/* VR app specific configuration */});
      });
      vrButton.style.display = "block";
      vrButton.addEventListener("click", () => {
        device.requestSession(vrSessionOptions);
      });
    }
  });
}

function configureScene(session, params) {
  // Optionally set the output context, AFTER the session is created.
  session.setOutputContext(outputContext);
  // Set other app specific settings (params...)
  // requestFrameOfReference, create layer, requestAnimationFrame, etc.
}
```

I'm not entirely happy with how the above sample fits together just yet, but I think it shows the direction I am trying to portray.
In both Chrome and Oculus Browser (probably in Firefox too, but I haven't tested) WebVR 1.1+, there are edge cases where a page can be loaded with a display already presenting. I don't have strong opinions about the implementation, but I like @leweaver's direction of supporting a way to listen for when sessions are initiated. In the case I'm referring to above, these are for featuring specific pieces of web content through the Daydream Home Screen and Oculus Home Application Thumbnails.
Apologies for the epic comment, but there's a lot to cover here. The following is based on a variety of conversations with multiple people over several weeks, but a huge portion of the credit goes to Nell for flying to Mountain View to spend a day discussing this, and Alex for staying up with Nell and me into the wee hours of the morning at SIGGRAPH to refine the concepts further. That being said, I'm not attempting to represent the below text as their opinions. It's really just my understanding of the conclusions we converged on.

**Primary Goals**

…

**Proposal to get there**

It should be reasonable for browsers to condense any permission-invoking API calls made in the course of a single callback into a single dialog. (Whether or not the browser chooses to do so is for the UA to decide.) Specifically, it would mean that something like this could potentially produce a single dialog with multiple checkboxes:

```js
api.requestSensitiveServiceA().then(/*...*/);
api.requestSensitiveServiceB().then(/*...*/);
```

While something like the following would out of necessity produce sequential permission dialogs:

```js
api.requestSensitiveServiceA().then((svc) => {
  api.requestSensitiveServiceB().then(/*...*/);
});
```

Thus, if everything that potentially requires permissions can be called without blocking to wait for a previous potentially permission-dialog-producing (henceforth "PDP") call, we have the opportunity to allow developers and UAs to intelligently control how and when they want to incur permission dialogs. Given the WebXR API's current design, the primary hurdle to this appears to be that we generally want to hang PDP calls off of the session itself:

```js
xrDevice.requestSession({ immersive: true }) // May ask for general XR permission.
  .then((session) => {
    session.requestEnvironmentMesh(); // May ask for permission, but can't until session request resolves.
  });
```

One potential solution to this is to ensure that session creation is very lightweight, requiring minimal options and no permissions to create. Then any PDP calls are handled after the fact. This would include things like AR passthrough, which in reality can be viewed as just another data stream on top of the core tracking tech. Given that we all appear to agree that inline sessions without an AR passthrough should be allowed without permissions or user activation, this seems like a tractable idea. (An "inline" session is my current term for when the primary output is the in-page element. I'm trying to get away from the term "magic window".) As an example of how I see this working out (making up feature APIs as I go):

```js
let xrSession = await xrDevice.requestSession();
let xrEnvMesher;
let xrEnvLight;

// Required features
Promise.all([
  xrSession.requestARPassthrough(),
  xrSession.requestEnvironmentMeshing(),
]).then((values) => {
  xrEnvMesher = values[1];
  startFrameLoop();
}).catch(() => {
  // Whoops! Something we needed isn't there.
  xrSession.end();
});

// Non-required feature
xrSession.requestEnvironmentLighting().then((envLight) => {
  xrEnvLight = envLight;
}); // No catch, don't care.
```

I should note that the one "feature" I don't feel fits cleanly into this architecture is whether or not a session is immersive or inline. This distinction seems "special", since it determines not only where the content is displayed, but also may determine what sets of features are accessible to the session itself. (For example, a Pixel phone could use AR passthrough on an inline session via ARCore, but not an immersive one, because Daydream doesn't support it and the phone's cameras are obscured anyway. Thus it's helpful to distinguish between modes.) It's tempting to allow the session's immersive state to be mutable, set after the fact with a call similar to …

For example, does this work...

```js
Promise.all([
  xrSession.setImmersive(true),
  xrSession.requestSomeImmersiveOnlyFeature(),
]);
```

...while this fails?

```js
Promise.all([
  xrSession.requestSomeImmersiveOnlyFeature(),
  xrSession.setImmersive(true),
]);
```

That feels wrong. I'd prefer, if possible, for features that are requested on a session to be persistent for the duration of the session and to have almost no dependence on other features unless said dependence is explicitly baked into the API surface. (That is, something like ….) So this suggests to me that we probably still want to keep the same model as we have now (or at least one that's not drastically different), where the … (Note: Nell has already indicated to me that she doesn't feel as strongly as I do about ….)

Of course, that may work for some browsers, but others may want to do like Edge does currently and display a permission prompt for accessing immersive hardware features at all. This is an understandable stance, and we should provide a reasonable mechanism for it. In the case that a browser wants to treat any immersive hardware access as a PDP feature AND wants to only show a single permission prompt when it can, I would say that the immersive hardware access permission should be triggered not on the initial session request, but on either the first PDP feature request from that origin OR the first call to `requestAnimationFrame`. This pattern should keep dialogs aligned with contextually sensible user actions, prevent the browser from being required to show stacks of permissions sequentially, still allow for just-in-time permissions for features that aren't needed right away, doesn't require two different variants of the feature APIs, and lets browsers be as light-touch or as aggressive on permissions as desired while still giving developers a way to produce predictable behavior across the board. Keep in mind that the permission is for the entire origin, and not an "I want this specific call to go through". Also, we don't need to have a "one feature, one permission" model. Requesting one feature in code may trigger a permission dialog that subsequently covers multiple other features.

**Feature interaction**

In discussing the above, I ended up fielding some questions about what the theoretical feature requests above would return. I feel that's worth stubbing out just to make the scenario a bit more realistic. In some cases, it seems like the API request wouldn't have to return anything, and simply resolving or rejecting would suffice. For example, requesting that AR passthrough be enabled wouldn't have much to return, because it simply activates a compositing feature.

```js
xrSession.requestARPassthrough();
```

(Quick side note on that, BTW: even though this isn't a serious API proposal, I think something along these lines could be workable even on devices like HoloLens/Magic Leap, where it could functionally be a no-op that resolves immediately.)

In other cases, a feature API request could easily just return the desired value. A good example of this might be asking for camera RGB data:

```js
arMediaStream = await xrSession.requestARCameraStream();
```

In this case there's a single, clear, desired value that is likely to be used immediately, so returning it immediately upon the UA permission policy being satisfied is sensible. You could also theoretically use this JUST to ensure that the correct permissions were acquired, by calling the function and ignoring the returned value, which would cause it to garbage collect almost immediately. It's worth considering that doing so may be a semi-heavy operation in some browsers, however.

Finally, it feels like there are multiple APIs where the request should actually return an object that is used to then control the behavior of the feature requested.
For example, with environmental meshing:

```js
xrEnvMesher = await xrSession.requestEnvironmentMeshing();

// Some time later...
xrEnvMesher.addEventListener('meshchanged', onMeshChanged);
xrEnvMesher.start();

// Even later...
xrEnvMesher.stop();
```

In this case the feature is known to be heavyweight and requires more fine-tuned control and interaction. Thus an object is returned that has all the methods needed, and which can be used to actually activate the heavy lifting as needed, while any permissions necessary are taken care of at request time. Which patterns we use for which features is definitely something that should be evaluated on a case-by-case basis.

**Conclusion**

I'm not convinced that anything described above is the perfect solution; there are a few unaddressed issues adjacent to this one (testing for support for the purpose of showing buttons, for example), and I'll admit the "wait till the first rAF to prompt" pattern feels a bit janky for those browsers that would need it. But I feel like the discussions around this have been very helpful in allowing me to really grok some of the usage patterns and challenges around this particular API. It's difficult to capture all of it without this becoming a novel, but I'm happy to field questions in the meantime!
@toji Your proposal looks very promising! No pun intended :)
Small suggestion: how about something like …
I understand why making 'inline' vs 'immersive' mutable throughout the session may be problematic if the goal is to have a consistent set of features throughout a session's lifetime. However, I think there is a lot to gain in embracing the dynamic availability or non-availability of features: applications should already be structuring their rendering code around the given set of XRViews, so if these XRViews were to change dynamically based on 'inline' vs 'immersive' mode, an app should be able to instantly adapt accordingly. Likewise, if features can come and go (based on changing permissions or other reasons), applications should be able to react.

IMO, the only reason an application should end an XRSession and give up is if the one or two features that it absolutely requires are not supported at all on that platform (not if they simply aren't available at the moment they are requested). In other words, I don't think applications should be relying on requests to change session state in order to determine whether or not they should end their session and give up. Rather, applications can ask the session if certain features are supported (while at the same time asking for permission to use such features), and then fail only if those features are not supported at all (not just if the user denies permission to those features). One reason for this is to allow the UA/user to change the permissions dynamically without disrupting the session. If the UA/user disables a feature that is actually supported and which the application considers necessary, then the application should prompt the user to enable that feature.

More so, I think we may want to explicitly distinguish between APIs that request a change in session state vs. APIs that simply ask for access to certain features. For example, we may want to adopt a pattern such as "use*()" when only requesting access to certain features (and ensuring their support on the current platform):

```js
let xrSession = await xrDevice.requestSession();
// Required features
Promise.all([
xrSession.useEnvironmentBlending(),
xrSession.useEnvironmentMeshing()
]).then((values) => {
// If we succeed, then these features are supported,
// and permission has been requested (not granted)
startFrameLoop();
}).catch(() => {
// If we fail, this means the features requested are not supported at all
// and since this session will never support what we need, we might as well end it
xrSession.end();
});
function onFrame(xrFrame) {
// If we are here, it means we have a session that *potentially* supports what we need
// ... but perhaps right now it does not
if (xrFrame.environmentBlendMode === 'opaque') {
// let the user know we don't have what we need, and ask them to enable that feature
showPromptToEnableEnvironmentBlending()
}
if (!xrFrame.environmentMesh) {
showPromptToEnableEnvironmentMeshing()
}
if (!xrFrame.immersive) {
// we may want to render inline (optional)
renderInline()
} else {
renderImmersive()
}
}
function onEnableEnvironmentBlending() {
xrSession.requestEnvironmentBlending(true)
// the UA might tell the user they need to take their phone
// out of the enclosure to enable environment blending,
// or change the session mode, or whatever
}
function onEnableEnvironmentMeshing() {
xrSession.requestEnvironmentMeshing(true)
// Again, UA might ask the user to confirm,
// and may change the session mode if necessary
}
```

With this kind of API, the UA/user is free to enable/disable any features as desired, and to change between 'inline' and 'immersive' modes as desired. If an application requires a certain feature (e.g., environment blending or a camera stream), it would prompt the user and attempt to request that feature only if the user indicates that they want to re-enable that feature.
The order-dependency problem here is also alleviated with the approach I have outlined, without requiring 'inline' and 'immersive' to be separate session types. For example:

```js
Promise.all([
xrSession.useImmersiveRendering(),
xrSession.useSomeImmersiveOnlyFeature()
]).then(()=>{
// If we succeed, then these features are supported,
// and permission has been requested (not granted)
})
function onFrame(xrFrame) {
render()
if (!xrFrame.immersiveOnlyFeature) {
showPromptToEnableImmersiveOnlyFeature()
}
if (!xrFrame.immersive) {
// we may want to render inline (optional)
renderInline()
} else {
renderImmersive()
}
}
function onEnableImmersiveOnlyFeature() {
xrSession.requestImmersiveOnlyFeature()
// UA may ask the user for confirmation here
// Since this is an immersive-only feature, the UA should also inform the user
// that this feature would require switching to an immersive mode
}
```

This way, the application doesn't even have to know that a certain feature is "immersive-only". Simply by requesting a certain feature, the UA can enable/disable other features as necessary, while the application simply reacts to whatever is available.
(Your epic comment is going to be very hard to respond to, @toji!, so I'm going to just go for it and make a bunch of replies, some big, some small, I suspect.) One thing you say I want to call out:
> inline sessions without an AR passthrough should be allowed without permissions or user activation

I think you need to define this more. IF the inline session gets any motion sensor data, this is not true. The reason the devicemotion API was never ratified, and that it's being deprecated, is that it can be accessed without permission and serves as a threat. So, if a WebXR session gives anything like the device orientation or motion, it will require a permission. If it is more about getting a control flow for rendering that matches an eventual rendering loop, but doesn't give any device sensor data, then this is probably true. But we should define that, if so.
Another. I don't know what you mean by "AR passthrough being another data stream" in this context, like here:

> This would include things like AR passthrough, which in reality can be viewed as just another data stream on top of the core tracking tech.
A WebXR session needs to know if the device is AR (there is a view of the world, either via composited video or transparency) or VR (there is nothing seen by the user except what the session renders), in order to decide what to render (e.g., a skybox?). A WebXR session may want to know if it's video-passthrough or optical see-through AR, so it can decide on some rendering approaches (since those displays show things differently). "Requesting AR passthrough" does not give more data; it's just a feature request (as you then highlight further down).
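For example, the skybox decision could key off a reported blend mode rather than off which mode was requested. A sketch using the `environmentBlendMode` naming that appears later in this thread (the scene helper methods are hypothetical):

```js
// React to what the display reports, not to what the app asked for.
function drawBackground(session, scene) {
  if (session.environmentBlendMode === "opaque") {
    scene.drawSkybox(); // VR: nothing visible behind rendered content
  } else {
    // "additive" (optical see-through) or "alpha-blend" (video passthrough):
    // keep the background clear so the real world shows through.
    scene.clearToTransparent();
  }
}
```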
Another. Decisions being made regarding your implementation are bleeding into examples, and make this confusing and hard to discuss. For example:

> For example, a Pixel phone could use AR passthrough on an inline session via ARCore, but not an immersive one because Daydream doesn't support it and the phone's cameras are obscured anyway.

The first half of this seems true, but the example is true if-and-only-if you assume "immersive" == "hmd". (This seems to be implied in your comments, but I don't think this has been agreed on. It may be that it is also assumed by others, or not.) The problem with this interpretation is that it results in the common path for developers being that they build "HMD only" content. The current samples in the samples repo, for example, have samples that only run on HMDs (i.e., they request "immersive") for no good reason, aside from the fact that it's easier to create samples that impose this restriction than it is to create samples that are flexible. An alternative interpretation here: …
Regarding whether a session's immersive state is mutable: I think it should be, especially if we think about situations like the "Article" demo Google created. It would be nice to create a session, display it inline, and then be able to toggle to/from immersive mode (akin to toggling a video fullscreen and back while it is playing), without creating/destroying sessions (which would cause additional permission prompts). Regarding your concern:

> I'd prefer if possible for features that are requested on a session to be persistent for the duration of the session and have almost no dependence on other features unless said dependence is explicitly baked into the API surface.
An alternative view is that "permissions" are "permissions", but don't guarantee data! For example, when I added the ability to switch cameras (world-facing to user-facing) to the WebXR Viewer, I ran into the fact that ARKit's world motion tracking only works with the forward-facing camera. So when the user switched to the user-facing camera, those ARKit anchors break. Similarly, face tracking only works (in ARKit) with the iPhone X on the user-facing camera.

If I am given permission for "world meshes" and the user switches to the user-facing camera, that permission doesn't become invalid, even though the mesh is no longer delivered and the anchors are all destroyed. And if we had a feature request for "face tracking" in your list above, it wouldn't make sense for it to fail up front if the initial state wasn't "user-facing camera on iPhone X" ... it might fail on a different phone, but on the iPhone X it might succeed (yes, you can track faces) but wouldn't give any data to the app unless the user-facing camera was in use (assuming an app that wants to let the user toggle cameras, instead of forcing one).

So, in your examples, the user would be asked for permission if some configuration of the display (immersive or not, front or rear camera) supported it, but the feature wouldn't "run" (or deliver data) unless it was in the right situation. This might require each feature to have an "active" flag, or to provide some way of communicating to the programmer what is required for it to work, so the programmer could make choices or otherwise inform the user ("To put dog ears on your face, you must switch to the selfie camera.")
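A sketch of that "permitted but not currently active" idea (the `requestFaceTracking` call, `active` flag, `activechange` event, and UI helper are all invented for illustration):

```js
// Invented API: permission is granted once; whether data flows depends on
// the current device configuration (e.g., which camera is selected).
const faceTracker = await xrSession.requestFaceTracking(); // may prompt once

faceTracker.addEventListener("activechange", () => {
  if (!faceTracker.active) {
    // Supported and permitted, but unavailable in this configuration.
    showHint("To put dog ears on your face, switch to the selfie camera."); // hypothetical helper
  }
});
```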
When this gets written up, can you find another example besides this one:

> `arMediaStream = await xrSession.requestARCameraStream();`
My current understanding is that the WebXR API will not provide access to the camera stream. I say this because I explicitly requested that this be considered at the face-to-face meeting, and received considerable pushback from some folks there (i.e., with the suggestion to look at leveraging other web APIs). I personally am open to the idea of providing camera data, but it is a major undertaking to do it right, and there has been zero discussion of it for quite some time. So, I think we should either engage more fully with it, or stop using it as an example.
@speigg wrote:

> …
Actually, please no! Programmers should request AR, but discover what form of AR the display supports. Otherwise, programmers are encouraged to write code that only works on one sort of display. I'm not jazzed by the name …
@speigg wrote:

> …
I would agree we need the ability for the UA to enable/disable features after they grant permission, but I am not sure breaking the permissions apart is a good idea, nor do I think telling the application when those permissions are enabled/disabled (explicitly) is a good idea.

First, I would wonder about the fingerprinting implications of telling the app the full capabilities of the display, independent of permissions. Should the app know that "meshing is possible" but "the user denied access", vs. just knowing it can't use meshing? I would favor the app not being able to distinguish between the two cases, both because it encourages app writers to deal with it, and because it dissuades them from coercion ("I know your device supports it, so even though I have the ability to offer you a downgraded experience, I'm not going to; I'm going to require the permission.").

We played with this in Argon4, where the location permission was essentially a toggle; if the app asked for geolocation, the user was presented with a permission prompt, but from then on out, they could toggle location on/off. If the user toggled it off, the app would just stop receiving position updates. In the case of meshing, for example, the flow I would want is: …
There are some tricky questions for any feature like this, though, that might dictate whether it can be toggled. Specifically, can we provide the right guarantees? It's easy to see that the UA can stop sending mesh updates, but internally the world understanding WILL keep updating, and the mesh representations will be refined. If we start sending meshes again, will we be able to filter out the parts of the mesh that were "learned" while it was off? Consider planes in ARKit/ARCore. When I walk through my house, the floor plane is extended, and separate smaller planes are often merged. If I turn off meshing when I go into one room, and ARKit decides to extend the floor plane into that room (along with its geometry outline), it will be VERY hard (impossible?) for the browser to remove the additional knowledge from the ARKit data if I turn meshing back on when I leave the "sensitive room", thus leaking data.

In contrast, image or face detection and tracking, where the data being sent does not have long-term history implications, might be good candidates for toggling access. The application would not be able to distinguish between "there are no faces" and "the user disabled detection", which is exactly what we want (see the sketch below).
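A sketch of why detection-style features toggle cleanly, following the control-object pattern from earlier in the thread (`requestFaceDetection`, its event, and the rendering helper are invented for illustration):

```js
// While the user has the feature toggled off, the UA simply delivers no
// results. "No faces found" and "detection disabled" are indistinguishable
// to the app, which is exactly the privacy property described above.
const detector = await xrSession.requestFaceDetection();

detector.addEventListener("facesdetected", (evt) => {
  for (const face of evt.faces) {
    attachDogEars(face); // app-specific rendering, hypothetical
  }
});
```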
@blairmacintyre wrote:

> …

Yes, that's a nicer way of saying what I was trying to explain above. The key is, as you stated, that the feature request does not fail just because permissions aren't granted. Likewise, the …
Final comment, @toji. I like the direction of this. Modulo my continual pushback on your implied (or explicit) definition of "immersive" 😄, I think this direction is a good one. I really like that it opens the door to us encouraging developers to write flexible apps. I would like us to consider making demos and samples that are a bit more flexible, though. For example, I would hope we can dramatically limit our use of the "required features" pattern, and instead set up these samples to react to what is available. For example, instead of this:

```js
let xrSession = await xrDevice.requestSession();
let xrEnvMesher;
let xrEnvLight;
// Required features
Promise.all([
xrSession.requestARPassthrough(),
xrSession.requestEnvironmentMeshing(),
]).then((values) => {
xrEnvMesher = values[1];
startFrameLoop();
}).catch(() => {
// Whoops! Something we needed isn't there.
xrSession.end();
});
// Non-required feature
xrSession.requestEnvironmentLighting().then((envLight) => {
xrEnvLight = envLight;
}); // No catch, don't care.
```

I would prefer us to have our samples do something like:

```js
let xrSession = await xrDevice.requestSession();
let xrEnvMesher;
let xrEnvLight;
// strongly preferred feature flags
let xrSky = true;
let xrMeshControl = null;
xrSession.requestARPassthrough().then((blendmode) => {
xrSky = false; // don't draw skybox
}); // No catch, flag set.
xrSession.requestEnvironmentMeshing().then((meshControl) => {
  xrMeshControl = meshControl; // we have a mesh object, will use it
}); // No catch, flag set.
// Less important, non-required feature
xrSession.requestEnvironmentLighting().then((envLight) => {
xrEnvLight = envLight;
}); // No catch, don't care.
```
@blairmacintyre wrote:

> Actually, please no! Programmers should request AR, but discover what form of AR the display supports.
Did you mean "should not request AR"? Assuming so, I think it depends on what the semantics of this "request" are. On an XR device that supports both environment passthrough/blending (AR) and VR, I think it's fine if the app (1) asks if this feature is available and asks for any necessary permissions (if any), and (2) under the right circumstances (in response to user input), asks the UA to toggle the environment passthrough (if possible). But certainly, in any case, I agree that the app should react to the current environment passthrough / blend mode.
@speigg I think I misunderstood you:

> …
I was interpreting the two modes as variations of AR (optical see-through vs. camera overlay). Do you mean them as "AR" vs "VR"?
So, the only problem with this flow is that the application does not benefit if the user grants permission after originally denying it. This is the flow I'm suggesting: …
@blairmacintyre wrote:

> …

Not sure what you mean. I was just suggesting a different name for the …
@blairmacintyre in your example code:

```js
xrSession.requestARPassthrough().then((blendmode) => {
  xrSky = false; // don't draw skybox
}); // No catch, flag set.
```

This probably isn't the right place to be "reacting" to session state such as the blend mode. The blend mode should rather be checked every frame:

```js
function onFrame(frame) {
  if (frame.environmentBlendMode === "opaque") {
    drawSkyBox = true;
  }
  // ...
}
```

As a general rule, all session state that an app cares about should be checked for changes on every frame.
Yes, that was intentional. If the user denies, it's done for that session. I went this way because I don't think this step in your flow is a good idea:

> …

Specifically, this allows the app to detect all possible features of the hardware, independent of whether the user allows it. Based on the "fingerprinting by iterating through device and capability lists" concern, I don't think we will be able to allow this, practically. I could be wrong, but that's my assumption anyway.
In general, I would agree with you, except I have had no luck in pushing for a more reactive architecture here. As for the value of `environmentBlendMode` changing: the WebXR API session flow and so forth does not support this. There are no provisions for notification when something changes, and all of the sample flows being proposed are set up assuming all changes to the session are triggered by the web app itself. Since the web app is triggering all changes, it would be able to update flags like my `xrSky` itself. This is unfortunate, but because of it, there is no need to complicate the render loop with checking for such things.
@blairmacintyre You're right, I thought the …
This issue feels like it's gotten pretty unwieldy, and the conversation and current thinking have moved back and forth enough times that it's hard to follow while reading this. I'm inclined to close it down in favor of some more granular PRs/issues. #419 was just merged, which reconfigures session creation a bit, and I just added #423 and #424 to cover some of the other topics. Let's continue the conversation there.
XR session creation is structured as it was in WebVR, which had a much more limited set of use cases and possible display structures.
I would suggest we need to do a few things to make session creation more flexible and usable. Here is a list of possibilities, to spur discussion.
First, the combination of `supportsSession` and `requestSession` seems reasonable, but we need a way to handle the case where an external action may activate a session. Is there an implication that the UA can fire off an XRSession event when there is not currently a session? For example, a UA may have its own UI for activating VR or AR, which might include allowing the user to specify the sort of session. Or, if a user follows a link while in AR or VR, the next page should auto-create a session corresponding to the previous page's session.

Second, we should update the XRSessionCreationOptions to reflect the diversity of sessions we might get, and have a corresponding set of attributes on the XRSession.
One of the assumptions I'm making here is that a given implementation would support one or more of these combinations, as they see fit and are able. A 6DOF device could support 3DOF if it wanted; it's not required to. A device that doesn't have a magnetometer may not support "worldAligned".
A phone like a Daydream- and ARCore-capable Android phone could end up supporting a range of realities that provide some combination of …
By exposing these as options, and properties, we also allow developers to inspect the current session to see what the setup is, and create an appropriate UI.
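A minimal sketch of that options-and-properties symmetry (the specific option names are illustrative, pulled from the possibilities discussed above rather than from any agreed proposal):

```js
// Creation options echoed back as session attributes, so the app can
// inspect what it actually got and build an appropriate UI.
let options = { type: "ar", exclusive: false, worldAligned: true };

xrDevice.requestSession(options).then((session) => {
  console.log(session.type);         // which kind of session was granted
  console.log(session.worldAligned); // did we actually get alignment?
  buildUIFor(session);               // app-specific, hypothetical
});
```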