Hit test results are fundamentally rays, not poses #86
Comments
I think it's still better to present the result as a pose, even if the Z and X axes are unconstrained. My goal here was to make it familiar and consistent with the rest of the API surface. I think we can explicitly specify that the X and Z axes are undefined on the returned pose. The WPTs will need to be relaxed in this case, but that should be OK.
I think the issue is really interesting, and I agree with both of you. 😄 Perhaps simply do what a physics collision system would do, and expose the intersection point and normal directly?
Hmm, emulatedPosition is a good point. I think XRRays are also relative to a space, given the way we supply them. While symmetry is valuable, it's not worth making the API overall worse to get that symmetry, since now the normal vector is not directly exposed. But this is kind of academic anyway, since we can't make this breaking change. Could you update the relevant tests? We basically need the WPT checks relaxed accordingly.
I like this; it's similar to the Unity API. What if we created an interface that exposes these directly? The tests can be updated to check these, and we can have additional checks that ensure that the transform correctly matches the point/normal.
I think that the result of a hit test is semantically closer to XRPose (position and orientation relative to some space) than to XRRay (an infinite half-line expressed in some space). There is a physical similarity between an XRPose with two unconstrained axes and an XRRay, but to me that does not seem like a strong enough reason to use XRRay to represent the results of a hit test.
I am not super-happy with the current shape. My preference would be to add the point and normal as explicit attributes, and update the pose to be derived from them. I think many people will want the pose, because a common use is to just display some content at the location of the hit, and letting the API choose the two undefined axes is just as good as (and simpler than) doing it yourself. But there will be other uses that really want the point and vector, so giving them directly seems right. (Pardon my crap IDL; I'm sure what I put below isn't exactly right, but you see what I'm saying; perhaps I should be passing in a space, for example.)
Yeah, to be clear, I'm past suggesting XRRay; I prefer Blair's idea, though I think it should be on a subclass of the hit test result interface.
I was suggesting the other way around to avoid creating a pose (with "junk" in it) that we don't need. I admit I was also trying to avoid wasting work computing (and allocating) poses that you don't want or need. By the time you create a pose, the normal is just the Y vector and the position is well defined too. This concern is probably not a big deal (since there won't be piles of hit tests per frame).
Attributes can be lazily evaluated; implementation-wise, this would mean the value is only computed when first accessed. The thing you proposed can work too. @bialpio @toji, which do you prefer:

vs
Implement hit testing API. Depends on servo/webxr#149, #26170. This implements non-transient hit tests. The tests that do not pass are due to web-platform-tests/wpt#22898, web-platform-tests/wpt#22900, web-platform-tests/wpt#22901, and immersive-web/hit-test#86.
I actually prefer another option, which is to refrain from adding helpers until we get strong signals that they are needed. I'm worried that this is a slippery slope and we'll turn this (and the main spec) into a bunch of mathematical utilities, especially given that both of those methods could be written as one-liners in JS, something along the lines of:

```js
function getPosition(pose) {
  return pose.transform.position;
  // alternatively:
  // return DOMMatrix.fromFloat32Array(pose.transform.matrix).transformPoint(new DOMPoint(0, 0, 0, 1));
}

function getNormal(pose) {
  return DOMMatrix.fromFloat32Array(pose.transform.matrix).transformPoint(new DOMPoint(0, 1, 0, 0));
}
```
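For environments without DOMMatrix, the same two accessors reduce to reading columns of the column-major pose matrix (the layout `XRRigidTransform.matrix` uses). A minimal plain-array sketch, with the identity matrix standing in for a real pose:

```javascript
// Read position and surface normal out of a column-major 4x4 pose
// matrix (the layout of XRRigidTransform.matrix), with no DOM APIs.
function getPosition(m) {
  // Transforming the point (0, 0, 0, 1) yields the translation column.
  return [m[12], m[13], m[14]];
}

function getNormal(m) {
  // Transforming the direction (0, 1, 0, 0) yields the Y basis column.
  return [m[4], m[5], m[6]];
}

// Identity pose as a stand-in: position at the origin, normal along +Y.
const identity = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1];
console.log(getPosition(identity)); // [ 0, 0, 0 ]
console.log(getNormal(identity));   // [ 0, 1, 0 ]
```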
I don't consider these helpers: I consider them to be the primary API we should have had in the first place, with the pose derived from them.
The API hasn't been approved. We should focus on making it correct, since we aren't breaking anything approved. In that light: the "correct" API does not return a matrix, since the underlying conceptual model does not return a pose. I would support having a utility method that returns a pose, because that is a common use case (as a programmer, having to take the point+normal and generate a pose is a bit of a PITA), but framework and app programmers should really be thinking about the fact that this matrix is partially arbitrary, and that they could instead pick the extra axes based on their application-specific needs.
/agenda to figure out the best path forward here |
Exposing a matrix in the hit test results is, essentially, requesting additional information from the platform: orientation around the surface normal vector. Some reasonable algorithms could be explored to make this useful, such as ensuring that an object is rendered facing the device, or in a way that is accessible given the user's preferences. The problem with this, however, is that it assumes that hit testing is used primarily for placement of objects.

Many consider a hit test to be a low-level concept that can serve as a primitive composing a variety of higher-level algorithms, not just placement of an object on a surface. For example, hit tests can provide world understanding for AI actors that navigate autonomously with path-finding algorithms. They can also be used to help select appropriate impulse response samples for processing of audio reflections/reverb, or for rendering large-scale ambient-occlusion effects.

Perhaps we could say that if the intent is to have the API provide a ready-to-use matrix, positioning an object in a useful way in space, such higher-level logic should be activated by requesting an anchor. Unlike hit tests, anchors have the added responsibility of providing consistent orientation over multiple frames, and thus expose more information.
This was discussed in the call today. A major new point brought up was that hit testing is often used for orienting objects placed by the user. I'm not going to fully articulate that position for fear of accidentally misrepresenting it (but would appreciate it if someone who is in that camp does so).

Currently, because a pose has additional degrees of freedom, implementations might differ on how those degrees of freedom are picked. Indeed, ARCore (and thus the Chromium implementation) currently constrains this with "X+ is perpendicular to the cast ray and parallel to the physical surface centered around the hit test", which is not always unique, so it's not possible to replicate the exact behavior across browsers. The status quo of it being underspecified is not good: if content is blindly using it, it will break when used on a browser other than the one it was tested on; and if content is converting it to a ray before recreating the transform it needs, there was little point to exposing the transform in the first place! I feel like we are generally in agreement that this should at least be specified better (correct me if I'm wrong), so we should definitely push forward on picking a specified behavior.

Now, my unease with actually specifying it and having it be the primary entry point for the API is that I don't think there's a logical way to specify it that is actually generally useful. One logical way would be to align the X or Z axis with one of the reference space axes (basically giving you a notion of "lie flat"), except there's no indication as to which reference space to use -- the hit test was likely created with an input space and won't know whether to pick local, local-floor, unbounded, or bounded-floor. And, as @thetuvix brought up during the call, while this makes sense for placing objects on the floor, you probably want something different for placing objects on a wall! And if we pick something different, what do we do with planes that are at an angle?
Plus, X needs to be perpendicular to Y. You can take your desired rough Z-direction vector and cross product twice (first with the normal to get X, then crossing X with the normal to get the final Z) to fix it up.

So I'm skeptical that we can pick a decent choice of transform here that does what the user will expect for all kinds of planes, and even if we do, it will be very jumpy around a particular kind of configuration. Personally I think most applications will want either "roughly aligned to world space, but flat on the surface" or "roughly aligned to the input ray, but flat on the surface" for placing on the floor, and a similar option for walls, except instead of being "flat on" (Y parallel to the normal) it's "flat against" (Z parallel to the normal). This is a four-way ambiguity, not counting the fact that we don't know which world space the application wishes to use here (it can also be "facing the user"!).

My impression is that the ideal system here for placing objects is some kind of smooth transition based on the normal angle: essentially, "tip the object in the direction that causes one of its coordinate planes to stick to the hit-tested plane with the least angular displacement". But this doesn't fix the two-way ambiguity involved in picking where the object's coordinate planes are locked to, and that's an application choice.

Given the above, it seems likely to me that applications will not like the choices we're making, and this makes me uncomfortable with making this pose the primary entry point.
Correct, the API (as per explainer) focuses mainly on object placement use case. Other use cases could potentially be solved by it (maybe a bit awkwardly given the subscription-based nature of the API, as with the agent navigating the environment), but I mainly looked at it through the lens of placing objects. Tying it into the anchors API is also a good side effect of the current approach, but I did not consider it when writing the spec.
Agreed, this should be fully specified to enable cross-browser compat and it needs to happen irrespective of what we decide for additional properties on the hit test result, so let's start with that.
It may not be possible in the general case (as mentioned during the call), but I think it'll be valuable to use something that works fairly well in the common case, which to me is placing an object on the floor. I feel like we had a similar conversation when introducing a matrix property on the ray, and it's actually a good thing we did, since now we can leverage it to specify the algorithm here as well, roughly:
I believe this will work (assuming I have not messed up the math, please proofread), is resilient against the edge case of the normal being equal to the initial Z axis, and will make the pose's Z axis point roughly towards the origin of the ray (matching Chrome's behavior) without relying on any data not already available to the hit test.
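To make the shape of such an algorithm concrete, here is a hedged sketch (my own illustration, not the spec text; `hitPose`, `hitPoint`, and `rayDirection` are made-up names), using the double cross product and omitting the fallback needed when the normal is parallel to the ray:

```javascript
// Sketch: build a column-major 4x4 pose whose Y axis is the surface
// normal and whose Z axis points back toward the ray origin as closely
// as the normal allows. Plain arrays; a real algorithm also needs a
// fallback axis for the degenerate normal-parallel-to-ray case.
const cross = (a, b) => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const normalize = (v) => {
  const len = Math.hypot(v[0], v[1], v[2]);
  return [v[0] / len, v[1] / len, v[2] / len];
};

function hitPose(hitPoint, normal, rayDirection) {
  const y = normalize(normal);
  // Vector pointing from the hit back toward the ray origin.
  const back = [-rayDirection[0], -rayDirection[1], -rayDirection[2]];
  // First cross: X is perpendicular to both the normal and the ray.
  const x = normalize(cross(y, back));
  // Second cross: Z is "back" with its normal component removed.
  const z = cross(x, y);
  return [
    x[0], x[1], x[2], 0,
    y[0], y[1], y[2], 0,
    z[0], z[1], z[2], 0,
    hitPoint[0], hitPoint[1], hitPoint[2], 1,
  ];
}
```

The Y column reproduces the normal exactly, and the Z column is the projection of the reversed ray direction onto the surface plane, matching the "Z points roughly toward the ray origin" behavior described above.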
I think this will work conceptually as a choice of transform, given that backwards compatibility forces us to make one. It still has a patched singularity around the "normal = Z" case, which means that the placed object's ghost will rotate erratically around that point: look at what happens when a normal slightly offset from Z swings across normal = Z to be slightly offset from Z on the other side -- the rotation component abruptly turns by 180 degrees. As I said before, it's impossible to make a choice that doesn't have this problem of erratic behavior (the problem we're trying to solve here can be reduced to one that is proven unsolvable by the hairy ball theorem!). This isn't my primary concern, however; applications are going to have to deal with it too. There are stateful solutions (which we can't use), but overall this is a hard problem which we provably cannot solve and which applications will find really hard to solve. Many will probably decide to live with it.

My primary concern is that we're not solving the general case here, and it's probably really hard to solve the general case well. We're exposing an attribute based on a known subset of use cases, and it will behave quite badly the moment the user lifts their hand and the ray hits a wall. I'm not convinced that "objects on the floor" is so overwhelmingly in the majority of use cases as to justify a single specified transform.

So I still think that the best path forward is to spec this choice of transform for compatibility, while also exposing the point and normal directly.
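That 180-degree flip is easy to see numerically. A small illustration (my own, using a fixed +Z reference axis rather than any shipping implementation's choice) of how a derived X axis reverses as the normal swings across the singular direction:

```javascript
// Show the patched singularity: derive an X axis by crossing a fixed
// reference direction with the surface normal, then watch the result
// flip as the normal crosses that reference direction.
const cross = (a, b) => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const normalize = (v) => {
  const len = Math.hypot(v[0], v[1], v[2]);
  return [v[0] / len, v[1] / len, v[2] / len];
};
const dot = (a, b) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];

const ref = [0, 0, 1];               // the singular direction
const n1 = normalize([ 0.01, 0, 1]); // normal just to one side of it
const n2 = normalize([-0.01, 0, 1]); // normal just to the other side
const x1 = normalize(cross(ref, n1));
const x2 = normalize(cross(ref, n2));
console.log(dot(x1, x2)); // ≈ -1: a tiny change in the normal flips X by 180°
```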
@bialpio could you open a PR for standardizing your proposed choice of transform? I think this would at least fix the part of this issue that isn't contentious -- this API needs to be standardized. One thing we should check: ARKit exposes a pose as well, does anyone know what the details behind that pose are? Is it the same choice as ARCore? |
On it!
I took a quick look at ARKit's hit test result -- there seem to be three different ways to obtain pose information from it (all matrix-based):
The documentation does not seem to explicitly state anything about the properties of the returned matrix, which is not ideal. I might be missing something here though. |
A hit test result is fundamentally a ray with the origin being the point of intersection, and direction being perpendicular to the plane being intersected with. This is how, for example, Unity represents it.
We're representing it as an XRRigidTransform/XRPose that produces such a ray, i.e. an XRRigidTransform which is capable of transforming the y-axis ray to the desired ray. However, such a transform is not unique (it can put the X and Z axes anywhere), and it seems weird to provide this information in such a form. It would be better if we directly produced a ray; it's always possible to get a rigid transform out of a ray if desired.
ARCore defines it more rigidly, and the tests follow this definition. Note that the only thing actually dependent on the planes is the intersection point and normal, everything else can be computed given the point, normal, and presupplied input ray.
To me the use of poses/rigid transforms here doesn't make sense; it's fundamentally not a pose. We can attach additional made-up axes to it (the way ARCore does) to make it into a pose, but I don't see the practical use of doing this. The user will likely wish to consume this information in ray form as well: I mostly see users using the position, and if they actually use the orientation it would be to obtain a direction vector.
I suppose using it as a pose can be useful for positioning little indicator objects at the place the hit test actually occurred, but as mentioned before, the user has enough information to calculate this for themselves, and this doesn't convince me that an arbitrary choice of X and Z axes is useful.
The status quo is that the tests don't match the API as specced, and that must be fixed one way or the other at the very least.
I see three paths forward for this. Turning getPose() into getRay() is IMO ideologically the best option; however, it's a major breaking change for a somewhat-shipped API and probably practically the worst option.

Thoughts? Is there a purpose for the result being a full pose?
cc @bialpio @toji
(also cc @thetuvix to inform OpenXR hit testing API choices)