Coordinate systems #416

Closed

@Richard-Yunchao

Description

Introduction

There are many coordinate systems in native graphics APIs and WebGL, and they differ in some aspects. For example, the +Y axis may point up or down, and the origin (0, 0) or the point (-1, -1) may sit at the top-left or the bottom-left corner. These differences are quite confusing for developers, and they require a Y-flip and/or an inverted winding direction for culling in order to make applications render correctly and behave consistently.

This document attempts to summarize all of these coordinate systems and propose an appropriate solution for WebGPU. It is adapted from a Google doc I wrote.

Coordinate Systems

There are three coordinate systems that need to be taken into consideration in the graphics pipeline for WebGPU:
1) NDC (Normalized Device Coordinates): developers use this coordinate system to construct their geometries and to transform them in the vertex shader via model and view matrices. The point (-1, -1) in NDC is located at either the top-left corner (Y down) or the bottom-left corner (Y up).

Normalized vertex data (positions, normals, transform matrices) and clip coordinates follow NDC.

2) Framebuffer coordinates (viewport coordinates): when we write into an attachment, read from an attachment, or copy/blit between attachments, we use framebuffer coordinates to specify the location. The origin (0, 0) is located at either the top-left corner (Y down) or the bottom-left corner (Y up).

Viewport coordinates and fragment/pixel coordinates (like gl_FragCoord by default, though you can configure it) follow framebuffer coordinates.

3) Texture coordinates: when we upload texture data into memory or sample from a texture, we use texture coordinates. The origin (0, 0) is located at either the top-left corner (Y down, upside down) or the bottom-left corner (Y up, right-side up).

We can divide texture coordinates into texture uploading coordinates, which define where texels are stored in texture memory (the lowest address holds the top-left or the bottom-left texel), and texture sampling coordinates, which define where we sample/read from memory. For example, the API code that uploads data into a texture via texImage2D might be written by one developer, while the shader code that samples from the texture might be written by another, and the two may have different mental models of texture coordinates. But these two coordinate systems are the same within any given graphics API, so as long as both developers follow that API's convention, everything works. Interestingly, if both of them get it wrong for the given API (both the uploading and the sampling coordinates are inverted), the two mistakes cancel out and you still fetch the correct texel!
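
As a concrete illustration of the upload side, here is a minimal sketch in plain C (flipRowsY is a hypothetical helper, not part of any API): when moving an image between a bottom-left-origin API such as OpenGL and a top-left-origin API, the upload-side fix is simply to reverse the row order in the staging copy.

    #include <string.h>

    /* Reverse the row order of a tightly packed image so the first row in
       memory becomes the last, flipping the image vertically. */
    void flipRowsY(unsigned char* dst, const unsigned char* src,
                   int width, int height, int bytesPerPixel)
    {
        size_t rowBytes = (size_t)width * bytesPerPixel;
        for (int y = 0; y < height; ++y) {
            memcpy(dst + (size_t)y * rowBytes,
                   src + (size_t)(height - 1 - y) * rowBytes,
                   rowBytes);
        }
    }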

A few more coordinate systems in graphics are listed below:
1) Tessellation coordinates: we won't discuss these because WebGPU currently has no tessellation support.
2) Window coordinates (present coordinates): the window compositor/manager, the window system, and the display may need to be aware of these, but handling them is not WebGPU's duty. Window coordinates are Y down across all OSes.

OpenGL's framebuffer coordinates differ from window coordinates, so when OpenGL presents the rendered image, it flips Y under the hood to show the result on screen. This can hurt performance because it adds an extra pass that flips Y for every pixel at present time.
I tend to propose Y down for framebuffer coordinates, which is aligned with window coordinates. With that choice, the proposals avoid this presentation-time performance issue in WebGPU implementations and in D3D/Vulkan/Metal drivers (OpenGL is different, but it is the least important backend for WebGPU).

3) Canvas coordinates: we need to consider these when we implement WebGPU in browsers, but they are not part of WebGPU's rendering pipeline. Canvas coordinates are Y down, which is aligned with framebuffer coordinates in my proposals.

Native APIs

OpenGL, OpenGL ES and WebGL

NDC: +Y is up. The point (-1, -1) is at the bottom-left corner.
Framebuffer coordinates: +Y is up. The origin (0, 0) is at the bottom-left corner.
Texture coordinates: +Y is up. The origin (0, 0) is at the bottom-left corner. See the OpenGL 4.6 spec, Figure 8.4.

D3D12 and Metal

NDC: +Y is up. The point (-1, -1) is at the bottom-left corner.
Framebuffer coordinates: +Y is down. The origin (0, 0) is at the top-left corner.
Texture coordinates: +Y is down. The origin (0, 0) is at the top-left corner.

Vulkan

NDC: +Y is down. The point (-1, -1) is at the top-left corner. See the description of "PointCoord" in the Vulkan 1.1 spec.
Framebuffer coordinates: +Y is down. The origin (0, 0) is at the top-left corner. See the descriptions of "VkViewport" and "FragCoord" in the Vulkan 1.1 spec. However, we can flip the viewport coordinates via a negative viewport height value.
Texture coordinates: +Y is down. The origin (0, 0) is at the top-left corner.

Possible solutions for WebGPU

Factors we need to consider

When we propose a solution for WebGPU's coordinate systems, we need to consider the impact of these factors:
WebGPU developers: if web developers don't need to flip Y or invert the winding direction under a particular solution, that solution is better.
Implementation: we need to consider the implementation and its performance impact on the different native graphics APIs: D3D12, Metal, Vulkan (and maybe OpenGL).
WebGL compatibility: it's impossible to run WebGL applications directly on a WebGPU runtime because the APIs are quite different, but we can reuse as many resources as possible when porting WebGL apps to WebGPU. It's better to propose a solution in which vertex data, shaders, and textures can be reused from WebGL with no or small changes.

Possible solutions

Let's discuss the coordinate systems in reverse pipeline order:
1) Texture coordinates: Y is down in all three modern APIs (D3D12, Metal, and Vulkan), although Y is up in WebGL and OpenGL. So I propose that Y be down in texture coordinates. I was told that the group already discussed texture coordinates in #157, and the consensus is the same (Y down).
2) Framebuffer coordinates: Y is down in all three modern APIs (D3D12, Metal, and Vulkan), although Y is up in WebGL and OpenGL. So I propose that Y be down in framebuffer coordinates. In addition, since we tend to choose Y down for texture coordinates, it's better to keep framebuffer coordinates aligned with texture coordinates. Otherwise, when we sample from a rendered texture (that is, we read from it using texture coordinates right after rendering to it using framebuffer coordinates), it would be inconvenient for WebGPU developers if the two coordinate systems differed.

Y down in framebuffer coordinates makes it easier for WebGPU to render into a canvas, because the canvas is Y down: the two are aligned, so no Y-flip is needed when we draw or composite an image from the framebuffer to a canvas.
Y down in framebuffer coordinates also makes it easier for WebGPU to present on screen, because window coordinates are Y down: the two are aligned, so no Y-flip is needed when we present the framebuffer image onto the screen.

3) NDC: Y is up in D3D12 and Metal (and in WebGL and OpenGL), but Y is down in Vulkan. I propose Y up in NDC for WebGPU because two of the three modern APIs, as well as WebGL applications, follow this convention. Y down is a possible alternative, though.

So there are two possible solutions for WebGPU's coordinate systems:

  1. Y up in NDC, Y down in other coordinate systems
  2. Y down in all coordinate systems.

Let's take a look at the implementation on top of the native graphics APIs and the impact of these two solutions (say, for porting WebGL apps).

Implementation and impact

Solution 1: Y up in NDC, Y down in other coordinate systems, see Dawn patch 10201

D3D12 and Metal: When we implement it on D3D12 and Metal, we need to do nothing.
Vulkan: When we implement it on Vulkan, we need to flip Y.
OpenGL: When we implement it on OpenGL, we need to flip Y in the vertex shader and invert the winding direction.

For this solution, we need to map (-1, 1), at the top-left corner in NDC, to (0, 0), at the top-left corner in framebuffer coordinates, which might be considered mathematically awkward: it is more natural to map the smallest value in NDC, the point (-1, -1), to the smallest value in framebuffer coordinates.
The upside is that when we port a WebGL application, we can reuse the vertex data and vertex shaders directly.
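
To make the difference between the two mappings concrete, here is a minimal sketch in plain C of the Y part of the viewport transform under each convention (the helper names are mine, for illustration only):

    /* Vulkan/OpenGL-style mapping: ndcY = -1 lands at fbY = 0. */
    float ndcToFbY_GL(float ndcY, float vpHeight)
    {
        return (ndcY + 1.0f) * 0.5f * vpHeight;
    }

    /* D3D/Metal-style mapping: ndcY = +1 lands at fbY = 0 (Y is flipped). */
    float ndcToFbY_D3D(float ndcY, float vpHeight)
    {
        return (1.0f - ndcY) * 0.5f * vpHeight;
    }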

Solution 2: Y down in all coordinate systems, see Dawn patch 8420

OpenGL: When we implement it on OpenGL, we need to invert the winding direction.
D3D12 and Metal: When we implement it on D3D12 and Metal, we need to flip Y in the vertex shader.
Vulkan: When we implement it on Vulkan, we need to do nothing.

For this solution, we map (-1, -1) in NDC to (0, 0) in framebuffer coordinates, which is mathematically natural. Furthermore, +Y is down in all coordinate systems; this consistency is very good for WebGPU developers. (WebGL also behaves consistently across its coordinate systems, though there it is Y up everywhere.)
However, when we port a WebGL application to WebGPU, we need to flip Y in WebGL's vertex shaders and invert the winding direction in the application, because the NDCs differ between WebGL and WebGPU. Otherwise, the geometry rendered on screen might have the wrong shape.

Implementation Details

How to invert winding direction

Changing the winding direction from CW/CCW to CCW/CW is simple in any graphics API. In an OpenGL shader we can also query whether a primitive is front-facing or back-facing via gl_FrontFacing (but we can't change its value).
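
As an illustration, here is a minimal sketch using the Vulkan C API, assuming the rest of the pipeline state is set up elsewhere; inverting the winding direction amounts to a one-field change in the rasterization state:

    #include <vulkan/vulkan.h>

    /* Rasterization state with back-face culling. If the implementation
       flips Y (e.g. via a negative viewport height), swap the front-face
       convention so culling still removes the intended faces. */
    VkPipelineRasterizationStateCreateInfo rasterState = {
        .sType = VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO,
        .polygonMode = VK_POLYGON_MODE_FILL,
        .cullMode = VK_CULL_MODE_BACK_BIT,
        .frontFace = VK_FRONT_FACE_CLOCKWISE, /* was COUNTER_CLOCKWISE */
        .lineWidth = 1.0f,
    };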

How to flip Y

We can flip Y by a few means:
1) Operate on the coordinate values directly in the shader.
In the vertex shader, we can negate gl_Position.y:

    gl_Position.y = -gl_Position.y;

In the fragment shader, we can assign 1 - texCoord.y to texCoord.y:

    texCoord.y = 1.0 - texCoord.y;

And we can do the same for framebuffer coordinates via FragCoord.

2) Use tools like spirv-cross and/or SPIR-V/OpenGL semantics. spirv-cross can flip Y for you, a SPIR-V execution mode can specify whether the shader's origin is at the top left or the bottom left, and on the API side OpenGL offers glClipControl (see the sketch after this list).

3) Set an appropriate viewport rect. We can flip the viewport in Vulkan 1.1, or in Vulkan 1.0 with VK_KHR_maintenance1 support, as follows:

    VkViewport vp = originalViewport;
    vp.y = originalViewport.y + originalViewport.height;
    vp.height = -originalViewport.height;

This is good because changing the viewport rect is quite easy and we don't need to revise the shader. Sometimes developers want to change only the API side; they don't want to revise shaders, because shaders are private assets and we may only have the assembly (like SPIR-V) or even just binary code.
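
For completeness, here is a minimal sketch of applying the flipped viewport with the Vulkan C API (cmdBuffer and the original viewport values are assumed to be set up elsewhere):

    #include <vulkan/vulkan.h>

    /* Flip Y by moving the origin to the opposite edge and negating the
       height (Vulkan 1.1, or Vulkan 1.0 + VK_KHR_maintenance1). */
    void setFlippedViewport(VkCommandBuffer cmdBuffer, VkViewport original)
    {
        VkViewport vp = original;
        vp.y = original.y + original.height;
        vp.height = -original.height;
        vkCmdSetViewport(cmdBuffer, 0, 1, &vp);
    }

And here is the glClipControl route mentioned in item 2, which on desktop OpenGL 4.5+ lets the API side choose the clip-space origin and depth range without touching shaders:

    /* Assumes an OpenGL 4.5+ context and a loader that exposes glClipControl. */
    void useD3DStyleClipSpace(void)
    {
        /* Upper-left origin and [0, 1] depth, matching D3D12/Metal/Vulkan. */
        glClipControl(GL_UPPER_LEFT, GL_ZERO_TO_ONE);
    }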

Proposal

According to this investigation, solution 1 (Y up in NDC, Y down in all other coordinate systems) can be supported on more native graphics APIs (D3D12 and Metal) without any change. Furthermore, it can be supported on Vulkan with a simple change on the API side only. In addition, it is friendlier for reusing WebGL's resources. So I tend to propose solution 1.

One more issue: Z range in NDC

The Z range is [-1, 1] in WebGL and OpenGL, but it is [0, 1] in D3D12, Metal, and Vulkan. So I propose that the Z range be [0, 1], following the modern APIs.
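
For projection matrices built for OpenGL's [-1, 1] clip-space Z, one possible porting fix is to remap z' = (z + w) / 2 in clip space, which maps NDC depth from [-1, 1] to [0, 1]. A minimal sketch in plain C, assuming a column-major 4x4 matrix (the helper name is mine):

    /* Fold z' = (z + w) / 2 into a column-major projection matrix:
       the new Z row becomes the average of the old Z and W rows. */
    void remapDepthZeroToOne(float m[16])
    {
        for (int col = 0; col < 4; ++col) {
            m[col * 4 + 2] = 0.5f * (m[col * 4 + 2] + m[col * 4 + 3]);
        }
    }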

Q/A about coordinate systems

I discussed a few questions related to coordinate systems with the Intel driver team. I'd like to list them below; I hope I didn't misunderstand their answers.

  1. When do we do culling? I saw a statement that all APIs agree on the definition of a clockwise or counterclockwise polygon: the decision is made in framebuffer coordinates, as a human would see it if presented on screen. Is this statement true? Is the winding direction (clockwise vs. counterclockwise) based on NDC or on framebuffer coordinates? Previously, I thought CW and CCW were defined in NDC, since back-face culling is done after vertex shading but before the viewport transform, at which point framebuffer coordinates have not yet been computed.
    A: Culling is done after the viewport transform, right before rasterization, because the viewport transform can affect the winding (winding can differ depending on the perspective). So culling is done on the basis of framebuffer coordinates. In addition, extensions/core features (like VK_KHR_maintenance1) let us flip Y by setting a negative viewport height in OpenGL/Vulkan, and since culling is done after the viewport transform, you need to take care of that: flipping Y changes the winding direction.

  2. How do we map points in NDC to points in framebuffer coordinates? I thought that the smallest value in NDC, the point (-1, -1), would be mapped to the smallest value in framebuffer coordinates, the origin (0, 0). That is true for Vulkan and OpenGL, but not for D3D. Is this weird for D3D? Developers might be happier if we always mapped the smallest value in one coordinate system to the smallest value in another: framebuffer coordinates, texture coordinates, screen coordinates, etc.
    A: In Vulkan and OpenGL, we map (-1, -1) in NDC to (0, 0) in framebuffer coordinates because the Y axes of NDC and framebuffer coordinates point the same way (both down or both up). This behavior looks more mathematically natural. Mapping (-1, 1) in NDC to (0, 0) in framebuffer coordinates, as D3D/Metal do, seems mathematically awkward, but if developers are used to it, it's not a big problem. In fact, game engines and 3D modeling tools generate meshes following D3D's coordinates, so developers don't mind it at all; they may even find mapping (-1, 1) in NDC to (0, 0) in framebuffer coordinates natural.

  3. Shall we follow D3D/Metal's coordinate systems or Vulkan's for WebGPU? It looks like the only difference is whether NDC is Y up or Y down.
    A: I might say it is good to follow D3D's coordinate systems, because many game engines target D3D first and port to other platforms after the fact; different coordinate systems can lead to portability problems like Y-flips in existing apps. On the other hand, following Vulkan's coordinates is not bad if WebGPU looks to the future, because Y down might be the trend: more and more coordinate systems (NDC, framebuffer, texture, window) are Y down. Being consistent across all coordinate systems is also very good for developers.

  4. What about the performance impact on hardware?
    A: AFAIK, the performance impact is not a big deal, because we only need to flip Y or invert the winding direction. I don't have performance data, though.

  5. What about window coordinates (present coordinates)? It seems to me that every window manager or window system uses +Y down; that is, Y is down in all window (present) coordinate systems and the origin (0, 0) is located at the top-left corner, across different OSes. Is this correct?
    A: AFAIK, it is correct.

References

  1. Vulkan spec
  2. OpenGL spec
  3. Coordinate systems in MSDN
  4. Working with Viewport and Pixel Coordinate Systems in Metal Programming Guide
  5. Keeping the Blue Side Up: Coordinate Conventions for OpenGL, Metal and Vulkan
  6. Flipping the Vulkan viewport
