Skip to content

Clarify that default real- or complex-valued floating-point data types may depend on the target device capabilities #508

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
oleksandr-pavlyk opened this issue Nov 15, 2022 · 10 comments · Fixed by #515
Milestone

Comments

@oleksandr-pavlyk
Copy link
Contributor

oleksandr-pavlyk commented Nov 15, 2022

In description of "Default Data Types" in "data_types.rst" we state:

The default real-valued floating-point and complex floating-point data types must be the same across platforms.

Since HW support for floating point data-types varies across devices, and can be queried using dev.has_aspect_fp64 (ref) and dev.has_aspect_fp16 (ref), the real-valued floating-point and complex floating-point data types must either always be "float32" or depend on the capability of the device.

I think having HW capability-tailored default floating-point types is more useful.

This issue is to discuss changing the verbiage in "data_types.rst". From SYCL perspective, it would be appropriate to say

The default real-valued floating-point and complex floating-point data types may depend on capabilities of the target device but must be the same across host platforms.

Here host platform is the platform (Windows/Linux/Mac, etc) of the host portion of SYCL program, i.e. platform of the Python interpreter.

@rgommers
Copy link
Member

@oleksandr-pavlyk isn't that a dangerous design choice for your library? In a typical workflow, I imagine that users develop on a standard development machine (Linux/Windows/macOS), and then once things work move on to target an accelerator device. If at that point you get a silent precision change, that can easily lead to problems.

In practice, don't you choose float32 as the default, since it works everywhere?

@oleksandr-pavlyk
Copy link
Contributor Author

@rgommers If a user has a choice of devices, he/she should select the device that supports double precision if this is important for the application by using, e.g. dpctl.select_device_with_aspects('fp64'):

In [4]: dpctl.select_device_with_aspects([])
Out[4]: <dpctl.SyclDevice [backend_type.level_zero, device_type.gpu,  Intel(R) Graphics [0x9a49]] at 0x7f6be7cfea30>

In [5]: dpctl.select_device_with_aspects("fp64")
Out[5]: <dpctl.SyclDevice [backend_type.opencl, device_type.cpu,  11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz] at 0x7f6bcd6067f0>

The first one does not support double precision floating point type, while the second one does.

Intel Max Series of GPUs support double precision floating point types, so in not too distant future this issue is not going to be a problem.

@rgommers
Copy link
Member

If a user has a choice of devices, he/she should select the device that supports double precision if this is important for the application

I guess that's asking quite a bit from a user. In my experience they often won't know if that is the case. If they consider the question at all. I suspect it's a footgun, where one offloads and is happy with faster compute, and then silently gets results with low accuracy.

The default real-valued floating-point and complex floating-point data types may depend on capabilities of the target device but must be the same across host platforms.

Rather than updating the text, maybe we could turn this into a .. note:: box? Something like "Note that it is possible that a library supports multiple device types, and not all those device types support the same data types. In this case, the default integer or floating-point data type may vary with device type. The library should clearly warn about this in its documentation."

@oleksandr-pavlyk
Copy link
Contributor Author

I am ok with adding of the note box. I would change the "device type" verbiage to "device". A consumer laptop may have an integrated Iris Xe which does not support double precision floating-point type and a dedicated GPU card which does. Both devices are of type GPU but have different HW features.

@rgommers rgommers added this to the v2022 milestone Nov 16, 2022
@seberg
Copy link
Contributor

seberg commented Nov 17, 2022

Since we discussed it in the meeting: It also seems a bit strange to me to have dtype depend on implicit settings (i.e. device not dtype itself). But if that is the choice you have, then adding the note seems good.
For libraries, I would think they should use input arguments to decide used (rather than using the default dtype).

@leofang
Copy link
Contributor

leofang commented Nov 20, 2022

I missed the meeting, is the conclusion to add the suggested note? In this case, it seems we suddenly jump from "default dtypes per library" to "default dtypes per device per library"? I guess I better watch the recording...

@rgommers
Copy link
Member

I missed the meeting, is the conclusion to add the suggested note?

Yes, that seems reasonable.

In this case, it seems we suddenly jump from "default dtypes per library" to "default dtypes per device per library"?

That's not how I would characterize that note. The spec requires "default dtypes per library", and the note mainly acknowledges that if device capabilities differ a lot, then multi-device libraries may have to deviate (and if they do, they should very clearly document that).

I guess I better watch the recording...

There wasn't much discussion on this topic.

@oleksandr-pavlyk
Copy link
Contributor Author

It also seems a bit strange to me to have dtype depend on implicit settings (i.e. device not dtype itself).

Any explicit user dtype settings are honored, or raise an error if the device can not support it. The whole discussions is about behavior of default dtype value of None, which per documentation should be a default floating point type. Without the suggested verbiage, the default dtype setting would lead to an error when targeting devices that do not support doubles (not a good design).

@leofang
Copy link
Contributor

leofang commented Nov 22, 2022

I see, thanks for clarification guys!

@rgommers
Copy link
Member

I opened a PR for this: gh-515

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants