Skip to content

[NFC][SYCL] Cleanup device_impl's properties caching #18450

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: sycl
Choose a base branch
from

Conversation

aelovikov-intel
Copy link
Contributor

  • Cache extensions, for the following code this results in 600ns -> 42..50ns improvement. Caching fp16 aspect directly (and avoiding std::find_if over extensions vector) can bring it further down to about 35..45ns, but, IMO, it isn't worth extra complexity.
    bool a = true;

    for (int i = 0; i < 10; ++i) {
      SimpleTimer t;
      a &= d.has(sycl::aspect::fp16);
    }
  • Pre-initialize all cached data in the ctor instead of using std::call_once
  • Change device_impl::has_extension to accept std::string_view
  • Use cached data instead of re-computing in some queries
  • ...and use them instead of now-removed dedicated interfaces to access cached data

* Cache extensions
* Pre-initialize all in the ctor instead of using `std::call_once`
* Minor code cleanups

For

```
    bool a = true;

    for (int i = 0; i < 10; ++i) {
      SimpleTimer t;
      a &= d.has(sycl::aspect::fp16);
    }
```

this results in 600ns -> 42..50ns improvement. Caching fp16 aspect directly (and
avoiding `std::find_if` over extensions vector) can bring it further down to
about 35..45ns, but, IMO, it isn't worth extra complexity.
mutable ext::oneapi::experimental::architecture MDeviceArch{};
mutable std::once_flag MDeviceArchFlag;

// TODO: Does this have a race?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a race condition here? Could you please clarify what it is exactly?


// Pre-compute some often used properties.

// Is used during submission.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is about kernel submission to a queue, right? I think it would be helpful to clarify this in the comment.

@aelovikov-intel aelovikov-intel marked this pull request as draft May 14, 2025 16:46
@aelovikov-intel
Copy link
Contributor Author

Alternative approach: #18476

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants