[xla:cpu] Optimize ThunkExecutor::Execute part #2 #15567

Merged: 1 commit, Jul 31, 2024

Commits on Jul 31, 2024

  1. [xla:cpu] Optimize ThunkExecutor::Execute part #2

    Use std::aligned_storage_t as raw storage for the Node structs so that they are not default-initialized on a hot path.
    
    name                                     old cpu/op   new cpu/op   delta
    BM_SelectAndScatterF32/128/process_time   791µs ± 4%   720µs ± 2%  -8.93%
    BM_SelectAndScatterF32/256/process_time  3.20ms ± 4%  2.96ms ± 2%  -7.46%
    BM_SelectAndScatterF32/512/process_time  13.7ms ± 5%  12.8ms ± 2%  -6.80%
    
    name                                     old time/op          new time/op          delta
    BM_SelectAndScatterF32/128/process_time   790µs ± 5%           719µs ± 1%   -9.00%
    BM_SelectAndScatterF32/256/process_time  3.20ms ± 3%          2.96ms ± 1%   -7.58%
    BM_SelectAndScatterF32/512/process_time  13.2ms ± 4%          12.3ms ± 1%   -6.82%
    
    PiperOrigin-RevId: 658139935
    ezhulenev authored and copybara-github committed Jul 31, 2024
    595c6b2
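For readers unfamiliar with the technique named in the commit message, here is a minimal sketch of the general idea. It is not the code from this PR: the `Node` fields, the `NodeArray` class, and the `SumOfCounters` helper are hypothetical names invented for illustration. The assumption is that `Node` has default member initializers, so allocating an array of `Node` would write to every element up front. Allocating `std::aligned_storage_t` slots instead only reserves correctly sized and aligned bytes, and each `Node` is constructed once, in place, with its final values. Note that `std::aligned_storage_t` is deprecated as of C++23, so this sketch assumes a C++17 build.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <type_traits>

// Hypothetical stand-in for the executor's per-thunk bookkeeping struct.
// The default member initializers mean `new Node[n]` would write to every
// element before any real values are known.
struct Node {
  int64_t counter = 0;
  int64_t id = -1;
};

// The sketch never runs ~Node(), which is only safe for trivially
// destructible types.
static_assert(std::is_trivially_destructible_v<Node>,
              "NodeArray does not call Node destructors");

// Raw bytes with the size and alignment of Node. Default-initializing an
// array of these runs no Node constructor and writes nothing.
using NodeStorage = std::aligned_storage_t<sizeof(Node), alignof(Node)>;

class NodeArray {
 public:
  // `new NodeStorage[size]` (no parentheses or braces) default-initializes,
  // which leaves the bytes untouched for this trivial type.
  explicit NodeArray(size_t size)
      : storage_(new NodeStorage[size]), size_(size) {}

  // Constructs the i-th Node in place with its final values.
  // Must be called for a slot before that slot is read.
  Node* Emplace(size_t i, int64_t counter, int64_t id) {
    assert(i < size_);
    return new (&storage_[i]) Node{counter, id};
  }

  // Accesses a previously constructed Node.
  Node& operator[](size_t i) {
    assert(i < size_);
    return *std::launder(reinterpret_cast<Node*>(&storage_[i]));
  }

  size_t size() const { return size_; }

 private:
  std::unique_ptr<NodeStorage[]> storage_;
  size_t size_;
};

// Hypothetical usage: fill every slot exactly once, then read the slots back.
int64_t SumOfCounters(size_t n) {
  NodeArray nodes(n);
  for (size_t i = 0; i < n; ++i) {
    nodes.Emplace(i, static_cast<int64_t>(i), static_cast<int64_t>(i));
  }
  int64_t sum = 0;
  for (size_t i = 0; i < n; ++i) sum += nodes[i].counter;
  return sum;
}
```

The saving comes from skipping the first pass over the array: each slot is written once instead of being default-initialized and then overwritten. The trade-off is that reading a slot before `Emplace` has been called on it is undefined behavior, so the caller has to guarantee that every slot is constructed before use.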