Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Direct copy specialization #65

Merged
merged 16 commits into from
Apr 24, 2024
Merged

Conversation

james7132
Copy link
Contributor

@james7132 james7132 commented Mar 2, 2024

Mostly resolves the issues found in #53. This PR specializes writes, reads, and creates of types by directly using slice::copy_from_slice on them if compiling for a little endian target, and only if the type has no internal padding.

Checking the generated assembly, this eliminates a huge number of branches and even starts using vectorized copies and calls to memcpy directly for larger types.

@james7132 james7132 marked this pull request as ready for review March 23, 2024 05:35
pcwalton added a commit to pcwalton/bevy that referenced this pull request Mar 23, 2024
`StorageBuffer`, and make `GpuArrayBuffer` use it.

`EncasedBufferVec` is like `BufferVec`, but it doesn't require that the
type be `Pod`. Alternately, it's like `StorageBuffer<Vec<T>>`, except it
doesn't allow CPU access to the data after it's been pushed.
`GpuArrayBuffer` already doesn't allow CPU access to the data, so
switching it to use `EncasedBufferVec` doesn't regress any functionality
and offers higher performance.

Shutting off CPU access eliminates the need to copy to a scratch buffer,
which results in significantly higher performance. *Note that this needs
teoxoy/encase#65 from @james7132 to achieve
end-to-end performance benefits*, because `encase` is rather slow at
encoding data without that patch, swamping the benefits of avoiding the
copy. With that patch applied, and `#[inline]` added to `encase`'s
`derive` implementation of `write_into` on structs, this results in a
*16% overall speedup on `many_cubes --no-frustum-culling`*.

I've verified that the generated code is now close to optimal. The only
reasonable potential improvement that I see is to eliminate the zeroing
in `push`. This requires unsafe code, however, so I'd prefer to leave
that to a followup.
@james7132 james7132 force-pushed the direct-copy-specialization branch from 2901ec1 to 4ee8377 Compare March 28, 2024 04:12
@james7132
Copy link
Contributor Author

I think this is about as clean/low-unsafe as I can get this without negatively impacting the performance again. @teoxoy this should be ready for review now.

@teoxoy
Copy link
Owner

teoxoy commented Apr 2, 2024

@james7132 I pushed a commit to the PR could you give it a look?

@james7132
Copy link
Contributor Author

I actually tried to deduplicate code like this, but apparently the optimizer doesn't like implicit if-returns as replacements for if/elses. I'll check the assembly output of this later.

@james7132
Copy link
Contributor Author

I think the Pod changes are fine, but the cleanup in how the branching is handled the compiler really doesn't like: james7132/encase_asm_tests@f936005#diff-ce2d3c1fb388fe83071754963a5343392f61ff8b7d7465aa6d2ba68dfe31993b.

This shows a regression in codegen for Vec-like writes. Though I think this comes down to the fact that it cannot inline the reservation, and thus it's assertion that there's enough space does not carry over, unlike the slice writes.

@james7132
Copy link
Contributor Author

but apparently the optimizer doesn't like implicit if-returns as replacements for if/elses. I'll check the assembly output of this later.

Ah I think I see why, it views the non-memcpy writes/reads as a potential side effect, and thus cannot treat the return as a replacement for an else to the if.

@teoxoy
Copy link
Owner

teoxoy commented Apr 14, 2024

@james7132 I pushed another commit to address the issue. Let me know what you think!

@james7132
Copy link
Contributor Author

Doesn't seem to make a tangible difference: james7132/encase_asm_tests@4399891. Might be due to one of the is_pod changes. It might be better to just merge this as is and close the gap in a later PR.

@teoxoy teoxoy merged commit 83e744a into teoxoy:main Apr 24, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants