Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor VZV code to center on (and Deref to) VarZeroSlice; make VZVBorrowed private #1418

Merged
merged 19 commits into from
Dec 20, 2021

Conversation

Manishearth
Copy link
Member

@Manishearth Manishearth commented Dec 19, 2021

Fixes #1367

VZV and VZVOowned now deref to VZSlice, and most functionality is now on VZSlice. This simplifies our code greatly.

VZVBorrowed has been renamed to VZVComponents and is now crate-private; it still needs to exist to hold all of the unsafe parsed representation code.

As mentioned in #1367 (comment) , there was no appreciable difference in the benchmarks.

A thing I didn't do here, but could have, is marking the varzerovec module as private and reexporting VZVOwned up top. Right now we have ZV, VZV, ZS, VZS up top which is nice, but now that there's only one other type in varzerovec it might be worth reexporting it too so that there's no extra module. It doesn't sound like a huge deal either way.

@Manishearth Manishearth removed the request for review from zbraniecki December 19, 2021 11:35
@sffc
Copy link
Member

sffc commented Dec 20, 2021

I think one of my PRs caused you merge conflicts ☹️

@Manishearth
Copy link
Member Author

Argh, you named get_encoded_slice to as_encoded_bytes, I named it to as_bytes, this is going to be a mess

@Manishearth
Copy link
Member Author

Manishearth commented Dec 20, 2021

Do you think we should stick ot as_bytes here or move it over to as_encoded_bytes?

@Manishearth
Copy link
Member Author

I'm going to go ahead with the rename in this PR because it does more and does it consistently, can undo in a subsequent commit

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not approving or requesting changes yet, but opening some discussion threads.

#[inline]
pub fn as_borrowed<'a>(&'a self) -> VarZeroVecBorrowed<'a, T> {
pub(crate) fn as_components<'a>(&'a self) -> VarZeroVecComponents<'a, T> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought: This is a hot function, and I'm unsure about the code size implications. If everything gets super inlined, the VarZeroVecComponents need not ever actually exist, and overall code size might co down. But if these functions are never inlined, then we can do more code sharing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to CI:

Before:

-rwxr-xr-x 1 runner docker    59664 Dec 20 02:10 optim5.elf

After:

-rwxr-xr-x 1 runner docker    58528 Dec 20 03:16 optim5.elf

So this PR actually reduces code size overall. I'm happy to investigate future improvements later then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it reduces codesize by sharing all the codepaths better I think

pub fn len(&self) -> usize {
self.as_borrowed().len()
self.as_components().len()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought: I was thinking that we would write custom code for each of these functions. To get the length, we just read the first 4 bytes. In theory, the compiler should figure that out, iff everything is inlined and DCE works fully. But I don't know if I want to rely on that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda like that all the implementation code is in the components module: Perhaps as a followup we can add helper methods there that do partial parsing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something like components::len_of_buffer and components::indices_of_buffer(len)

Copy link
Member

@sffc sffc Dec 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would work for me.

Fine for follow up.

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • utils/zerovec/benches/vzv.rs is different
  • utils/zerovec/src/ule/custom/encode.rs is different
  • utils/zerovec/src/varzerovec/mod.rs is different
  • utils/zerovec/src/varzerovec/owned.rs is different
  • utils/zerovec/src/varzerovec/slice.rs is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@sffc
Copy link
Member

sffc commented Dec 20, 2021

Merging this on behalf of @Manishearth so it doesn't bitrot again

@sffc sffc merged commit 66b2081 into unicode-org:main Dec 20, 2021
@Manishearth Manishearth deleted the varzeroslice branch December 21, 2021 01:52
@Manishearth Manishearth mentioned this pull request Dec 21, 2021
29 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rename [Var]ZeroVecULE to [Var]ZeroSlice, use Deref
2 participants