Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slice-like Pointers #30

Merged
merged 14 commits into from
Jun 12, 2024
Merged

Slice-like Pointers #30

merged 14 commits into from
Jun 12, 2024

Conversation

asmello
Copy link
Collaborator

@asmello asmello commented Mar 31, 2024

Taking a page out of the std::path book, I figured it'd be nice to have most operations using Pointers operate on string slices instead of owned strings. This significantly helps reduce allocations/copies when manipulating this type, and allows for neat things like creating static/const Pointers that can be parsed at compilation time.

The handful of operations requiring &mut have been move to a companion buffer type called PointerBuf, named analogously to std::path::PathBuf. Also to be consistent with PathBuf, it implements Deref<Target=Pointer> so all pointer operations also work on this owned type transparently, and for the most part it can be used where the original Pointer type would.

I did make a few API breaks though. I'm open to iterating on the breaks, and maybe make some compatibility compromises, as not all changes are necessary to support this PR, but I figured this was a good opportunity to more closely align the API to std conventions and make things less surprising / more intuitive.

Since a lot of code was rewritten/simplified to take advantage of immutability and avoid re-allocations, I've also added a few quickcheck tests to generate some confidence that the new implementations work as expected. The original tests have also been preserved; they were just updated to use the new APIs.

Features

  • New slice type Pointer that enables zero-copy usage patterns.
  • const fn Pointer::from_static
  • Zero-allocation Pointer::root
  • Introduced some Quickcheck tests
  • Pointer::split_front and Pointer::split_back
  • Pointer::parent
  • Pointer::strip_suffix

Breaks

  • Original Pointer renamed to PointerBuf
    • All of the original methods are still available, but some were changed:
      • Pointer::root is now just PointerBuf::new
        • NOTE: a static method on the slice type Pointer::root is also available and encouraged as a replacement where possible (as it doesn't allocate)
      • Pointer::new is now PointerBuf::from_tokens
      • Pointer:union is now PointerBuf::intersection (fixes Pointer::union is misleadingly named #28)
  • Debug implementation changed to preserve type information (e.g. prints PathBuf("/foo/bar") instead of "/foo/bar")
  • Removed impl Deref<Target=&str> (Pointer is a subtype of str, not the other way around!)

Misc

  • Fixed some warnings from clippy
  • Opportunistic optimizations/simplifications

@asmello
Copy link
Collaborator Author

asmello commented Jun 12, 2024

@chanced just checking in given it's been a few months already! Happy to break this down into smaller parts if you're not comfortable merging it all together.

@chanced
Copy link
Owner

chanced commented Jun 12, 2024

Hey, I'm sorry. I don't know what happened with this. I didn't get a notification that this was created and so I'm just now seeing it.

@chanced
Copy link
Owner

chanced commented Jun 12, 2024

I'm going to grab some lunch right now but I'll going to look over everything when I get back.

@asmello
Copy link
Collaborator Author

asmello commented Jun 12, 2024

Heh, no worries, and no hurry! It's obviously not a blocker for me or anything. I should've pinged way earlier, anyway. Really appreciate your attention now.

@chanced
Copy link
Owner

chanced commented Jun 12, 2024

Actually, before I head out to grab something to eat, I've been meaning to actually reach out to you @asmello.

I'm working on a JSON Schema crate that relies on this. There are some design flaws in this crate that I want to change, like the iterator not emitting a token for the base. However, they would be breaking changes. I can work around the problems while fixing them may cause y'all churn.

There's at least one or two other things which I wanted to ask you about as well.

Haven't had a chance to look at the details of this at all yet so you may have actually already submitted some of them.

@chanced
Copy link
Owner

chanced commented Jun 12, 2024

In addition to addressing the contents of this pr, I may just jot them all down as a reply here. If its easier to chat, I'm on the official rust discord with the username "chanced."

@asmello
Copy link
Collaborator Author

asmello commented Jun 12, 2024

Sure, happy to talk about those planned changes over at Discord. But in general we're happy to accept breaking changes, especially if they're fixes in the original design.

BTW, not sure you're aware, but I've also contributed to json_patch with a dependency on this crate, which was released in version 2.0. It's a relatively popular crate. Making things better there was a motivating factor for this PR.

@chanced
Copy link
Owner

chanced commented Jun 12, 2024

Dude. This pull request is awesome.

@chanced
Copy link
Owner

chanced commented Jun 12, 2024

New slice type Pointer that enables zero-copy usage patterns.

This in particular is incredibly appreciated by me. All of it is but that is something I really, really wanted and just haven't had the time to write yet.

@chanced
Copy link
Owner

chanced commented Jun 12, 2024

I'm good with all of this.

With all of this, the only two things I know that are left are:

  • Removing the (optional) dependencies on all of the uri related crates. I haven't found them to be useful at all and just add needless weight/complexity. I ended up having to write an uri implementation(https://github.com/chanced/grill/tree/initial-version/grill-uri) and it doesn't even really make sense to include pointer methods there.
  • Fix iteration of tokens so that the root is emitted. This will mean some sort of enum that indicates whether its root or not.

The one question I had that may still be relevant is with regards to Tokens and how I had implemented it. There's needless allocation occurring because it stores both encoded and decoded. I'm not super happy about it but wasn't sure what the best path forward to fixing it would be. One thought I considered was a OnceCell that gets populated upon use.

@chanced
Copy link
Owner

chanced commented Jun 12, 2024

If y'all are using the uri extensions, i'm okay with keeping them.

@asmello
Copy link
Collaborator Author

asmello commented Jun 12, 2024

The one question I had that may still be relevant is with regards to Tokens and how I had implemented it. There's needless allocation occurring because it stores both encoded and decoded. I'm not super happy about it but wasn't sure what the best path forward to fixing it would be. One thought I considered was a OnceCell that gets populated upon use.

Hmm yeah, I think the Token implementation can use some simplifications too. I haven't looked too closely, but I think the redundant stuff is all private to the module, so we can probably change it without changing the API much. My intuition is that we can just have Token store a reference to a sub-slice of Pointer - making it a borrowed type. Given the common use case of iteration I think that works fine. If someone wants a decoded value, it can just allocate a string on-demand then, and if someone needs an owned type that outlives the original Pointer, impl ToOwned<Owned=String> might make sense (or maybe just a Token::to_string, not really sure).

EDIT: might also make sense to provide an OwnedToken type just so we can preserve the ability to decode after turning it into an owned value. Though a static jsonptr::decode(s: &str) -> Cow<'_, str> could probably serve the same purpose.

EDIT2: we could also just make Token behave like Cow and be usable both as an owned and as a borrowed type. Not sure it's worth the added complexity but it'd be kinda neat.

We're fine with removing URI support 👍 . Shouldn't affect json_patch either.

I think we could trivially fix the root token thing here, but I feel like it's disconnected enough to warrant a separate PR. I think the same applies for the other changes you've got planned. This PR is fairly substantial as is. Thankfully it should make the other changes easier to make.

@chanced chanced merged commit 68dbfcb into chanced:main Jun 12, 2024
1 check passed
@asmello asmello deleted the wip branch June 12, 2024 18:05
@asmello
Copy link
Collaborator Author

asmello commented Jun 12, 2024

Yay, thanks for merging! I think I'll play around with Token and see if I come up with a PR for it. Feel free to work on the other changes and I'd be happy to review too.

@chanced
Copy link
Owner

chanced commented Jun 12, 2024

we could also just make Token behave like Cow and be usable both as an owned and as a borrowed type. Not sure it's worth the added complexity but it'd be kinda neat.

I'm using this approach heavily in my json schema crate. I really like it.

@chanced
Copy link
Owner

chanced commented Jun 12, 2024

I think the Token implementation can use some simplifications too.

Oh, I have no doubt. I would embarrass myself every time I peaked the source on this crate. It was largely my first project in rust. Code quality took a back seat to appeasing borrowck or just getting it to compile at all.

@chanced
Copy link
Owner

chanced commented Jun 12, 2024

I may rewrite the parser as part of the release this gets packaged as.

I'm really digging simple little enum-based finite automaton for parsers like this. (e.g.https://github.com/chanced/grill/blob/b54dd41981f06c5e0896ca07b73d9c4daefec03e/grill-core/src/big.rs#L331-L379) (although that may be a bad example as that parser will eventually need to get rewritten - but not because of the state machine).

@asmello
Copy link
Collaborator Author

asmello commented Jun 12, 2024

Feel free to, it's your crate 😄

When it comes to parsers, it's always a trade-off between code legibility and performance. The more structured your parser, the less opportunities for optimizations, but the easier it is to validate it. The JSON Pointer grammar is not particularly expensive to parse, regardless of implementation, but they're also a type that tends to get created and destroyed often, so any slowness in parsing could build up at higher scales.

I'd generally suggest starting with a simpler and more obviously correct implementation, and then move to a more specialized one as necessary, but arguably we already have a specialized one, so that guidance doesn't exactly apply. So instead what I'll say is, if you want to be really confident about your choice, add benchmarks. But honestly I don't think it matters much either way.

@chanced
Copy link
Owner

chanced commented Jun 12, 2024

Feel free to, it's your crate 😄

Hah, that was mostly a note to myself but at the same time, I don't really own it per se. Releasing work under such a permissive model is akin to giving it away, of sorts. It becomes a communal project.

In fact, if you PR yourself in as an author/contributor in the Cargo.toml, I'd appreciate it. Or pass me the details and I'll add it. Same goes for you @wngr (sorry for the ping if you aren't interested).

@chanced
Copy link
Owner

chanced commented Jun 12, 2024

re: rewriting the parser, you're right. Rewriting it now, without a more robust test suite and benchmarks to not only ensure it functions as desired but also is justifiable, is not a good idea.

Honestly the desire to do so was a knee jerk reaction to the angst of learning more people than I anticipated are reading my shoddy source :).

I'll rewrite it in due time, but not before it is provably better.

@asmello
Copy link
Collaborator Author

asmello commented Jun 12, 2024

Releasing work under such a permissive model is akin to giving it away, of sorts. It becomes a communal project.

That's only partially true. You hold the keys to this repository and the crates.io entry, so that makes you the de facto lead of this project (not to mention you're still the main contributor/maintainer, so morally your opinion matters the most). The license means anyone can fork the code, but that'd result in a different crate.

In fact, if you PR yourself in as an author/contributor in the Cargo.toml, I'd appreciate it. Or pass me the details and I'll add it. Same goes for you @wngr (sorry for the ping if you aren't interested).

I'll submit a PR to add myself as an author then. Oliver is probably not checking emails/notifications currently, but I think we can add him too.

@chanced
Copy link
Owner

chanced commented Jun 12, 2024

That's only partially true. You hold the keys to this repository and the crates.io entry, so that makes you the de facto lead of this project (not to mention you're still the main contributor/maintainer, so morally your opinion matters the most).

Understood. I didn't mean to make it seem like I was trying to skirt responsibility or anything. I just meant that I don't want to make sweeping changes which could have adverse consequences to others without at least trying to consult with those I am aware of.

There are plenty of projects where the primary takes ownership to heart. There's nothing wrong with that approach at all. I personally don't, is all.

@asmello
Copy link
Collaborator Author

asmello commented Jun 12, 2024

Oh no, please don't take it that way. I didn't think you were trying to skirt responsibility or anything of the sort, just wanted to highlight how much agency you are entitled to here. You've built something awesome and although it makes sense to listen to others and accept contributions, the community will defer to you as the project's primary maintainer to make important decisions. It's not so much a matter of ownership but of respect and trust, that's what I meant.

@chanced
Copy link
Owner

chanced commented Jun 12, 2024

Thanks.

@chanced
Copy link
Owner

chanced commented Jun 12, 2024

@asmello is there any chance you would mind updating the CHANGELOG.md please. I'm sorry, I forgot to check for that.

@asmello asmello mentioned this pull request Jun 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pointer::union is misleadingly named
2 participants