Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move out the list of static strings? #22

Closed
SimonSapin opened this issue Oct 2, 2014 · 15 comments · Fixed by #178
Closed

Move out the list of static strings? #22

SimonSapin opened this issue Oct 2, 2014 · 15 comments · Fixed by #178

Comments

@SimonSapin
Copy link
Member

Currently, this repository contains an hard-coded list of static strings that is tailored for Servo. Would it be possible to move this list out of string-cache so that Servo and other potential users can maintain their own version of it? Perhaps by making Atom generic over a type with a trait that provides the list as an associated static item.

@glennw
Copy link
Member

glennw commented Oct 2, 2014

That's definitely the plan - @kmcallister and I have discussed briefly, he may have some thoughts on how this could be done too.

@gsingh93
Copy link

I'd really like to see this implemented, or at least hear your thoughts about how it could be done.

@SimonSapin
Copy link
Member Author

Here is the plan: every atom will be of a certain "kind" (which will be part of its type), each kind will have its own static set (and therefore atoms of different kinds representing different strings may have the same internal representation), and only atoms of the same kind are comparable. This will generalize the type safety of the current struct Namespace(Atom); wrapper, which will become just another "kind".

#[derive(Hash, PartialEq, Eq)]
pub struct Atom<K=DefaultKind> where K: Kind {
    data: u64,
    kind: PhantomData<K>,
}

pub trait Kind {
    fn str_to_id(s: &str) -> Option<u32>;

    /// Should panic for unknown IDs
    fn id_to_str(id: u32) -> &'static str;
}

pub enum DefaultKind {}
impl Kind for DefaultKind {
    #[inline] fn str_to_id(_: &str) -> Option<u32> { None }
    #[inline] fn id_to_str(_: u32) -> &'static str { panic!("bad static atom") }
}

A string_cache_codegen, modeled after phf_codegen, will help users in their build scripts to generate code for:

  • A static phf::OrderedSet<&'static str>
  • A Kind impl based on it
  • A type Foo = Atom<FooKind>; type alias
  • A macro_rules! foo in the style of Run on Rust stable #95

This will be the only way to get static atoms. The compiler plugin will be removed.

html5ever will define various "kinds" for element names, attribute names, namespace URLs, etc.

Servo’s style crate will be able to generate one (always up-to-date) for CSS property names.

How does this sound? @glennw, @jdm, @Ms2ger, @mbrubeck, @Manishearth

@jdm
Copy link
Member

jdm commented Jul 24, 2015

Sounds great to me!

@SimonSapin
Copy link
Member Author

Alternative:

pub trait Kind {
    fn static_atom_set() -> &'static phf::OrderedSet<&'static str>;
}

The set for DefaultKind would have to contain something like the empty string, since phf divides by zero when one tries to make an empty OrderedSet.

@Ms2ger
Copy link
Contributor

Ms2ger commented Jul 25, 2015

I don't know enough about how string-cache works, but this seems sensible enough.

@SimonSapin
Copy link
Member Author

One thing is I don’t know where things like #83 would fit. Many separate "kinds"?

@aidanhs
Copy link
Contributor

aidanhs commented Jan 29, 2016

I've implemented #22 (comment) (created a new crate, made necessary changes to this crate, got all the tests working and kept it backwards compatible). Unfortunately it requires the default_type_parameter_fallback feature gate, as it's waiting on the complete implementation of RFC 213 (rust-lang/rust#27336). Briefly, without this being implemented you cannot do Atom::from("blah") since the ommitted <DefaultKind> will not be inserted. As soon as it ends up in stable I'll submit my PR.

Note that "backwards compatible" means "keeps the static atoms in the crate for now, so servo can upgrade then move to the new system at leisure".

Since there's a delay, I have a question for the eventual submission - where will the new string_cache_codegen crate live? PHF keeps associated crates in subfolders of the same git repo, but doing that now seems like it'd be painful for git blame (and code review), so a new repo like servo/string-cache-codegen might be better?

@SimonSapin
Copy link
Member Author

Breaking backward-compatibility of a library is not necessarily a problem for Servo, which can keep using an old version for a while.

Thanks for working on this, but even though I proposed it, I’m not so sure now this design is such a good idea. Atoms of different kinds won’t be comparable, and having to specify Atom::<Html5Ever>::from instead of Atom::from seems to be an ergonomic loss.

On the other hand, centralizing a list of strings is theoretically not nice but it’s not such a problem in practice.

Potential users of this library should feel free to send PRs to add static strings. The only marginal cost is the size of compiled binaries: the strings themselves + a few integers each. (The run-time cost of hash table lookups is O(1).)

@aidanhs
Copy link
Contributor

aidanhs commented Jan 29, 2016

The backwards compatibility was more about being able to upgrade, then being able to incrementally move things over (rather than one huge servo PR).

Atoms of different kinds won’t be comparable

This seems like a good thing to me. If you've got multiple static sets of strings, trying to compare them will give incorrect results so you actually want the typechecker to help.

having to specify Atom::<Html5Ever>::from instead of Atom::from seems to be an ergonomic loss

You could just use the type alias for custom atom types, e.g. ServoAtom::from.

I think I've thought of a way to work around the lack of RFC, so I'll send a PR 'shortly'.

@Ms2ger
Copy link
Contributor

Ms2ger commented Jul 13, 2016

CC @bholley

It would be nice to have the static string list in servo/servo, but would that not make it impossible to use static atoms in servo's dependencies? xml5ever and rust-selectors use a few, and html5ever rather more.

@SimonSapin
Copy link
Member Author

The idea is that each crate could define one (or more) atom kind(s), each associated with a set of static strings. Not everything would be in servo/servo.

@Ms2ger
Copy link
Contributor

Ms2ger commented Jul 13, 2016

So when html5ever creates an element, it would create one of its atoms, which we'd need to convert to some other kind to match selectors against it?

@SimonSapin
Copy link
Member Author

That’s a good point. We might have wanted an htmlatoms crate that selectors would have depended on, to avoid this conversion. But I’m also working on making selectors generic over the string/atom types in order to support Gecko atoms without a Cargo override/replacement. See thread starting at servo/servo#12391 (comment).

@aidanhs
Copy link
Contributor

aidanhs commented Jul 13, 2016

We might have wanted an htmlatoms crate that selectors would have depended on, to avoid this conversion.

Yes, I'd expect all different 'kinds' of atoms to define their own list of atoms, and anything that needs sharing between multiple crates would be expressed as a common dependency. A bit like when defining enums you don't create one type to hold every possible variant regardless of what's it's used for (since you're subverting the type system) - instead, you create multiple types with variants split in some logical way.

I'm not sure how you'd enforce that there's only one copy of this dependency and that every depending crate uses this one copy (the same problem would exist with enums as well)...I'd expect Cargo to have some way of helping here. On reading through that linked issue, perhaps [replace] is appropriate.

Edit: in fact, you already have to deal with the above problem with string-cache

SimonSapin added a commit that referenced this issue Oct 25, 2016
It’s going into various atom types in html5ever and Servo.
SimonSapin added a commit that referenced this issue Oct 25, 2016
It’s going into various atom types in html5ever and Servo.
SimonSapin added a commit that referenced this issue Oct 27, 2016
It’s going into various atom types in html5ever and Servo.
SimonSapin added a commit that referenced this issue Nov 1, 2016
It’s going into various atom types in html5ever and Servo.
bors-servo pushed a commit that referenced this issue Nov 2, 2016
Make Atom generic over the set of static strings

The new `string_cache_codegen` crate is intended to be used by downstream users in their build scripts. Provided a list of static strings, it generates code that defines:
- A type alias for `Atom<_>` with the type parameter set,
- A macro like the former `atom!` macro.

Based on #136, original work by @aidanhs.

Fixes #22, fixes #136, closes #83.

r? @nox

<!-- Reviewable:start -->
---

This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/string-cache/178)

<!-- Reviewable:end -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants