Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Atom generic over the set of static strings #178

Merged
merged 19 commits into from
Nov 2, 2016
Merged

Conversation

SimonSapin
Copy link
Member

@SimonSapin SimonSapin commented Oct 27, 2016

The new string_cache_codegen crate is intended to be used by downstream users in their build scripts. Provided a list of static strings, it generates code that defines:

  • A type alias for Atom<_> with the type parameter set,
  • A macro like the former atom! macro.

Based on #136, original work by @aidanhs.

Fixes #22, fixes #136, closes #83.

r? @nox


This change is Reviewable

@aidanhs
Copy link
Contributor

aidanhs commented Oct 27, 2016

Oh boy, I'm excited! No worries about taking so long, I understand it can take a while for the stars to align and get time. Some thoughts:

  1. the readme for string-cache-codegen is no longer present. Although there are docs, I tend to think it's nice to have something in a github readme as well (potentially even at the top level)
  2. the 'test' crate in a subfolder of string-cache-codegen is no longer present. I can't see string-cache-codegen ever breaking, but the test seems like a useful thing to have
  3. good catch on the 'hashing twice' thing. It's been a while since I wrote the code, but I think the trade-off is that you can either hash once and end up storing the strings for identical dynamic atoms twice (because the hash is based off the static atom set hash), or you can hash twice (the second time with the fixed 'dynamic set' key) and be guaranteed to only store each string once. On reflection, given that applications are likely only going to have one kind of atom the latter is not really a big win.

I'll actually try it out a bit later.

@SimonSapin
Copy link
Member Author

SimonSapin commented Oct 27, 2016

Oh, good point about using different hash functions in the one dynamic atom hash table. I had not thought of that.

So now I'm not sure it's the right thing to do :) I'll think on it a bit more.

@SimonSapin
Copy link
Member Author

After chatting with @nox I think we’re gonna keep things as-is now regarding hashing. We can always change it later (without even breaking any public API).

Regarding testing, the string_cache crate’s own build.rs uses string_cache_codegen to generate a TestAtom, which is used in unit tests. Since this matches the expected usage of the codegen crate, I believe a separate testing crate is not needed.

I’ve pushed another commit that extends the README with some usage examples.

@aidanhs
Copy link
Contributor

aidanhs commented Oct 28, 2016

Ok, I've tried it out and it seems to work fine for me and the examples look good to me. It's a little sad it can't use phf generated sets directly, but I can see that's tricky given the desire to want to have direct access to the hash value without recomputing. Maybe someday.

Nothing else from me I don't think 👍

@SimonSapin
Copy link
Member Author

string_cache::PhfSet is a lot like phf::Set<&'static str>, just with more stable internals. (See e.g. this warning.)

I didn’t make it expose any public functionality to keep small the set of API we’ll need to maintain and because I didn’t see a use for it, but it could. What would you like to do with the set, and why?

@aidanhs
Copy link
Contributor

aidanhs commented Oct 31, 2016

It's more a general feeling that it'd be nice to use phf_codegen (and therefore phf_sets) directly rather than recreating them. As you note, given that parts of the api are not considered public it'd be a bit risky to depend on them for functionality. Perhaps the long term solution is to negotiate with phf to expose the bits that would be useful for string-cache, but I wouldn't want that to come before merging this PR.

Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall lgtm. What's the plan for namespaces?

slice
string_cache_codegen::AtomType::new("atom::tests::TestAtom", "test_atom!")
.atoms(&[
"a", "b", "address", "area", "body", "font-weight", "br","html", "head", "id",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: space after br

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

#[allow(dead_code)]
mod shared;

use string_cache::Atom;
use string_cache::DefaultAtom;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably use DefaultAtom as Atom consistently across the examples

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
fn from(string_to_add: Cow<'a, str>) -> Self {
let static_set = Static::get();
let hash = phf_shared::hash(&*string_to_add, static_set.key);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not continue to use get_index_or_hash()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would have to be a method of a public trait, and thus changing it (if we need to in the future) would be a breaking change.

@SimonSapin
Copy link
Member Author

The plan for namespaces is to have them in a new html5ever-atoms crate in the html5ever repo: https://github.com/servo/html5ever/compare/string-cache-up#diff-04339865c75d55a398b31d0a84d21c56R42

Namespace and Prefix there are both type aliases of Atom<SomeStaticSet> generated with string_cache_codegen. The ns!() macro (which expands to a Namespace value based on the prefix) is generated separately by the build script.

@Manishearth
Copy link
Member

r=me , unless @nox wants to have a look

@SimonSapin
Copy link
Member Author

@bors-servo r=Manishearth

@bors-servo
Copy link
Contributor

📌 Commit 4528d77 has been approved by Manishearth

@bors-servo
Copy link
Contributor

⚡ Test exempted - status

@bors-servo bors-servo merged commit 4528d77 into master Nov 2, 2016
bors-servo pushed a commit that referenced this pull request Nov 2, 2016
Make Atom generic over the set of static strings

The new `string_cache_codegen` crate is intended to be used by downstream users in their build scripts. Provided a list of static strings, it generates code that defines:
- A type alias for `Atom<_>` with the type parameter set,
- A macro like the former `atom!` macro.

Based on #136, original work by @aidanhs.

Fixes #22, fixes #136, closes #83.

r? @nox

<!-- Reviewable:start -->
---

This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/string-cache/178)

<!-- Reviewable:end -->
@SimonSapin SimonSapin deleted the static-sets branch November 2, 2016 13:23
@aidanhs
Copy link
Contributor

aidanhs commented Nov 2, 2016

🎆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add static atoms for input types. Move out the list of static strings?
4 participants