Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up ast::Attribute, ast::CrateConfig, and string interning #37824

Merged
merged 12 commits into from
Nov 21, 2016

Conversation

jseyfried
Copy link
Contributor

@jseyfried jseyfried commented Nov 17, 2016

This PR

  • removes ast::Attribute_ (changing Attribute from Spanned<Attribute_> to a struct),
  • moves a MetaItem's name from the MetaItemKind variants to a field of MetaItem,
  • avoids needlessly wrapping ast::MetaItem with P,
  • moves string interning into syntax::symbol (ast::Name is a reexport of symbol::Symbol for now),
  • replaces InternedString with Symbol in the AST, HIR, and various other places, and
  • refactors ast::CrateConfig from a Vec to a HashSet.

r? @eddyb

Copy link
Member

@eddyb eddyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo comments. cc @rust-lang/compiler

@@ -185,31 +185,31 @@ pub enum DefPathData {
/// An impl
Impl,
/// Something in the type NS
TypeNs(InternedString),
TypeNs(Symbol),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that these are InternedString so they hash their contents instead of the integer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I got overzealous with the InternedString -> Symbol refactoring.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make a type SymbolStringContent?

@@ -717,16 +717,16 @@ impl<'a, 'tcx, 'gcx> PolyExistentialProjection<'tcx> {
self.0.item_name // safe to skip the binder to access a name
}

pub fn sort_key(&self, tcx: TyCtxt<'a, 'gcx, 'tcx>) -> (u64, InternedString) {
pub fn sort_key(&self, tcx: TyCtxt<'a, 'gcx, 'tcx>) -> (u64, Symbol) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, the intention is to sort the strings themselves.

@@ -0,0 +1,284 @@
// Copyright 2012 The Rust Project Developers. See the COPYRIGHT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Year is 2016.

@jseyfried jseyfried force-pushed the symbols branch 2 times, most recently from 520c3c9 to fc519bb Compare November 17, 2016 14:06

impl Decodable for InternedString {
fn decode<D: Decoder>(d: &mut D) -> Result<InternedString, D::Error> {
Ok(Symbol::intern(&d.read_str()?).as_str())
Copy link
Contributor

@petrochenkov petrochenkov Nov 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a suspicious line.
InternedString is supposed to be simply a wrapper around Rc<str> now not tied to symbol interner, right? InternedString::new for example just creates a new Rc and intern_and_get_ident was removed by this PR.
At the same time decode not only creates an Rc but also pushes it into the interner.
One of them is probably wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InternedString::new should probably go away.

Copy link
Contributor

@petrochenkov petrochenkov Nov 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on what the remaining uses of InternedString are.
Is it used only for hashing/encoding/etc Symbols as strings? In this case something like SymbolStringContent is probably a better solution and InternedString should go away completely, including new.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this PR, InternedString should be semantically equivalent to SymbolStringContent -- we can rename later. I don't think InternedString is that bad of a name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed InternedString::new.

@eddyb
Copy link
Member

eddyb commented Nov 17, 2016

The one other thing I'd like to see is a symbol!("foo") macro that uses a #[thread_local] static Cell<Symbol> to cache the interning of "foo". It should prove at least marginally faster than string comparisons.
But that experiment can be left to another PR, just wanted to jot it down before I forget.

@bors
Copy link
Contributor

bors commented Nov 17, 2016

☔ The latest upstream changes (presumably #37732) made this pull request unmergeable. Please resolve the merge conflicts.

@jseyfried jseyfried force-pushed the symbols branch 7 times, most recently from be6cd9d to fbb1713 Compare November 20, 2016 05:41
@jseyfried
Copy link
Contributor Author

jseyfried commented Nov 20, 2016

@eddyb this should be ready to land now -- the second commit and the last three commits new since last review.

Copy link
Member

@eddyb eddyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r=me with comments addressed.

@@ -574,7 +574,7 @@ pub struct GlobalCtxt<'tcx> {

/// Map from function to the `#[derive]` mode that it's defining. Only used
/// by `proc-macro` crates.
pub derive_macros: RefCell<NodeMap<token::InternedString>>,
pub derive_macros: RefCell<NodeMap<InternedString>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this and crate_name above be Symbol?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, done.

@@ -2344,7 +2344,7 @@ impl<'a, 'gcx, 'tcx> TyCtxt<'a, 'gcx, 'tcx> {
if let Some(id) = self.map.as_local_node_id(id) {
self.map.name(id)
} else if id.index == CRATE_DEF_INDEX {
token::intern(&self.sess.cstore.original_crate_name(id.krate))
Symbol::intern(&self.sess.cstore.original_crate_name(id.krate))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like original_crate_name should return a Symbol.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -640,23 +612,20 @@ impl RustcDefaultCalls {
let allow_unstable_cfg = UnstableFeatures::from_environment()
.is_nightly_build();

for cfg in &sess.parse_sess.config {
if !allow_unstable_cfg && GatedCfg::gate(cfg).is_some() {
for &(name, ref value) in sess.parse_sess.config.iter() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, doesn't this expose HashSet ordering? Even if you were to use a BTreeSet it would expose intern order. Maybe collect strings instead of printing them, to a Vec and just sort that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -895,12 +895,12 @@ impl<'a> CrateLoader<'a> {
n.check_name("name")
}).and_then(|a| a.value_str());
let n = match n {
Some(n) => n,
Some(n) => n.as_str().to_string(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this be written as .to_string() directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but no longer relevant after using Symbol instead of String in NativeLibrary.

@@ -321,7 +321,7 @@ impl<'bcx, 'tcx> MirContext<'bcx, 'tcx> {

// Get the location information.
let loc = bcx.sess().codemap().lookup_char_pos(span.lo);
let filename = token::intern_and_get_ident(&loc.file.name);
let filename = Symbol::intern(&loc.file.name).as_str();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might make sense to change loc.file.name to Symbol - in fact, any String in the compiler that is not on some transient and/or error path, may benefit from it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but best for a future PR.

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct Symbol(pub u32);

impl !Send for Symbol { }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment about its thread locality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


/// A symbol is an interned or gensymed string.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct Symbol(pub u32);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pub field is dangerous, you'll have to privatize it before attempting any optimizations (if the fallout is minimal, maybe do it here).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

// If an interner exists in TLS, return it. Otherwise, prepare a fresh one.
// FIXME(eddyb) #8726 This should probably use a thread-local reference.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment is no longer relevant (we switched from returning Rc to taking a closure).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

/// Reset the ident interner to its initial state.
pub fn reset_interner() {
with_interner(|interner| *interner = Interner::fresh());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unsound: you keep references to the string contents around.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made reset_interner unsafe. It isn't used in rustc itself, but rusti wants to be able to reset thread local state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a real point to that? The interner would just have a "warm start".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like rusti isn't using driver::reset_thread_local_state at all anymore, so probably not (removed).

/// somehow.
#[derive(Clone, PartialEq, Hash, PartialOrd, Eq, Ord)]
pub struct InternedString {
string: &'static str,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventual optimization opportunity: store the CodeMap in the interner, guarantee this "symbol string contents" type always points to something owned by the interner, use it in place of &str in the lexer, with appropriate slicing & whatnot, and allow interning it without allocating again - although this requires the arena-based interner too.

@eddyb
Copy link
Member

eddyb commented Nov 20, 2016

cc @rust-lang/compiler

@jseyfried jseyfried force-pushed the symbols branch 2 times, most recently from 6698d2c to 5400c4f Compare November 21, 2016 00:03
@eddyb
Copy link
Member

eddyb commented Nov 21, 2016

@bors r+

@bors
Copy link
Contributor

bors commented Nov 21, 2016

📌 Commit 5400c4f has been approved by eddyb

@bors
Copy link
Contributor

bors commented Nov 21, 2016

⌛ Testing commit 5400c4f with merge 531bd42...

@jseyfried
Copy link
Contributor Author

@bors r=eddyb

@bors
Copy link
Contributor

bors commented Nov 21, 2016

📌 Commit 60b5372 has been approved by eddyb

@bors
Copy link
Contributor

bors commented Nov 21, 2016

⌛ Testing commit 60b5372 with merge a56532f...

@bors
Copy link
Contributor

bors commented Nov 21, 2016

💔 Test failed - auto-win-msvc-64-opt-rustbuild

@jseyfried
Copy link
Contributor Author

@bors r=eddyb

@bors
Copy link
Contributor

bors commented Nov 21, 2016

📌 Commit a8e86f0 has been approved by eddyb

@bors
Copy link
Contributor

bors commented Nov 21, 2016

⌛ Testing commit a8e86f0 with merge ebec554...

bors added a commit that referenced this pull request Nov 21, 2016
Clean up `ast::Attribute`, `ast::CrateConfig`, and string interning

This PR
 - removes `ast::Attribute_` (changing `Attribute` from `Spanned<Attribute_>` to a struct),
 - moves a `MetaItem`'s name from the `MetaItemKind` variants to a field of `MetaItem`,
 - avoids needlessly wrapping `ast::MetaItem` with `P`,
 - moves string interning into `syntax::symbol` (`ast::Name` is a reexport of `symbol::Symbol` for now),
 - replaces `InternedString` with `Symbol` in the AST, HIR, and various other places, and
 - refactors `ast::CrateConfig` from a `Vec` to a `HashSet`.

r? @eddyb
@bors bors merged commit a8e86f0 into rust-lang:master Nov 21, 2016
@jseyfried jseyfried deleted the symbols branch November 21, 2016 22:05
andersk added a commit to andersk/quickcheck-rs that referenced this pull request Dec 24, 2016
“Clean up `ast::Attribute`, `ast::CrateConfig`, and string interning”

Resolves these errors:

error[E0425]: unresolved name `token::intern`
  --> src/lib.rs:27:35
   |
27 |     reg.register_syntax_extension(token::intern("quickcheck"),
   |                                   ^^^^^^^^^^^^^ unresolved name

error[E0425]: unresolved name `token::str_to_ident`
  --> src/lib.rs:86:23
   |
86 |     let check_ident = token::str_to_ident("quickcheck");
   |                       ^^^^^^^^^^^^^^^^^^^ unresolved name

error[E0425]: unresolved name `token::intern_and_get_ident`
   --> src/lib.rs:107:34
    |
107 |         span, cx.meta_word(span, token::intern_and_get_ident("test"))));
    |                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^ unresolved name

error: aborting due to 3 previous errors

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
BurntSushi pushed a commit to BurntSushi/quickcheck that referenced this pull request Dec 27, 2016
“Clean up `ast::Attribute`, `ast::CrateConfig`, and string interning”

Resolves these errors:

error[E0425]: unresolved name `token::intern`
  --> src/lib.rs:27:35
   |
27 |     reg.register_syntax_extension(token::intern("quickcheck"),
   |                                   ^^^^^^^^^^^^^ unresolved name

error[E0425]: unresolved name `token::str_to_ident`
  --> src/lib.rs:86:23
   |
86 |     let check_ident = token::str_to_ident("quickcheck");
   |                       ^^^^^^^^^^^^^^^^^^^ unresolved name

error[E0425]: unresolved name `token::intern_and_get_ident`
   --> src/lib.rs:107:34
    |
107 |         span, cx.meta_word(span, token::intern_and_get_ident("test"))));
    |                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^ unresolved name

error: aborting due to 3 previous errors

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants