Add syntax extensions for lowercasing/uppercasing strings (Fixes #16607) #16636

Manishearth · 2014-08-20T21:15:22Z

Testcase:

#![feature(macro_rules)]
macro_rules! identify (
  ($i: ident) => {
    let $i = to_lower!(stringify!($i));
  }
)

fn main() {
    println!("{}", to_lower!("FooBar"));
    println!("{}", to_lower!(stringify!(FooBar)));
    println!("{}", to_upper!("FooBar"));
    println!("{}", to_upper!(stringify!(FooBar)));
    identify!(FooBar);
    println!("{}", FooBar);
}

lilyball · 2014-08-20T21:37:24Z

#16607 calls them to_ascii_lower!() and to_ascii_upper!(). I'm also inclined to say that if uppercasing is useful, titlecasing is also useful (e.g. where the first letter is uppercased and the rest is left alone).
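For reference, the titlecasing described here (first letter uppercased, rest left alone) can be sketched at runtime in current Rust. This is only an illustration: `titlecase` is a hypothetical helper name, it uses today's standard library rather than the 2014 API in this PR, and it uppercases the first character instead of applying Unicode's separate titlecase mapping.

```rust
/// Uppercase the first character and leave the rest untouched.
/// `titlecase` is a hypothetical name, not part of the PR.
fn titlecase(s: &str) -> String {
    let mut chars = s.chars();
    match chars.next() {
        // `to_uppercase` returns an iterator because one char can map to several.
        Some(first) => first.to_uppercase().chain(chars).collect(),
        None => String::new(),
    }
}

fn main() {
    println!("{}", titlecase("fooBar"));
}
```

A compile-time `to_titlecase!` would apply the same transformation to the string literal inside the syntax extension.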

lilyball · 2014-08-20T21:40:31Z

src/libsyntax/ext/source_util.rs

+        Some(e) => e,
+        None => return base::DummyResult::expr(sp)
+    };
+    match es.move_iter().next() {


If multiple expressions are provided, this will happily process the first and ignore the rest. It should expressly test how many expressions it got.

Would a test for iter.next().is_some() be okay style-wise?
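A minimal sketch of that `iter.next().is_some()` check, written in current Rust over a plain `Vec` rather than the compiler's expression list. `exactly_one` and the error strings are assumptions made for illustration.

```rust
/// Return the only item in `items`, or an error if there are zero or several.
/// `exactly_one` is a hypothetical helper, not a libsyntax function.
fn exactly_one<T>(items: Vec<T>) -> Result<T, String> {
    let mut iter = items.into_iter();
    match iter.next() {
        None => Err("expected 1 argument, found 0".to_string()),
        Some(first) => {
            // A second successful `next()` means extra arguments were passed.
            if iter.next().is_some() {
                Err("expected exactly 1 argument".to_string())
            } else {
                Ok(first)
            }
        }
    }
}

fn main() {
    println!("{:?}", exactly_one(vec!["FooBar"]));
    println!("{:?}", exactly_one(vec!["Foo", "Bar"]));
}
```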

SimonSapin · 2014-08-20T21:47:20Z

char::to_lowercase does some flavor of Unicode case folding. It includes some fun cases like mapping the Kelvin sign K (U+212A) to ASCII lower case k.
str::to_ascii_lower only maps ASCII letters and leaves everything else alone. Although it often looks wrong on arbitrary input, it can make sense when used on a closed set of strings that is entirely ASCII like CSS property names.

The PR currently uses char::to_lowercase, so the to_lower! macro name is consistent.
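The difference can be checked with the current standard library. The method names below (`str::to_lowercase`, `str::to_ascii_lowercase`) are today's names, not the 2014 ones mentioned in this thread.

```rust
fn main() {
    let kelvin = "\u{212A}"; // KELVIN SIGN, renders like a capital K

    // Unicode-aware lowercasing maps it to ASCII 'k'.
    assert_eq!(kelvin.to_lowercase(), "k");

    // ASCII-only lowercasing leaves non-ASCII characters untouched.
    assert_eq!(kelvin.to_ascii_lowercase(), "\u{212A}");

    // On pure-ASCII input the two agree.
    assert_eq!("FooBar".to_lowercase(), "FooBar".to_ascii_lowercase());

    println!("ok");
}
```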

lilyball · 2014-08-20T21:48:13Z

src/libsyntax/ext/source_util.rs

+            }
+        },
+        _ => ()
+    }


This is hard to read, with all the rightward drift. Ideally you could compress this into a single match, but I don't think you can match the Gc in ExprLit. We can at least get rid of one of the match levels, and remove the return from the inside of this big beast:

let s = match es.move_iter().next() {
    Some(codemap::Spanned { node: ast::ExprLit(lit), .. }) => match lit.node {
        ast::LitStr(ref s, _) => Some(s),
        _ => None
    },
    _ => None
};
match s {
    Some(s) => {
        let new_s = s.get().chars().map(transform).collect::<String>();
        base::MacExpr::new(cx.expr_str(sp, token::intern_and_get_ident(new_s.as_slice())))
    }
    None => {
        cx.span_err(sp, "expected a string literal");
        base::DummyResult::expr(sp)
    }
}

You should also probably pull the span off of the first expression, instead of using the span for the whole macro.

Yeah, the Gc was giving problems. I couldn't find any way of destructuring the @ (I was surprised that syntax still exists)

This seems like a better option, thanks :)

I tried this, there's another Gc on Expr, so I had to use the following (one extra match):

let s = match iter.next() {
    Some(expr) => {
        match *expr {
            ast::Expr { node: ast::ExprLit(lit), span: sp2, .. } => {
                sp = sp2;
                match lit.node {
                    ast::LitStr(ref s, _) => Some(s),
                    _ => None
                }
            },
            _ => None
        }
    },
    _ => None
};

node doesn't live long enough so I'll have to use s.clone() if I don't want to nest the matches further. Should I? Seems a bit unnecessary.

It occurs to me the ref s isn't going to work, since it's a reference to a moved value. We can skip the move, though, and we can also produce a better error for providing the wrong argument type vs not providing the right number of arguments. Something like:

let mut it = es.iter();
let res = match it.next() {
    Some(&codemap::Spanned { node: ast::ExprLit(ref lit), .. }) => match lit.node {
        ast::LitStr(ref s, span) => Some((s, span)),
        _ => {
            cx.span_err(span, "expected a string literal");
            None
        }
    },
    _ => {
        cx.span_err(sp, "expected 1 argument, found 0");
        None
    }
};
match (res, it.count()) {
    (Some((s, span)), 0) => {
        let new_s = s.get().chars().map(transform).collect::<String>();
        base::MacExpr::new(cx.expr_str(sp, token::intern_and_get_ident(new_s.as_slice())))
    }
    (_, rest) => {
        if rest > 0 {
            cx.span_err(sp, format!("expected 1 argument, found {}", rest+1).as_slice());
        }
        base::DummyResult::expr(sp)
    }
}

Why is there a codemap::Spanned there? Shouldn't it be an ast::Expr?

Because I was assuming that Expr was a Spanned<Expr_>. It actually isn't, which is surprising, and is apparently due to the need for a separate id field. Anyway, I'm guessing the following should work:

fn expand_cased(cx: &mut ExtCtxt, sp: Span, tts: &[ast::TokenTree], transform: |char| -> char)
                -> Box<base::MacResult> {
    let es = match base::get_exprs_from_tts(cx, sp, tts) {
        Some(e) => e,
        None => return base::DummyResult::expr(sp)
    };

    let mut it = es.iter();
    let res = match it.next() {
        // FIXME (#13910): nested matches are necessary to get through Gc<>
        Some(expr) => match expr.node {
            ast::ExprLit(ref lit) => match lit.node {
                ast::LitStr(ref s, span) => Some((s, span)),
                _ => {
                    cx.span_err(span, "expected a string literal");
                    None
                }
            }
            _ => {
                cx.span_err(expr.span, "expected a string literal");
                None
            }
        },
        None => {
            cx.span_err(sp, "expected 1 argument, found 0");
            None
        }
    };
    match (res, it.count()) {
        (Some((s, span)), 0) => {
            let new_s = s.get().chars().map(transform).collect::<String>();
            base::MacExpr::new(cx.expr_str(span, token::intern_and_get_ident(new_s.as_slice())))
        }
        (_, rest) => {
            if rest > 0 {
                cx.span_err(sp, format!("expected 1 argument, found {}", rest+1).as_slice());
            }
            base::DummyResult::expr(sp)
        }
    }
}

Ok, thanks. Should the sp in the final MacExpr be sp or span? It might end up with the wrong span on expansion -- I'm not entirely clear on how ExtCtxt works to be sure of this, and I'd rather not wait for two full builds to see ;)

Good catch, I meant to make that span. I'll edit my comment.

There were more issues with span being taken from the wrong place (should
be lit.span, not the second value of ExprLit), but I fixed that.
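
The control flow discussed in this review can be tried outside the compiler with a toy model. The sketch below is in current Rust; `Span`, `Lit` and `Expr` are simplified stand-ins, not the real libsyntax types, and errors are collected into a `Vec` instead of being reported through `ExtCtxt`. It takes the result span from the literal expression and reports argument-count errors against the whole macro span.

```rust
// Toy stand-ins for the compiler types; none of these are the real libsyntax API.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Span(u32, u32);

#[allow(dead_code)]
#[derive(Debug)]
enum Lit {
    Str(String),
    Int(i64),
}

#[derive(Debug)]
struct Expr {
    lit: Option<Lit>, // None models a non-literal expression
    span: Span,
}

/// Returns the transformed string with the literal's span,
/// or the list of (span, message) errors that would be reported.
fn expand_cased(
    macro_span: Span,
    exprs: &[Expr],
    transform: impl Fn(char) -> char,
) -> Result<(String, Span), Vec<(Span, String)>> {
    let mut errors = Vec::new();
    let mut it = exprs.iter();
    let res = match it.next() {
        // One pattern replaces the nested matches that Gc<> forced in 2014.
        Some(Expr { lit: Some(Lit::Str(s)), span }) => Some((s, *span)),
        Some(expr) => {
            errors.push((expr.span, "expected a string literal".to_string()));
            None
        }
        None => {
            errors.push((macro_span, "expected 1 argument, found 0".to_string()));
            None
        }
    };
    // Pair the first argument with the number of leftover arguments.
    match (res, it.count()) {
        (Some((s, span)), 0) => Ok((s.chars().map(transform).collect(), span)),
        (_, rest) => {
            if rest > 0 {
                errors.push((macro_span, format!("expected 1 argument, found {}", rest + 1)));
            }
            Err(errors)
        }
    }
}

fn main() {
    let sp = Span(0, 20);
    let ok = [Expr { lit: Some(Lit::Str("FooBar".into())), span: Span(10, 18) }];
    println!("{:?}", expand_cased(sp, &ok, |c| c.to_ascii_lowercase()));
    let bad = [Expr { lit: Some(Lit::Int(1)), span: Span(10, 11) }];
    println!("{:?}", expand_cased(sp, &bad, |c| c.to_ascii_lowercase()));
}
```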

Thanks for the help!

-Manish Goregaokar

On Thu, Aug 21, 2014 at 4:59 AM, Kevin Ballard notifications@github.com
wrote:

In src/libsyntax/ext/source_util.rs:

    };
    match es.move_iter().next() {
        Some(expr) => {
            match expr.node {
                ast::ExprLit(lit) => match lit.node {
                    ast::LitStr(ref s, _) => {
                        return base::MacExpr::new(cx.expr_str(sp,
                            token::intern_and_get_ident(s.get().chars().map(transform).collect::<String>().as_slice())))
                    },
                    _ => ()
                },
                _ => ()
            }
        },
        _ => ()
    }


lilyball · 2014-08-20T21:49:40Z

@SimonSapin Ok, fair enough.

SimonSapin · 2014-08-20T21:52:18Z

From chatting on IRC with @Manishearth: either kind of lower casing works for Servo’s use case since they behave the same when the input is within ASCII, which we know because it’s compile time (no arbitrary input from web content).

Manishearth · 2014-08-20T21:56:24Z

@SimonSapin I can add to_ascii_* variants if we want, since all I have to do is add variants of the transform closure.

With some changes (accepting a closure that transforms the whole string instead of individual chars) I can add titlecasing (and snakecasing) support as well, if these would be useful.
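
A rough sketch of that change in current Rust: the transform takes the whole string, and the existing per-character transforms are lifted into that shape. `apply_case`, `per_char` and `snake_case` are hypothetical names, and the snake-casing rule here (an underscore before every uppercase letter except at the start) is a deliberately naive assumption.

```rust
/// Stand-in for the expander: applies a whole-string transform to the literal.
fn apply_case(input: &str, transform: impl Fn(&str) -> String) -> String {
    transform(input)
}

/// Lift a per-character transform into a whole-string transform.
fn per_char(f: impl Fn(char) -> char) -> impl Fn(&str) -> String {
    move |s: &str| s.chars().map(&f).collect()
}

/// Naive snake_case: '_' before each uppercase letter except the first character.
fn snake_case(s: &str) -> String {
    let mut out = String::new();
    for (i, c) in s.chars().enumerate() {
        if c.is_uppercase() {
            if i > 0 {
                out.push('_');
            }
            out.extend(c.to_lowercase());
        } else {
            out.push(c);
        }
    }
    out
}

fn main() {
    println!("{}", apply_case("FooBar", per_char(|c| c.to_ascii_uppercase())));
    println!("{}", apply_case("FooBar", snake_case));
}
```

Transforms that need to know a character's position, such as titlecasing and snakecasing, only fit the whole-string signature.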

lilyball · 2014-08-20T22:54:45Z

@Manishearth I don't think there's a need for the ascii variants. I only brought it up because the original issue talked about to_ascii_lower!() but if the motivating use-case doesn't care, then it's better to have fewer macros.


Manishearth · 2014-08-20T23:32:50Z

Updated with the style changes. I'll look into adding to_titlecase! tomorrow.

alexcrichton · 2014-08-21T16:21:24Z

This is an addition to the prelude for all Rust programs in existence, and should be considered with corresponding gravity. Currently we require any modifications of the prelude (additions or removals) such as this to go through the RFC process.

This is also why we have created a plugin infrastructure for developing extensions such as this out of tree.

Manishearth · 2014-08-21T16:25:53Z

Alright, I'll draft up an RFC when I get time.

-Manish Goregaokar


SimonSapin · 2014-08-21T16:27:21Z

Alternatively, Servo could define these syntax extensions itself in its macros crate.

Manishearth · 2014-08-21T16:28:54Z

That would work. I'll have a go.

-Manish Goregaokar


alexcrichton · 2014-08-29T16:52:50Z

It sounds like this may be able to live outside the compiler in servo, so I'm going to close this for now. Feel free to reopen if any surprises arise!

@Manishearth