Add byte, byte string, and raw byte string literals. #14880

Closed
wants to merge 7 commits into from
71 changes: 65 additions & 6 deletions src/doc/rust.md
@@ -234,7 +234,7 @@ rule. A literal is a form of constant expression, so is evaluated (primarily)
at compile time.

~~~~ {.ebnf .gram}
literal : string_lit | char_lit | num_lit ;
literal : string_lit | char_lit | byte_string_lit | byte_lit | num_lit ;
~~~~

#### Character and string literals
@@ -244,17 +244,17 @@ char_lit : '\x27' char_body '\x27' ;
string_lit : '"' string_body * '"' | 'r' raw_string ;

char_body : non_single_quote
| '\x5c' [ '\x27' | common_escape ] ;
| '\x5c' [ '\x27' | common_escape | unicode_escape ] ;

string_body : non_double_quote
| '\x5c' [ '\x22' | common_escape ] ;
| '\x5c' [ '\x22' | common_escape | unicode_escape ] ;
raw_string : '"' raw_string_body '"' | '#' raw_string '#' ;

common_escape : '\x5c'
| 'n' | 'r' | 't' | '0'
| 'x' hex_digit 2
| 'u' hex_digit 4
| 'U' hex_digit 8 ;
unicode_escape : 'u' hex_digit 4
| 'U' hex_digit 8 ;

hex_digit : 'a' | 'b' | 'c' | 'd' | 'e' | 'f'
| 'A' | 'B' | 'C' | 'D' | 'E' | 'F'
@@ -294,7 +294,7 @@ the following forms:
escaped in order to denote *itself*.

Raw string literals do not process any escapes. They start with the character
`U+0072` (`r`), followed zero or more of the character `U+0023` (`#`) and a
`U+0072` (`r`), followed by zero or more of the character `U+0023` (`#`) and a
`U+0022` (double-quote) character. The _raw string body_ is not defined in the
EBNF grammar above: it can contain any sequence of Unicode characters and is
terminated only by another `U+0022` (double-quote) character, followed by the
@@ -319,6 +319,65 @@ r##"foo #"# bar"##; // foo #"# bar
"\\x52"; r"\x52"; // \x52
~~~~

#### Byte and byte string literals

~~~~ {.ebnf .gram}
byte_lit : 'b' '\x27' byte_body '\x27' ;
byte_string_lit : 'b' '"' byte_string_body * '"' | 'b' 'r' raw_byte_string ;

byte_body : ascii_non_single_quote
| '\x5c' [ '\x27' | common_escape ] ;

byte_string_body : ascii_non_double_quote
| '\x5c' [ '\x22' | common_escape ] ;
raw_byte_string : '"' raw_byte_string_body '"' | '#' raw_byte_string '#' ;

~~~~

A _byte literal_ is a single ASCII character (in the `U+0000` to `U+007F` range)
enclosed within two `U+0027` (single-quote) characters,
with the exception of `U+0027` itself,
which must be _escaped_ by a preceding `U+005C` character (`\`),
or a single _escape_.
It is equivalent to a `u8` unsigned 8-bit integer _number literal_.
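
As a rough illustration of the paragraph above, the following sketch treats byte literals as ordinary `u8` values. It is written against present-day Rust rather than the compiler this patch targets, and the helper names `is_ascii_digit` and `single_quote` are hypothetical, not part of the patch.

```rust
/// Hypothetical helper: checks for an ASCII decimal digit by comparing
/// against byte literals, which are plain `u8` values.
fn is_ascii_digit(byte: u8) -> bool {
    byte >= b'0' && byte <= b'9'
}

/// Hypothetical helper: a byte literal containing an escaped single quote.
fn single_quote() -> u8 {
    b'\''
}
```

Here `b'0'` and `b'9'` can be compared directly with a `u8` argument, with no cast from `char` needed.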

A _byte string literal_ is a sequence of ASCII characters and _escapes_
enclosed within two `U+0022` (double-quote) characters,
with the exception of `U+0022` itself,
which must be _escaped_ by a preceding `U+005C` character (`\`),
or a _raw byte string literal_.
It is equivalent to a `&'static [u8]` borrowed vector of unsigned 8-bit integers.
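
A minimal sketch of a byte string literal used as a byte slice, assuming present-day Rust. There the literal's own type is a reference to a fixed-size array (`&'static [u8; N]`) that coerces to `&'static [u8]`; the patch itself describes the type as `&'static [u8]`. The names `greeting` and `count_byte` are hypothetical.

```rust
/// Hypothetical helper: returns a byte string literal as a byte slice.
fn greeting() -> &'static [u8] {
    // Assumption: in current Rust this literal is `&'static [u8; 5]`
    // and coerces to a slice at the return site.
    b"hello"
}

/// Hypothetical helper: counts how often `needle` occurs in `haystack`.
fn count_byte(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}
```

For instance, `count_byte(greeting(), b'l')` counts the two `l` bytes in `hello`.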

Some additional _escapes_ are available in either byte or non-raw byte string
literals. An escape starts with a `U+005C` (`\`) and continues with one of
the following forms:

* A _byte escape_ starts with `U+0078` (`x`) and is
followed by exactly two _hex digits_. It denotes the byte
equal to the provided hex value.
* A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072`
(`r`), or `U+0074` (`t`), denoting the byte values `0x0A` (ASCII LF),
`0x0D` (ASCII CR) or `0x09` (ASCII HT) respectively.
* The _backslash escape_ is the character `U+005C` (`\`) which must be
escaped in order to denote its ASCII encoding `0x5C`.
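
The escape forms listed above can be sketched as follows, assuming present-day Rust accepts the same escapes. The function name `escaped` is hypothetical and exists only to give the literal a home.

```rust
/// Hypothetical helper: a byte string using each escape form from the list.
fn escaped() -> &'static [u8] {
    // `\x41` is a byte escape, `\n`, `\r` and `\t` are whitespace escapes,
    // and `\\` is the backslash escape.
    b"\x41\n\r\t\\"
}
```

Each escape contributes exactly one byte, so the slice returned here holds five bytes.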

Raw byte string literals do not process any escapes.
They start with the character `U+0062` (`b`),
followed by `U+0072` (`r`),
followed by zero or more of the character `U+0023` (`#`),
and a `U+0022` (double-quote) character.
The _raw byte string body_ is not defined in the EBNF grammar above:
it can contain any sequence of ASCII characters and is
terminated only by another `U+0022` (double-quote) character, followed by the
same number of `U+0023` (`#`) characters that preceded the opening `U+0022`
(double-quote) character.
A raw byte string literal cannot contain any non-ASCII byte.

All characters contained in the raw byte string body represent their ASCII encoding;
the characters `U+0022` (double-quote) (except when followed by at least as
many `U+0023` (`#`) characters as were used to start the raw string literal) or
`U+005C` (`\`) do not have any special meaning.
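
A short sketch of raw byte strings, mirroring the raw string examples earlier in the manual and assuming present-day Rust. The helper names `raw_backslash` and `raw_with_quote` are hypothetical.

```rust
/// Hypothetical helper: the backslash is kept as a literal byte,
/// so this is four bytes and not the single byte 0x52.
fn raw_backslash() -> &'static [u8] {
    br"\x52"
}

/// Hypothetical helper: two `#` delimiters allow `"#` inside the body.
fn raw_with_quote() -> &'static [u8] {
    br##"foo #"# bar"##
}
```

The first literal is equivalent to the non-raw `b"\\x52"`.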

#### Number literals

~~~~ {.ebnf .gram}
4 changes: 4 additions & 0 deletions src/libcore/str.rs
@@ -560,6 +560,8 @@ Section: Comparing strings

// share the implementation of the lang-item vs. non-lang-item
// eq_slice.
/// NOTE: This function is (ab)used in rustc::middle::trans::_match
/// to compare &[u8] byte slices that are not necessarily valid UTF-8.
#[inline]
fn eq_slice_(a: &str, b: &str) -> bool {
#[allow(ctypes)]
@@ -572,6 +574,8 @@ fn eq_slice_(a: &str, b: &str) -> bool {
}

/// Bytewise slice equality
/// NOTE: This function is (ab)used in rustc::middle::trans::_match
/// to compare &[u8] byte slices that are not necessarily valid UTF-8.
#[cfg(not(test))]
#[lang="str_eq"]
#[inline]
2 changes: 1 addition & 1 deletion src/libregex_macros/lib.rs
@@ -182,7 +182,7 @@ fn exec<'t>(which: ::regex::native::MatchKind, input: &'t str,
#[allow(unused_variable)]
fn run(&mut self, start: uint, end: uint) -> Vec<Option<uint>> {
let mut matched = false;
let prefix_bytes: &[u8] = &$prefix_bytes;
let prefix_bytes: &[u8] = $prefix_bytes;
let mut clist = &mut Threads::new(self.which);
let mut nlist = &mut Threads::new(self.which);

2 changes: 2 additions & 0 deletions src/librustc/middle/const_eval.rs
@@ -506,6 +506,7 @@ pub fn lit_to_const(lit: &Lit) -> const_val {
LitBinary(ref data) => {
const_binary(Rc::new(data.iter().map(|x| *x).collect()))
}
LitByte(n) => const_uint(n as u64),
LitChar(n) => const_uint(n as u64),
LitInt(n, _) => const_int(n),
LitUint(n, _) => const_uint(n),
@@ -528,6 +529,7 @@ pub fn compare_const_vals(a: &const_val, b: &const_val) -> Option<int> {
(&const_float(a), &const_float(b)) => compare_vals(a, b),
(&const_str(ref a), &const_str(ref b)) => compare_vals(a, b),
(&const_bool(a), &const_bool(b)) => compare_vals(a, b),
(&const_binary(ref a), &const_binary(ref b)) => compare_vals(a, b),
_ => None
}
}
1 change: 1 addition & 0 deletions src/librustc/middle/lint.rs
@@ -805,6 +805,7 @@ fn check_type_limits(cx: &Context, e: &ast::Expr) {
} else { t };
let (min, max) = uint_ty_range(uint_type);
let lit_val: u64 = match lit.node {
ast::LitByte(_v) => return, // _v is u8, within range by definition
ast::LitInt(v, _) => v as u64,
ast::LitUint(v, _) => v,
ast::LitIntUnsuffixed(v) => v as u64,
17 changes: 14 additions & 3 deletions src/librustc/middle/trans/_match.rs
@@ -1273,13 +1273,24 @@ fn compare_values<'a>(
val: bool_to_i1(result.bcx, result.val)
}
}
_ => cx.sess().bug("only scalars and strings supported in compare_values"),
_ => cx.sess().bug("only strings supported in compare_values"),
},
ty::ty_rptr(_, mt) => match ty::get(mt.ty).sty {
ty::ty_str => compare_str(cx, lhs, rhs, rhs_t),
_ => cx.sess().bug("only scalars and strings supported in compare_values"),
ty::ty_vec(mt, _) => match ty::get(mt.ty).sty {
ty::ty_uint(ast::TyU8) => {
// NOTE: cast &[u8] to &str and abuse the str_eq lang item,
// which calls memcmp().
let t = ty::mk_str_slice(cx.tcx(), ty::ReStatic, ast::MutImmutable);
let lhs = BitCast(cx, lhs, type_of::type_of(cx.ccx(), t).ptr_to());
let rhs = BitCast(cx, rhs, type_of::type_of(cx.ccx(), t).ptr_to());
compare_str(cx, lhs, rhs, rhs_t)
},
_ => cx.sess().bug("only byte strings supported in compare_values"),
},
_ => cx.sess().bug("only strings and byte strings supported in compare_values"),
},
_ => cx.sess().bug("only scalars and strings supported in compare_values"),
_ => cx.sess().bug("only scalars, byte strings, and strings supported in compare_values"),
}
}
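
The hunk above teaches `compare_values` to handle `&[u8]` so that byte string literals can appear as `match` patterns. A minimal sketch of the user-facing effect, assuming present-day Rust; the function `method_code` and its return codes are hypothetical.

```rust
/// Hypothetical example: classifies a request method given as raw bytes,
/// using byte string literals as patterns.
fn method_code(method: &[u8]) -> u8 {
    match method {
        b"GET" => 1,
        b"POST" => 2,
        // Any other input, including bytes that are not valid UTF-8.
        _ => 0,
    }
}
```

Because the comparison is bytewise, the input does not need to be valid UTF-8, which is the situation the `NOTE` comments in `src/libcore/str.rs` describe.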

1 change: 1 addition & 0 deletions src/librustc/middle/trans/consts.rs
@@ -43,6 +43,7 @@ pub fn const_lit(cx: &CrateContext, e: &ast::Expr, lit: ast::Lit)
-> ValueRef {
let _icx = push_ctxt("trans_lit");
match lit.node {
ast::LitByte(b) => C_integral(Type::uint_from_ty(cx, ast::TyU8), b as u64, false),
ast::LitChar(i) => C_integral(Type::char(cx), i as u64, false),
ast::LitInt(i, t) => C_integral(Type::int_from_ty(cx, t), i as u64, true),
ast::LitUint(u, t) => C_integral(Type::uint_from_ty(cx, t), u, false),
1 change: 1 addition & 0 deletions src/librustc/middle/typeck/check/mod.rs
@@ -1715,6 +1715,7 @@ pub fn check_lit(fcx: &FnCtxt, lit: &ast::Lit) -> ty::t {
ast::LitBinary(..) => {
ty::mk_slice(tcx, ty::ReStatic, ty::mt{ ty: ty::mk_u8(), mutbl: ast::MutImmutable })
}
ast::LitByte(_) => ty::mk_u8(),
ast::LitChar(_) => ty::mk_char(),
ast::LitInt(_, t) => ty::mk_mach_int(t),
ast::LitUint(_, t) => ty::mk_mach_uint(t),
8 changes: 8 additions & 0 deletions src/librustdoc/clean/mod.rs
@@ -1924,6 +1924,14 @@ fn lit_to_str(lit: &ast::Lit) -> String {
match lit.node {
ast::LitStr(ref st, _) => st.get().to_string(),
ast::LitBinary(ref data) => format!("{:?}", data.as_slice()),
ast::LitByte(b) => {
let mut res = String::from_str("b'");
(b as char).escape_default(|c| {
res.push_char(c);
});
res.push_char('\'');
res
},
ast::LitChar(c) => format!("'{}'", c),
ast::LitInt(i, _t) => i.to_str(),
ast::LitUint(u, _t) => u.to_str(),
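
The `LitByte` arm above renders a byte back into literal source form. A comparable sketch in present-day Rust, assuming `std::ascii::escape_default` as a stand-in for the `(b as char).escape_default(...)` callback used in the patch; the two may differ for bytes above `0x7F`, and the function name `byte_literal_source` is hypothetical.

```rust
/// Hypothetical helper: renders a byte as the source text of a byte literal.
fn byte_literal_source(b: u8) -> String {
    let mut res = String::from("b'");
    // `std::ascii::escape_default` yields the escaped form as ASCII bytes.
    for escaped in std::ascii::escape_default(b) {
        res.push(escaped as char);
    }
    res.push('\'');
    res
}
```

Printable ASCII bytes pass through unchanged, while quotes, backslashes, and control bytes come out in escaped form.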
3 changes: 2 additions & 1 deletion src/librustdoc/html/highlight.rs
@@ -140,7 +140,8 @@ fn doit(sess: &parse::ParseSess, mut lexer: lexer::StringReader,
}

// text literals
t::LIT_CHAR(..) | t::LIT_STR(..) | t::LIT_STR_RAW(..) => "string",
t::LIT_BYTE(..) | t::LIT_BINARY(..) | t::LIT_BINARY_RAW(..) |
t::LIT_CHAR(..) | t::LIT_STR(..) | t::LIT_STR_RAW(..) => "string",

// number literals
t::LIT_INT(..) | t::LIT_UINT(..) | t::LIT_INT_UNSUFFIXED(..) |
1 change: 1 addition & 0 deletions src/libsyntax/ast.rs
@@ -616,6 +616,7 @@ pub type Lit = Spanned<Lit_>;
pub enum Lit_ {
LitStr(InternedString, StrStyle),
LitBinary(Rc<Vec<u8> >),
LitByte(u8),
LitChar(char),
LitInt(i64, IntTy),
LitUint(u64, UintTy),
1 change: 1 addition & 0 deletions src/libsyntax/ext/concat.rs
@@ -47,6 +47,7 @@ pub fn expand_syntax_ext(cx: &mut base::ExtCtxt,
ast::LitBool(b) => {
accumulator.push_str(format!("{}", b).as_slice());
}
ast::LitByte(..) |
ast::LitBinary(..) => {
cx.span_err(e.span, "cannot concatenate a binary literal");
}
6 changes: 6 additions & 0 deletions src/libsyntax/ext/quote.rs
@@ -436,6 +436,12 @@ fn mk_token(cx: &ExtCtxt, sp: Span, tok: &token::Token) -> Gc<ast::Expr> {
vec!(mk_binop(cx, sp, binop)));
}

LIT_BYTE(i) => {
let e_byte = cx.expr_lit(sp, ast::LitByte(i));

return cx.expr_call(sp, mk_token_path(cx, sp, "LIT_BYTE"), vec!(e_byte));
}

LIT_CHAR(i) => {
let e_char = cx.expr_lit(sp, ast::LitChar(i));
