Commit ef3c7af

metadata: Bump the metadata encoding version.
We have changed the encoding enough to bump that. Also added some notes about metadata encoding to librbml/lib.rs.
1 parent fe73d38 commit ef3c7af

2 files changed, +102 -6 lines changed

src/librbml/lib.rs

+101 -5
@@ -8,12 +8,108 @@
 // option. This file may not be copied, modified, or distributed
 // except according to those terms.
 
-//! Really Bad Markup Language (rbml) is a temporary measure until we migrate
-//! the rust object metadata to a better serialization format. It is not
-//! intended to be used by users.
+//! Really Bad Markup Language (rbml) is an internal serialization format of rustc.
+//! This is not intended to be used by users.
 //!
-//! It is loosely based on the Extensible Binary Markup Language (ebml):
-//! http://www.matroska.org/technical/specs/rfc/index.html
+//! Originally based on the Extensible Binary Markup Language
+//! (ebml; http://www.matroska.org/technical/specs/rfc/index.html),
+//! it is now a separate format tuned for the rust object metadata.
+//!
+//! # Encoding
+//!
+//! An RBML document consists of a tag, a length and data.
+//! The encoded data can contain multiple RBML documents concatenated.
+//!
+//! **Tags** are a hint for the following data.
+//! A tag is a number from 0x000 to 0xfff, where 0xf0 through 0xff are reserved.
+//! Tags less than 0xf0 are encoded in one literal byte.
+//! Tags greater than 0xff are encoded in two big-endian bytes,
+//! where the tag number is ORed with 0xf000. (E.g. tag 0x123 = `f1 23`)
+//!
+//! **Lengths** encode the length of the following data.
+//! It is a variable-length unsigned int, and one of the following forms:
+//!
+//! - `80` through `fe` for lengths up to 0x7e;
+//! - `40 7f` through `7f ff` for lengths up to 0x3fff;
+//! - `20 40 00` through `3f ff ff` for lengths up to 0x1fffff;
+//! - `10 20 00 00` through `1f ff ff ff` for lengths up to 0xfffffff.
+//!
+//! The "overlong" form is allowed so that the length can be encoded
+//! without prior knowledge of the encoded data.
+//! For example, the length 0 can be represented either by `80`, `40 00`,
+//! `20 00 00` or `10 00 00 00`.
+//! The encoder tries to minimize the length if possible.
+//! Also, some predefined tags listed below are so commonly used that
+//! their lengths are omitted ("implicit length").
+//!
+//! **Data** can be either binary bytes or zero or more nested RBML documents.
+//! Nested documents cannot overflow, and should be entirely contained
+//! within a parent document.
+//!
+//! # Predefined Tags
+//!
+//! Most RBML tags are defined by the application.
+//! (For the rust object metadata, see also `rustc::metadata::common`.)
+//! However, RBML itself does define a set of predefined tags,
+//! intended for the auto-serialization implementation.
+//!
+//! Predefined tags with an implicit length:
+//!
+//! - `U64` (`00`): 8-byte big endian unsigned integer.
+//! - `U32` (`01`): 4-byte big endian unsigned integer.
+//! - `U16` (`02`): 2-byte big endian unsigned integer.
+//! - `U8` (`03`): 1-byte unsigned integer.
+//! Any of the `U*` tags can be used to encode a primitive unsigned integer type,
+//! as long as the tag is no wider than the type itself.
+//! For example, `u8` can only be represented via the `U8` tag.
+//!
+//! - `I64` (`04`): 8-byte big endian signed integer.
+//! - `I32` (`05`): 4-byte big endian signed integer.
+//! - `I16` (`06`): 2-byte big endian signed integer.
+//! - `I8` (`07`): 1-byte signed integer.
+//! Similar to `U*` tags. Always uses two's complement encoding.
+//!
+//! - `Bool` (`08`): 1-byte boolean value, `00` for false and `01` for true.
+//!
+//! - `Char` (`09`): 4-byte big endian Unicode scalar value.
+//! Surrogates or out-of-bound values are invalid.
+//!
+//! - `F64` (`0a`): 8-byte big endian unsigned integer representing
+//! IEEE 754 binary64 floating-point format.
+//! - `F32` (`0b`): 4-byte big endian unsigned integer representing
+//! IEEE 754 binary32 floating-point format.
+//!
+//! - `Sub8` (`0c`): 1-byte unsigned integer for supplementary information.
+//! - `Sub32` (`0d`): 4-byte unsigned integer for supplementary information.
+//! Those two tags normally occur as the first subdocument of certain tags,
+//! namely `Enum`, `Vec` and `Map`, to provide a variant or size information.
+//! They can be used interchangeably.
+//!
+//! Predefined tags with an explicit length:
+//!
+//! - `Str` (`0e`): A UTF-8-encoded string.
+//!
+//! - `Enum` (`0f`): An enum.
+//! The first subdocument should be a `Sub*` tag with a variant ID.
+//! Subsequent subdocuments, if any, encode variant arguments.
+//!
+//! - `Vec` (`10`): A vector (sequence).
+//! - `VecElt` (`11`): A vector element.
+//! The first subdocument should be a `Sub*` tag with the number of elements.
+//! Subsequent subdocuments should be one `VecElt` tag per element.
+//!
+//! - `Map` (`12`): A map (associative array).
+//! - `MapKey` (`13`): A key part of the map entry.
+//! - `MapVal` (`14`): A value part of the map entry.
+//! The first subdocument should be a `Sub*` tag with the number of entries.
+//! Subsequent subdocuments should be an alternating sequence of
+//! `MapKey` and `MapVal` tags for each entry.
+//!
+//! - `Opaque` (`15`): An opaque, custom-format tag.
+//! Used to wrap ordinary custom tags or data in the auto-serialized context.
+//! Rustc typically uses this to encode type information.
+//!
+//! The first 0x20 tags are reserved by RBML; custom tags start at 0x20.
 
 #![crate_name = "rbml"]
 #![unstable(feature = "rustc_private")]
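The tag and length rules in the new doc comment can be sketched as a small encoder. This is an illustration in present-day Rust, not the code in librbml: the function names `encode_tag`, `encode_len_sized` and `encode_len` are hypothetical, and the size limits are taken directly from the list of length forms in the comment.

```rust
/// Encodes an RBML tag as described in the doc comment.
/// Hypothetical helper: returns `None` for the reserved range
/// 0xf0..=0xff and for tags above 0xfff.
fn encode_tag(tag: u16) -> Option<Vec<u8>> {
    match tag {
        // Tags below 0xf0 are a single literal byte.
        0x00..=0xef => Some(vec![tag as u8]),
        // Tags above 0xff are ORed with 0xf000 and written big-endian.
        0x100..=0xfff => Some((tag | 0xf000).to_be_bytes().to_vec()),
        _ => None,
    }
}

/// Encodes `len` in exactly `size` bytes (1 to 4).
/// Overlong forms are allowed, so any length that fits is accepted.
fn encode_len_sized(len: u32, size: usize) -> Option<Vec<u8>> {
    // (marker bit placed in the first byte, largest length for this size)
    let (marker, max): (u32, u32) = match size {
        1 => (0x80, 0x7e),
        2 => (0x4000, 0x3fff),
        3 => (0x20_0000, 0x1f_ffff),
        4 => (0x1000_0000, 0x0fff_ffff),
        _ => return None,
    };
    if len > max {
        return None;
    }
    // Write the marked value big-endian and keep the low `size` bytes.
    let word = (marker | len).to_be_bytes();
    Some(word[4 - size..].to_vec())
}

/// Encodes `len` in the shortest form that can hold it.
fn encode_len(len: u32) -> Option<Vec<u8>> {
    (1..=4).find_map(|size| encode_len_sized(len, size))
}
```

An encoder that does not yet know the size of the data can reserve space with `encode_len_sized(0, 4)` and patch the four bytes afterwards, which is the situation the "overlong" form exists for.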

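Reading the format back follows from the same rules. The sketch below is an assumption-laden illustration rather than librbml's reader: `decode_tag` and `decode_len` are hypothetical names, a first tag byte of `f0` is rejected because two-byte tags are documented only for values above 0xff, and a length byte of `ff` is rejected because the one-byte form is documented as `80` through `fe`.

```rust
/// Decodes a tag at the start of `buf`.
/// Returns the tag and the number of bytes consumed.
fn decode_tag(buf: &[u8]) -> Option<(u16, usize)> {
    match *buf.first()? {
        // One literal byte for tags below 0xf0.
        b @ 0x00..=0xef => Some((b as u16, 1)),
        // `f0 xx` would encode a tag below 0x100, which has no two-byte form.
        0xf0 => None,
        // Two big-endian bytes with the 0xf000 marker stripped.
        b => {
            let low = *buf.get(1)?;
            Some(((((b & 0x0f) as u16) << 8) | low as u16, 2))
        }
    }
}

/// Decodes a variable-length unsigned length at the start of `buf`.
/// Returns the length and the number of bytes consumed.
/// Overlong forms such as `40 00` are accepted.
fn decode_len(buf: &[u8]) -> Option<(u32, usize)> {
    let first = *buf.first()?;
    // The position of the highest set bit gives the encoded size:
    // 1xxxxxxx = 1 byte, 01xxxxxx = 2, 001xxxxx = 3, 0001xxxx = 4.
    let size = first.leading_zeros() as usize + 1;
    if size > 4 || first == 0xff {
        return None;
    }
    let bytes = buf.get(..size)?; // fails on truncated input
    let mut value = (first & (0xff >> size)) as u32;
    for &b in &bytes[1..] {
        value = (value << 8) | b as u32;
    }
    Some((value, size))
}
```

For a tag with an explicit length, a reader would call `decode_tag`, then `decode_len` on the remaining bytes, and then take that many bytes as the document's data. Tags with an implicit length skip the second step and use the fixed size listed for the tag.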
src/librustc/metadata/encoder.rs

+1 -1
@@ -1920,7 +1920,7 @@ fn encode_dylib_dependency_formats(rbml_w: &mut Encoder, ecx: &EncodeContext) {
 
 // NB: Increment this as you change the metadata encoding version.
 #[allow(non_upper_case_globals)]
-pub const metadata_encoding_version : &'static [u8] = &[b'r', b'u', b's', b't', 0, 0, 0, 1 ];
+pub const metadata_encoding_version : &'static [u8] = &[b'r', b'u', b's', b't', 0, 0, 0, 2 ];
 
 pub fn encode_metadata(parms: EncodeParams, krate: &ast::Crate) -> Vec<u8> {
     let mut wr = SeekableMemWriter::new();
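The point of bumping the version is that a reader can refuse metadata written with the older encoding. The diff does not show how the constant is consumed, so the following is only a sketch under the assumption that the encoded metadata begins with these eight bytes. `strip_version_header` is a hypothetical helper, and the constant is copied from the new line of the diff with an upper-case name for modern Rust.

```rust
/// Version marker copied from the diff (value after the bump).
const METADATA_ENCODING_VERSION: &[u8] = &[b'r', b'u', b's', b't', 0, 0, 0, 2];

/// Hypothetical helper, assuming the blob starts with the version marker:
/// returns the payload after the marker, or `None` when the marker is
/// missing or belongs to another encoding version.
fn strip_version_header(blob: &[u8]) -> Option<&[u8]> {
    blob.strip_prefix(METADATA_ENCODING_VERSION)
}
```

Under that assumption, metadata written before this commit (marker ending in `1`) no longer matches and would be rejected instead of being misread with the new tag and length rules.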
