|
8 | 8 | // option. This file may not be copied, modified, or distributed
|
9 | 9 | // except according to those terms.
|
10 | 10 |
|
11 |
| -//! Really Bad Markup Language (rbml) is a temporary measure until we migrate |
12 |
| -//! the rust object metadata to a better serialization format. It is not |
13 |
| -//! intended to be used by users. |
| 11 | +//! Really Bad Markup Language (rbml) is an internal serialization format of rustc. |
| 12 | +//! This is not intended to be used by users. |
14 | 13 | //!
|
15 |
| -//! It is loosely based on the Extensible Binary Markup Language (ebml): |
16 |
| -//! http://www.matroska.org/technical/specs/rfc/index.html |
| 14 | +//! Originally based on the Extensible Binary Markup Language |
| 15 | +//! (ebml; http://www.matroska.org/technical/specs/rfc/index.html), |
| 16 | +//! it is now a separate format tuned for the rust object metadata. |
| 17 | +//! |
| 18 | +//! # Encoding |
| 19 | +//! |
| 20 | +//! An RBML document consists of a tag, a length and data. |
| 21 | +//! The encoded data can contain multiple RBML documents concatenated. |
| 22 | +//! |
| 23 | +//! **Tags** are a hint for the following data. |
| 24 | +//! A tag is a number from 0x000 to 0xfff, where 0xf0 through 0xff are reserved. |
| 25 | +//! Tags less than 0xf0 are encoded in one literal byte. |
| 26 | +//! Tags greater than 0xff are encoded in two big-endian bytes, |
| 27 | +//! where the tag number is ORed with 0xf000. (E.g. tag 0x123 = `f1 23`) |
| 28 | +//! |
| 29 | +//! **Lengths** encode the length of the following data. |
| 30 | +//! A length is a variable-length unsigned integer in one of the following forms: |
| 31 | +//! |
| 32 | +//! - `80` through `fe` for lengths up to 0x7e; |
| 33 | +//! - `40 ff` through `7f ff` for lengths up to 0x3fff; |
| 34 | +//! - `20 40 00` through `3f ff ff` for lengths up to 0x1fffff; |
| 35 | +//! - `10 20 00 00` through `1f ff ff ff` for lengths up to 0xfffffff. |
| 36 | +//! |
| 37 | +//! The "overlong" form is allowed so that the length can be encoded |
| 38 | +//! without prior knowledge of the size of the encoded data. |
| 39 | +//! For example, the length 0 can be represented either by `80`, `40 00`, |
| 40 | +//! `20 00 00` or `10 00 00 00`. |
| 41 | +//! The encoder tries to minimize the length if possible. |
| 42 | +//! Also, some predefined tags listed below are so commonly used that |
| 43 | +//! their lengths are omitted ("implicit length"). |
| 44 | +//! |
| 45 | +//! **Data** can be either binary bytes or zero or more nested RBML documents. |
| 46 | +//! Nested documents cannot overflow their parent and must be entirely |
| 47 | +//! contained within it. |
| 48 | +//! |
| 49 | +//! # Predefined Tags |
| 50 | +//! |
| 51 | +//! Most RBML tags are defined by the application. |
| 52 | +//! (For the rust object metadata, see also `rustc::metadata::common`.) |
| 53 | +//! RBML itself does, however, define a set of predefined tags |
| 54 | +//! intended for the auto-serialization implementation. |
| 55 | +//! |
| 56 | +//! Predefined tags with an implicit length: |
| 57 | +//! |
| 58 | +//! - `U64` (`00`): 8-byte big endian unsigned integer. |
| 59 | +//! - `U32` (`01`): 4-byte big endian unsigned integer. |
| 60 | +//! - `U16` (`02`): 2-byte big endian unsigned integer. |
| 61 | +//! - `U8` (`03`): 1-byte unsigned integer. |
| 62 | +//! Any of the `U*` tags can be used to encode a primitive unsigned integer type, |
| 63 | +//! as long as the tag's width is no greater than the actual size of the type. |
| 64 | +//! For example, `u8` can only be represented via the `U8` tag. |
| 65 | +//! |
| 66 | +//! - `I64` (`04`): 8-byte big endian signed integer. |
| 67 | +//! - `I32` (`05`): 4-byte big endian signed integer. |
| 68 | +//! - `I16` (`06`): 2-byte big endian signed integer. |
| 69 | +//! - `I8` (`07`): 1-byte signed integer. |
| 70 | +//! Similar to `U*` tags. Always uses two's complement encoding. |
| 71 | +//! |
| 72 | +//! - `Bool` (`08`): 1-byte boolean value, `00` for false and `01` for true. |
| 73 | +//! |
| 74 | +//! - `Char` (`09`): 4-byte big endian Unicode scalar value. |
| 75 | +//! Surrogate code points or out-of-range values are invalid. |
| 76 | +//! |
| 77 | +//! - `F64` (`0a`): 8-byte big endian unsigned integer representing |
| 78 | +//! IEEE 754 binary64 floating-point format. |
| 79 | +//! - `F32` (`0b`): 4-byte big endian unsigned integer representing |
| 80 | +//! IEEE 754 binary32 floating-point format. |
| 81 | +//! |
| 82 | +//! - `Sub8` (`0c`): 1-byte unsigned integer for supplementary information. |
| 83 | +//! - `Sub32` (`0d`): 4-byte unsigned integer for supplementary information. |
| 84 | +//! These two tags normally occur as the first subdocument of certain tags, |
| 85 | +//! namely `Enum`, `Vec` and `Map`, to provide variant or size information. |
| 86 | +//! They can be used interchangeably. |
| 87 | +//! |
| 88 | +//! Predefined tags with an explicit length: |
| 89 | +//! |
| 90 | +//! - `Str` (`0e`): A UTF-8-encoded string. |
| 91 | +//! |
| 92 | +//! - `Enum` (`0f`): An enum. |
| 93 | +//! The first subdocument should be a `Sub*` tag with a variant ID. |
| 94 | +//! Subsequent subdocuments, if any, encode variant arguments. |
| 95 | +//! |
| 96 | +//! - `Vec` (`10`): A vector (sequence). |
| 97 | +//! - `VecElt` (`11`): A vector element. |
| 98 | +//! The first subdocument should be a `Sub*` tag with the number of elements. |
| 99 | +//! Subsequent subdocuments should be one `VecElt` tag per element. |
| 100 | +//! |
| 101 | +//! - `Map` (`12`): A map (associative array). |
| 102 | +//! - `MapKey` (`13`): A key part of the map entry. |
| 103 | +//! - `MapVal` (`14`): A value part of the map entry. |
| 104 | +//! The first subdocument should be a `Sub*` tag with the number of entries. |
| 105 | +//! Subsequent subdocuments should be an alternating sequence of |
| 106 | +//! `MapKey` and `MapVal` tags, one pair per entry. |
| 107 | +//! |
| 108 | +//! - `Opaque` (`15`): An opaque, custom-format tag. |
| 109 | +//! Used to wrap ordinary custom tags or data in the auto-serialized context. |
| 110 | +//! Rustc typically uses this to encode type information. |
| 111 | +//! |
| 112 | +//! The first 0x20 tags are reserved by RBML; custom tags start at 0x20. |
17 | 113 |
|
18 | 114 | #![crate_name = "rbml"]
|
19 | 115 | #![unstable(feature = "rustc_private")]
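
The tag rules in the new `# Encoding` section can be illustrated with a small sketch. This is not code from the `rbml` crate: `encode_tag` is a hypothetical helper written only to mirror the documented rules (one literal byte below 0xf0, two big-endian bytes ORed with 0xf000 above 0xff, and 0xf0 through 0xff rejected as reserved).

```rust
/// Hypothetical helper (not part of the rbml crate): encodes a tag number
/// following the rules stated in the doc comment.
/// Returns `None` for reserved or out-of-range tags.
fn encode_tag(tag: u16) -> Option<Vec<u8>> {
    match tag {
        // Tags below 0xf0 are written as one literal byte.
        0x00..=0xef => Some(vec![tag as u8]),
        // 0xf0 through 0xff are reserved by the format.
        0xf0..=0xff => None,
        // Larger tags are ORed with 0xf000 and written big-endian.
        0x100..=0xfff => {
            let v = tag | 0xf000;
            Some(vec![(v >> 8) as u8, v as u8])
        }
        // Anything above 0xfff is outside the documented tag range.
        _ => None,
    }
}
```

With this sketch, tag 0x123 comes out as the bytes `f1 23`, matching the example given in the doc comment.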
|
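
The length forms listed in the doc comment can be sketched in the same way. The marker-bit layout used here (0x80, 0x40, 0x20, 0x10 in the first byte) is an assumption inferred from the listed byte ranges and from the overlong encodings of zero (`80`, `40 00`, `20 00 00`, `10 00 00 00`); `encode_len` and `decode_len` are hypothetical names, not the crate's API.

```rust
/// Hypothetical helper: writes `len` in the shortest documented form.
/// Returns `None` if the length exceeds 0xfffffff.
fn encode_len(len: u32) -> Option<Vec<u8>> {
    if len <= 0x7e {
        // One byte, `80` through `fe`.
        Some(vec![0x80 | len as u8])
    } else if len <= 0x3fff {
        let v = 0x4000 | len;
        Some(vec![(v >> 8) as u8, v as u8])
    } else if len <= 0x1f_ffff {
        let v = 0x20_0000 | len;
        Some(vec![(v >> 16) as u8, (v >> 8) as u8, v as u8])
    } else if len <= 0xfff_ffff {
        let v = 0x1000_0000 | len;
        Some(vec![(v >> 24) as u8, (v >> 16) as u8, (v >> 8) as u8, v as u8])
    } else {
        None
    }
}

/// Hypothetical helper: reads a length and returns `(value, bytes_consumed)`.
/// Overlong forms are accepted, as the doc comment allows.
fn decode_len(bytes: &[u8]) -> Option<(u32, usize)> {
    let first = *bytes.first()?;
    // The position of the highest set bit selects the width.
    // `ff` is treated as invalid here because the listed one-byte range ends at `fe`.
    let width: usize = match first {
        0x80..=0xfe => 1,
        0x40..=0x7f => 2,
        0x20..=0x3f => 3,
        0x10..=0x1f => 4,
        _ => return None,
    };
    if bytes.len() < width {
        return None;
    }
    // Clear the marker bit, then append the remaining bytes big-endian.
    let mut value = (first & (0xffu8 >> width)) as u32;
    for b in &bytes[1..width] {
        value = (value << 8) | *b as u32;
    }
    Some((value, width))
}
```

An encoder that does not know the final size in advance could reserve a four-byte form such as `10 00 00 00` and patch it later, which is the use case the "overlong" paragraph describes.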
|
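
The implicit-length integer tags can be sketched as well. The doc comment says any `U*` tag may be used as long as it is no wider than the actual type; choosing the narrowest tag that fits, as below, is an assumption for illustration and not necessarily what the real encoder does. `encode_u32` is a hypothetical name.

```rust
/// Hypothetical helper: encodes a `u32` as an implicit-length document,
/// picking the narrowest of `U8` (03), `U16` (02) and `U32` (01).
fn encode_u32(value: u32) -> Vec<u8> {
    if value <= u8::MAX as u32 {
        // `U8` tag followed by one byte; no length field is written.
        vec![0x03, value as u8]
    } else if value <= u16::MAX as u32 {
        // `U16` tag followed by two big-endian bytes.
        let mut out = vec![0x02];
        out.extend_from_slice(&(value as u16).to_be_bytes());
        out
    } else {
        // `U32` tag followed by four big-endian bytes.
        let mut out = vec![0x01];
        out.extend_from_slice(&value.to_be_bytes());
        out
    }
}
```

Because these tags carry an implicit length, a reader has to know each tag's fixed width to skip over the document.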