No way to (de)serialize a String from binary data? #187
-
I've found myself in a situation where the data I'm reading gives strings of up to N bytes. In my example I'll use 16:

#[derive(BinRead, PartialEq, Debug)]
#[br(little)]
struct BinaryResourceData {
    #[br(count = 16)]
    reference: Vec<u8>,
    type_id: u16,
    id: u32,
}

impl BinaryResourceData {
    pub fn name(&self) -> String {
        std::str::from_utf8(&self.reference)
            .expect("Unable to read the data from the reference.")
            .trim_matches('\x00')
            .to_owned()
    }
}

It seems a little unusual to me that there's no native way to deserialize a String directly.
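As a point of reference, the trimming step in name() above can be reproduced with the standard library alone. This is a sketch; the helper name is hypothetical and nothing here depends on binrw:

```rust
// Std-only sketch of the trimming step from name() above; the helper
// name is hypothetical and binrw is not involved.
fn padded_to_string(raw: &[u8]) -> Option<String> {
    // Fail (rather than panic) on invalid UTF-8.
    let s = std::str::from_utf8(raw).ok()?;
    // trim_end_matches strips only trailing NULs; trim_matches would
    // also strip leading ones, which padded fields normally don't have.
    Some(s.trim_end_matches('\0').to_owned())
}

fn main() {
    let raw: [u8; 16] = *b"hello\0\0\0\0\0\0\0\0\0\0\0";
    assert_eq!(padded_to_string(&raw).unwrap(), "hello");
}
```

Returning Option instead of calling expect lets the caller decide how to handle non-UTF-8 bytes.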
Replies: 3 comments 4 replies
-
Thank you for the possible answers. I'm sure most of my trouble here is down to my very limited Rust knowledge to date. With some help we came up with the following:

#[derive(NamedArgs, Clone)]
struct PaddedStringArgs {
    count: usize,
}
#[binrw::parser(reader)]
fn padded_string_parser(args: PaddedStringArgs, ...) -> binrw::BinResult<String> {
    let pos = reader.stream_position()?;
    let mut bytes = Vec::with_capacity(args.count);
    let bytes_read = reader.take(args.count as u64).read_to_end(&mut bytes)?;
    // Surface a binrw error instead of silently accepting a short read.
    if bytes_read != args.count {
        return Err(binrw::Error::AssertFail {
            pos,
            message: format!("expected {} bytes, read only {}", args.count, bytes_read),
        });
    }
    // Report invalid UTF-8 as an error rather than panicking.
    let slice = std::str::from_utf8(&bytes).map_err(|e| binrw::Error::AssertFail {
        pos,
        message: e.to_string(),
    })?;
    Ok(slice.trim_end_matches('\0').to_owned())
}
#[derive(BinRead, PartialEq, Debug)]
#[br(little)]
struct BinaryResourceData {
    #[br(count = 16, parse_with = padded_string_parser)]
    reference: String,
    type_id: u16,
    id: u32,
}

The reason this is preferred is that it takes a fixed size of binary data (16 bytes) and then trims the null padding, which is exactly what I need. I initially opened this issue thinking I may have missed some trickery with the macro magic, but I see the issue is much wider than anticipated.
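For readers without binrw on hand, the body of that parser can be mirrored with std alone. This is a sketch: the helper name and sample bytes are hypothetical, and std::io::Error stands in for binrw's error type:

```rust
use std::io::{Cursor, Read};

// Std-only mirror of the padded-string parser above: read exactly
// `count` bytes, validate UTF-8, and trim trailing NUL padding.
// Hypothetical helper; std::io::Error stands in for binrw's error type.
fn read_padded_string<R: Read>(reader: &mut R, count: usize) -> std::io::Result<String> {
    let mut bytes = Vec::with_capacity(count);
    let bytes_read = reader.take(count as u64).read_to_end(&mut bytes)?;
    if bytes_read != count {
        return Err(std::io::Error::new(
            std::io::ErrorKind::UnexpectedEof,
            format!("expected {count} bytes, read {bytes_read}"),
        ));
    }
    let s = std::str::from_utf8(&bytes)
        .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))?;
    Ok(s.trim_end_matches('\0').to_owned())
}

fn main() {
    // A 16-byte padded field followed by the next field's bytes.
    let mut cursor = Cursor::new(b"swpc_torch01\0\0\0\0\x01\x00".to_vec());
    assert_eq!(read_padded_string(&mut cursor, 16).unwrap(), "swpc_torch01");
    // take() leaves the cursor positioned right after the field,
    // so subsequent fields parse from the correct offset.
    let mut rest = Vec::new();
    cursor.read_to_end(&mut rest).unwrap();
    assert_eq!(rest, vec![0x01, 0x00]);
}
```

Using Read::take caps the read at `count` bytes even when the underlying stream is longer, which is what keeps the following fields aligned.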
-
Posting this compact implementation of read/write of UTF-8 strings prefixed with a u32 size, for reference, because I was asked about the topic:

use std::io::Cursor;
use binrw::*; // binrw = "0.11"

#[binrw]
struct SizedUTF8 {
    #[br( temp )]
    #[bw( calc = content.as_bytes().len() as u32 )]
    embedded_size: u32,
    #[br( count = embedded_size, try_map = |data: Vec<u8>| std::str::from_utf8(&data).map(|s| s.to_string()) )]
    #[bw( map = |s| s.as_bytes().to_vec() )]
    content: String
}

// read
let mut cursor = Cursor::new(b"\x00\x00\x00\x05hello".to_vec());
let blah = SizedUTF8::read_be(&mut cursor).unwrap();
assert_eq!(&blah.content, "hello");

// write
let mut cursor = Cursor::new(Vec::new());
let blah = SizedUTF8 { content: "goodbye".to_string() };
blah.write_be(&mut cursor).unwrap();
assert_eq!(&cursor.into_inner(), b"\x00\x00\x00\x07goodbye");

You can modify this to change the size type or make the struct take an arg; I just wanted to post an example that doesn't use a custom parser or look too scary.
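The wire format that example produces can be checked with std alone. These encode/decode helpers are hypothetical stand-ins for illustration, not part of binrw:

```rust
// Std-only encode/decode of the same wire format the struct above uses:
// a big-endian u32 byte length followed by the UTF-8 bytes. Helper
// names are hypothetical; binrw is not involved.
fn encode(s: &str) -> Vec<u8> {
    let mut out = (s.len() as u32).to_be_bytes().to_vec();
    out.extend_from_slice(s.as_bytes());
    out
}

fn decode(buf: &[u8]) -> Option<String> {
    // First four bytes: big-endian length prefix.
    let len = u32::from_be_bytes(buf.get(..4)?.try_into().ok()?) as usize;
    // Remaining `len` bytes: the UTF-8 payload.
    let bytes = buf.get(4..4 + len)?;
    String::from_utf8(bytes.to_vec()).ok()
}

fn main() {
    assert_eq!(encode("hello").as_slice(), b"\x00\x00\x00\x05hello");
    assert_eq!(decode(b"\x00\x00\x00\x07goodbye").unwrap(), "goodbye");
}
```

Note that `s.len()` on a &str is the byte length, not the character count, which is exactly what a byte-size prefix needs for non-ASCII content.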
String has no default implementation because there is no obvious canonical representation of string data like there is for most other primitive types. Strings may be null-terminated, dollar-sign-terminated, length-prefixed (Pascal), fixed-length space-padded, fixed-length null-padded, length-prefixed and null-terminated, delimited with quotes, in a big block with a separate lookup table, etc. The encoding could be UTF-8, WTF-8, UTF-16, Win-1252, MacRoman, ISO-8859, Shift-JIS, EBCDIC, etc.

If you have a fixed-length array containing string data you know is ASCII or UTF-8 then you can do basically what you are now, or you can do something like

#[br(try_map = |data: [u8; 16]| str::from_utf8(…
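To make that variety concrete, here is a std-only sketch of how the same payload looks in three of the layouts listed above. The helper names are hypothetical illustrations, not binrw APIs:

```rust
// Three of the layouts mentioned above, for the same payload. Helper
// names are hypothetical illustrations, not binrw APIs.

// C style: payload followed by a single NUL byte.
fn null_terminated(s: &str) -> Vec<u8> {
    let mut out = s.as_bytes().to_vec();
    out.push(0);
    out
}

// Pascal style: a single u8 length prefix (payload capped at 255 bytes).
fn pascal(s: &str) -> Vec<u8> {
    let mut out = vec![s.len() as u8];
    out.extend_from_slice(s.as_bytes());
    out
}

// Fixed-length field of `width` bytes, padded with NULs.
fn fixed_null_padded(s: &str, width: usize) -> Vec<u8> {
    let mut out = s.as_bytes().to_vec();
    out.resize(width, 0); // also truncates if the payload is too long
    out
}

fn main() {
    assert_eq!(null_terminated("ab"), vec![b'a', b'b', 0]);
    assert_eq!(pascal("ab"), vec![2, b'a', b'b']);
    assert_eq!(fixed_null_padded("ab", 4), vec![b'a', b'b', 0, 0]);
}
```

The same five bytes of text yield three different on-disk sizes, which is why a format-agnostic library cannot pick one of these as "the" String representation.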