Due to mean the JSON stream could be cut off at some arbitrary position? In that case Struson should return However, there are probably a few issues with this:
A problem with your point (3.) is also that I assume you could probably implement something like you proposed by writing a custom Just to clarify though; at the moment I don't think that I will be directly providing functionality for reading incomplete JSON values other than what is already supported as part of Struson, instead of asking users to implement their own Also, in which way is the end of the stream visible? Does
I am not actually that familiar with all the special features of Serde which might help here. It might be good to also ask this on Stack Overflow, explicitly for Serde (and ideally link the question here); maybe someone has a good idea for that.

Hopefully that answered your questions; if not, feel free to also give a short JSON or code example to demonstrate your use case. Let me know what you think. And thanks also for considering Struson!
Actually, ignore the issues I mentioned in my previous comment. This is probably possible, but the So assuming your JSON is simply cut off at some point (i.e.

Proof-of-concept

Note: I haven't tested this extensively yet and it is probably not the most efficient implementation, but it should be functional.

#[derive(Debug, PartialEq)]
enum PeekedValue {
    Null,
    Bool(bool),
    Number(String),
    String(String),
    /// Peeked array start, but has not been consumed yet
    PeekedArray,
    /// Peeked object start, but has not been consumed yet
    PeekedObject,
}

impl PeekedValue {
    fn get_value_type(&self) -> ValueType {
        match self {
            PeekedValue::Null => ValueType::Null,
            PeekedValue::Bool(_) => ValueType::Boolean,
            PeekedValue::Number(_) => ValueType::Number,
            PeekedValue::String(_) => ValueType::String,
            PeekedValue::PeekedArray => ValueType::Array,
            PeekedValue::PeekedObject => ValueType::Object,
        }
    }
}
struct PartialJsonReader<J: JsonReader> {
    delegate: J,
    reached_eof: bool,
    /// Stack which is expanded every time an array or object is opened;
    /// values are `true` if object, `false` if array
    is_in_object: Vec<bool>,
    /// Temporarily holding string value or name to allow returning reference to it
    string_buf: String,
    peeked_name: Option<String>,
    peeked_value: Option<PeekedValue>,
}
impl<J: JsonReader> PartialJsonReader<J> {
    // TODO: Could probably provide real location in some cases, but for simplicity return unknown
    // location all the time
    fn get_unknown_location(&self) -> JsonErrorLocation {
        JsonErrorLocation {
            path: "?".to_owned(),
            line: 0,
            column: 0,
        }
    }

    fn peek_value(&mut self) -> Result<ValueType, ReaderError> {
        let peeked = self.delegate.peek()?;
        self.peeked_value = Some(match peeked {
            ValueType::Array => PeekedValue::PeekedArray,
            ValueType::Object => PeekedValue::PeekedObject,
            ValueType::String => PeekedValue::String(self.delegate.next_string()?),
            ValueType::Number => {
                let v = PeekedValue::Number(self.delegate.next_number_as_string()?);
                // For a number, must make sure the complete number was processed; for example
                // `1` might actually be `1.2` or `12`
                // Only works for non-top-level values; for a top-level value cannot know if
                // the number is complete
                if !self.is_in_object.is_empty() {
                    // Trigger EOF error in case nothing follows last char of number
                    self.delegate.has_next()?;
                }
                v
            }
            ValueType::Boolean => PeekedValue::Bool(self.delegate.next_bool()?),
            ValueType::Null => {
                self.delegate.next_null()?;
                PeekedValue::Null
            }
        });
        Ok(peeked)
    }

    fn has_next_impl(&mut self) -> Result<bool, ReaderError> {
        if self.delegate.has_next()? {
            // Must peek next array item / object member
            if let Some(true) = self.is_in_object.last() {
                self.peeked_name = Some(self.delegate.next_name_owned()?);
            }
            self.peek_value()?;
            Ok(true)
        } else {
            Ok(false)
        }
    }
}
impl<J: JsonReader> JsonReader for PartialJsonReader<J> {
    fn peek(&mut self) -> Result<ValueType, ReaderError> {
        // If called for top-level value and value has not been peeked yet, peek at it here
        if self.is_in_object.is_empty() && self.peeked_value.is_none() {
            return self.peek_value();
        }
        if let Some(p) = &self.peeked_value {
            Ok(p.get_value_type())
        } else {
            panic!("should call `has_next` before peeking value")
        }
    }

    fn begin_object(&mut self) -> Result<(), ReaderError> {
        if let Some(p) = self.peeked_value.take() {
            if p == PeekedValue::PeekedObject {
                self.is_in_object.push(true);
                self.delegate.begin_object()
            } else {
                Err(ReaderError::UnexpectedValueType {
                    expected: ValueType::Object,
                    actual: p.get_value_type(),
                    location: self.get_unknown_location(),
                })
            }
        } else {
            panic!("should call `has_next` before consuming value")
        }
    }

    fn end_object(&mut self) -> Result<(), ReaderError> {
        self.is_in_object.pop();
        // After EOF the closing `}` is missing, so do not ask the delegate for it
        if self.reached_eof {
            Ok(())
        } else {
            self.delegate.end_object()
        }
    }

    fn begin_array(&mut self) -> Result<(), ReaderError> {
        if let Some(p) = self.peeked_value.take() {
            if p == PeekedValue::PeekedArray {
                self.is_in_object.push(false);
                self.delegate.begin_array()
            } else {
                Err(ReaderError::UnexpectedValueType {
                    expected: ValueType::Array,
                    actual: p.get_value_type(),
                    location: self.get_unknown_location(),
                })
            }
        } else {
            panic!("should call `has_next` before consuming value")
        }
    }

    fn end_array(&mut self) -> Result<(), ReaderError> {
        self.is_in_object.pop();
        // After EOF the closing `]` is missing, so do not ask the delegate for it
        if self.reached_eof {
            Ok(())
        } else {
            self.delegate.end_array()
        }
    }

    fn has_next(&mut self) -> Result<bool, ReaderError> {
        if self.reached_eof {
            Ok(false)
        } else if self.peeked_name.is_some() || self.peeked_value.is_some() {
            Ok(true)
        } else {
            match self.has_next_impl() {
                // JsonStreamReader currently reports not only `SyntaxErrorKind::IncompleteDocument`
                // on unexpected EOF, but also other errors, such as `InvalidLiteral`
                Err(ReaderError::SyntaxError(JsonSyntaxError { .. })) => {
                    self.reached_eof = true;
                    Ok(false)
                }
                // Propagate any other errors, or success result
                r => r,
            }
        }
    }

    fn next_name(&mut self) -> Result<&'_ str, ReaderError> {
        self.string_buf = self.next_name_owned()?;
        Ok(&self.string_buf)
    }

    fn next_name_owned(&mut self) -> Result<String, ReaderError> {
        if let Some(s) = self.peeked_name.take() {
            Ok(s)
        } else {
            panic!("should call `has_next` before consuming name")
        }
    }

    fn next_str(&mut self) -> Result<&'_ str, ReaderError> {
        self.string_buf = self.next_string()?;
        Ok(&self.string_buf)
    }

    fn next_string(&mut self) -> Result<String, ReaderError> {
        if let Some(p) = self.peeked_value.take() {
            if let PeekedValue::String(s) = p {
                Ok(s)
            } else {
                Err(ReaderError::UnexpectedValueType {
                    expected: ValueType::String,
                    actual: p.get_value_type(),
                    location: self.get_unknown_location(),
                })
            }
        } else {
            panic!("should call `has_next` before consuming value")
        }
    }

    fn next_string_reader(&mut self) -> Result<Box<dyn std::io::Read + '_>, ReaderError> {
        unimplemented!()
    }

    fn next_number_as_str(&mut self) -> Result<&'_ str, ReaderError> {
        self.string_buf = self.next_number_as_string()?;
        Ok(&self.string_buf)
    }

    fn next_number_as_string(&mut self) -> Result<String, ReaderError> {
        if let Some(p) = self.peeked_value.take() {
            if let PeekedValue::Number(s) = p {
                Ok(s)
            } else {
                Err(ReaderError::UnexpectedValueType {
                    expected: ValueType::Number,
                    actual: p.get_value_type(),
                    location: self.get_unknown_location(),
                })
            }
        } else {
            panic!("should call `has_next` before consuming value")
        }
    }

    fn next_bool(&mut self) -> Result<bool, ReaderError> {
        if let Some(p) = self.peeked_value.take() {
            if let PeekedValue::Bool(b) = p {
                Ok(b)
            } else {
                Err(ReaderError::UnexpectedValueType {
                    expected: ValueType::Boolean,
                    actual: p.get_value_type(),
                    location: self.get_unknown_location(),
                })
            }
        } else {
            panic!("should call `has_next` before consuming value")
        }
    }

    fn next_null(&mut self) -> Result<(), ReaderError> {
        if let Some(p) = self.peeked_value.take() {
            if p == PeekedValue::Null {
                Ok(())
            } else {
                Err(ReaderError::UnexpectedValueType {
                    expected: ValueType::Null,
                    actual: p.get_value_type(),
                    location: self.get_unknown_location(),
                })
            }
        } else {
            panic!("should call `has_next` before consuming value")
        }
    }

    fn deserialize_next<'de, D: Deserialize<'de>>(&mut self) -> Result<D, DeserializerError> {
        let mut deserializer = JsonReaderDeserializer::new(self);
        D::deserialize(&mut deserializer)
    }

    fn skip_name(&mut self) -> Result<(), ReaderError> {
        if self.peeked_name.take().is_some() {
            Ok(())
        } else {
            panic!("should call `has_next` before consuming name")
        }
    }

    // Important: This is implemented recursively; could lead to stack overflow for deeply nested JSON
    fn skip_value(&mut self) -> Result<(), ReaderError> {
        if let Some(p) = &self.peeked_value {
            // For array and object need to manually skip value here by delegating to other
            // methods to handle EOF properly; cannot delegate to underlying JSON reader
            if *p == PeekedValue::PeekedArray {
                self.begin_array()?;
                while self.has_next()? {
                    self.skip_value()?;
                }
                self.end_array()
            } else if *p == PeekedValue::PeekedObject {
                self.begin_object()?;
                while self.has_next()? {
                    self.skip_name()?;
                    self.skip_value()?;
                }
                self.end_object()
            } else {
                self.peeked_value.take();
                Ok(())
            }
        } else {
            panic!("should call `has_next` before skipping value")
        }
    }

    fn seek_to(&mut self, rel_json_path: &json_path::JsonPath) -> Result<(), ReaderError> {
        unimplemented!()
    }

    fn skip_to_top_level(&mut self) -> Result<(), ReaderError> {
        unimplemented!()
    }

    fn transfer_to<W: JsonWriter>(&mut self, json_writer: &mut W) -> Result<(), TransferError> {
        unimplemented!()
    }

    fn consume_trailing_whitespace(self) -> Result<(), ReaderError> {
        if self.reached_eof {
            Ok(())
        } else {
            self.delegate.consume_trailing_whitespace()
        }
    }
}
macro_rules! deserialize_partial {
    ($reader:expr, |$deserializer:ident| $deserializing_function:expr) => {{
        let mut json_reader = PartialJsonReader {
            delegate: JsonStreamReader::new($reader),
            reached_eof: false,
            is_in_object: Vec::new(),
            string_buf: String::new(),
            peeked_name: None,
            peeked_value: None,
        };
        let mut d = JsonReaderDeserializer::new(&mut json_reader);
        let $deserializer = &mut d;
        $deserializing_function
    }};
}

And then you can use it like this:

#[derive(Debug, Default, Deserialize)]
#[serde(default)]
struct Outer {
    a: u32,
    b: bool,
    c: Option<u32>,
    d: Vec<Inner>,
}

#[derive(Debug, Default, Deserialize)]
#[serde(default)]
struct Inner {
    e: String,
    f: f32,
}

let full_json = r#"{"a":2,"b":true,"c":null,"d":[{"e":"str\"","f":1.2e3}]}"#;
let mut json = String::new();
let mut outer = Outer::default();
for c in full_json.chars() {
    json.push(c);
    deserialize_partial!(json.as_bytes(), |d| Outer::deserialize_in_place(
        d, &mut outer
    ))
    .unwrap();
    println!("{json}\n {outer:?}\n");
}

Is this what you had in mind?
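A side note on the number handling in `peek_value`: a number at the very end of a truncated buffer cannot be trusted, because `1` may turn out to be `12` or `1.2` once more data arrives. The following standalone sketch shows that idea in isolation, using only the standard library. `number_is_complete` is a hypothetical helper made up for illustration; it is not part of Struson and is not how the proof-of-concept detects this (that one relies on the delegate's `has_next` failing at EOF).

```rust
/// Returns `true` if the JSON number starting at byte offset `start` is known to be
/// complete, i.e. it is followed by at least one byte that cannot belong to a number.
/// Hypothetical helper for illustration only; not part of Struson.
fn number_is_complete(buf: &[u8], start: usize) -> bool {
    let is_number_byte =
        |b: u8| b.is_ascii_digit() || matches!(b, b'-' | b'+' | b'.' | b'e' | b'E');
    // `get` avoids a panic if `start` lies beyond the end of the buffer
    buf.get(start..)
        .map_or(false, |rest| rest.iter().any(|&b| !is_number_byte(b)))
}

fn main() {
    // In each of these truncated inputs the number starts at offset 1
    for json in ["[1", "[1.2e", "[1,", "[1.2e3]"] {
        println!(
            "{json:?} -> complete: {}",
            number_is_complete(json.as_bytes(), 1)
        );
    }
}
```

Only the inputs where a delimiter such as `,` or `]` follows the number are reported as complete; for a top-level number there is no such delimiter, which is why the proof-of-concept cannot decide that case.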
Oh, nice! Yes, I was a bit concerned with the problems you mentioned and thought that some kind of buffering of already read data should be enough to solve those. Looking at the code and your test case this is pretty much what I needed. :-)

One thing I need to add is some kind of tracking for successfully parsed top-level values. My actual use-case is processing websocket where I want to start analyzing data even before the
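One possible way to track which top-level values have been fully received, independent of Struson and Serde, is to scan the buffered bytes and remember where the nesting depth last returned to zero. The sketch below is only an assumption about what such tracking could look like: `complete_prefix_len` is a hypothetical name, it only considers top-level objects and arrays (not top-level scalars), and it does not validate the JSON.

```rust
/// Returns the byte offset just past the last complete top-level JSON object or
/// array in `buf`, or 0 if no top-level value is complete yet.
/// Hypothetical helper for illustration only; it does not validate the JSON.
fn complete_prefix_len(buf: &[u8]) -> usize {
    let mut depth = 0usize; // current nesting depth of arrays / objects
    let mut in_string = false; // currently inside a JSON string
    let mut escaped = false; // previous byte was a backslash inside a string
    let mut complete_end = 0;

    for (i, &b) in buf.iter().enumerate() {
        if in_string {
            // Brackets inside strings must not affect the nesting depth
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'{' | b'[' => depth += 1,
            b'}' | b']' => {
                // Ignore unbalanced closing brackets instead of underflowing
                if depth > 0 {
                    depth -= 1;
                    if depth == 0 {
                        complete_end = i + 1;
                    }
                }
            }
            _ => {}
        }
    }
    complete_end
}

fn main() {
    // First object is complete, second one is cut off inside a string
    let received = br#"{"a":[1,2]} {"b":"x}"#;
    let end = complete_prefix_len(received);
    println!("complete part: {}", String::from_utf8_lossy(&received[..end]));
}
```

Everything before the returned offset could be handed to a regular (non-partial) deserializer, while only the remainder needs the partial reader.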
I played with the above code and it works well for me. However I think that recovery from the partial parse is going to be tricky:
One potentially useful idea here: maybe
Hi,

I was wondering about the following use-case
`serde`-compatible struct

When looking at `struson` I think this capability is almost there, maybe except for (3.), which seems much harder in general. I guess the simplest way this could currently be done is:
`#[serde(default)]` such that `serde` would accept whatever partial content is there

The last bullet is rather a hack; I suppose this could be handled in a more proper way inside the deserializer itself, possibly even opening a path towards (3.). Also, maybe this would allow better support for fields/sub-objects that need to be parsed as a whole and do not allow defaulting their parts.

What is your opinion about this kind of use case, and is attempting to still use `serde` for parsing partial content worth the effort? (The alternative I'm already considering is to use the reader directly to manually populate the struct, though `deserialize_in_place` might still be a good way to integrate both worlds.)