Message parser should be able to support arbitrary whitespace such as '\n', '\t', '\r', and ' ' within and between messages #35
Comments
Hi @addievo, There is a Would that work for you? |
Hi @juanjoDiaz, Thank you for your reply. While the separator option does exist, it appears limited to recognizing only one type of separator between JSON messages. In our use-case, we need the parser to accommodate a range of whitespace characters such as '\n', '\t', '\r', and ' ' as valid separators between messages. Is it possible to extend the functionality to support multiple types of separators? |
@juanjoDiaz is it possible to make |
I think that regex is too flexible. I'm thinking of allowing an array of separators. |
I implemented this, but the performance impact is too big for me to accept. So I need to think of a way to keep the current performance for people who are fine working with a single separator, while allowing multiple separators for those who can accept the performance impact. |
Wouldn't the performance only be impacted if there were actual whitespace between the json strings? And also if I give an array of 1 character wouldn't that be the same as just specifying a single character? And when you say we can chain whitespace, do you mean that if there is more than 1 whitespace character in the stream, it will auto-drop them all? |
Hey @juanjoDiaz, What about starting off with a quick check to see how many separators are in the array? If there's just one, we could stick to the fast route we already have. If there are more, we switch to the slower but more flexible multi-separator logic. This way, we get the best of both worlds without a big hit on performance. What do you think? |
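The fast-path idea above can be sketched as follows. This is only an illustration under stated assumptions: `makeSeparatorCheck` and `IsSeparator` are hypothetical names, not part of the library, and every separator is assumed to be a single character. The choice between the two checks is made once, up front, so the single-separator case stays a plain equality comparison per character.

```typescript
// Hypothetical helper, not the library's API: picks a separator check once,
// so the per-character cost depends on how many separators were configured.
type IsSeparator = (charCode: number) => boolean;

function makeSeparatorCheck(separators: string[]): IsSeparator {
  // Assumption: each separator is exactly one character long.
  const codes = separators.map((s) => s.charCodeAt(0));

  if (codes.length === 1) {
    // Fast path: a single equality comparison per character.
    const only = codes[0];
    return (c) => c === only;
  }

  // Slower, more flexible path: a set lookup per character.
  const set = new Set(codes);
  return (c) => set.has(c);
}

// Usage: one check for newline-delimited input, one for any whitespace.
const isNewline = makeSeparatorCheck(["\n"]);
const isWhitespace = makeSeparatorCheck(["\n", "\t", "\r", " "]);
```

Whether the set lookup is actually slow enough to matter would have to be measured against the real tokenizer; the sketch only shows where the branch would sit.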
Just to clarify, the input stream may have 0 or more whitespace characters. The idea is for the stream parser to completely remove them all, regardless of how many there are. So we cannot fix the number of expected whitespace separators like |
During tokenization, we need to check every character to see whether it is a separator or not. Considering that you might be streaming a million JSON objects separated by a single character, the overhead of checking against the list piles up and becomes very noticeable.
Yes. If your separator is The problem is with matching
Yes. This is what I want to try next. |
Hey @juanjoDiaz, Thanks a bunch for your reply and assistance. |
Hi @juanjoDiaz , |
Hi @KWLandry-acoustic, Your issue is not related. I just realized that I'm an idiot and this feature was supported since the very beginning 🤦♂️ You just need to set the |
@juanjoDiaz Excellent, thanks! |
How do I configure the stream parser to be able to discard whitespace between JSON messages?
I am facing an issue while implementing an RPC system, which requires using JSONParser to parse binary data into JSON for transmission through RPC. The problem is that messages in the input stream could be separated by a variety of different whitespace characters, and the current implementation of the separator option in the library only supports a single separator.
Given that, we have to keep our input streams in the following form:
{...message}{...message}
However, to improve readability, we would like to be able to add whitespace between messages, as demonstrated below.
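As an illustration of the requested behaviour (this is not the reporter's original example and not the library's API), here is a minimal non-streaming sketch. It assumes the whole input is available as one string and that every message is a top-level JSON object or array. The name `splitMessages` is hypothetical. It tracks nesting depth and string state to find message boundaries and discards any run of `'\n'`, `'\t'`, `'\r'`, and `' '` between messages.

```typescript
// Hypothetical helper (not part of the parser): splits concatenated JSON
// objects/arrays, ignoring any amount of whitespace between them.
const WHITESPACE = new Set(["\n", "\t", "\r", " "]);

function splitMessages(input: string): unknown[] {
  const messages: unknown[] = [];
  let depth = 0;        // current nesting level of {} and []
  let inString = false; // true while inside a JSON string literal
  let escaped = false;  // true right after a backslash inside a string
  let start = -1;       // index where the current message began

  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);

    if (inString) {
      // Braces and whitespace inside strings must not affect the depth.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }

    if (depth === 0) {
      if (WHITESPACE.has(ch)) continue; // discard separators between messages
      if (ch !== "{" && ch !== "[") {
        throw new SyntaxError(`Unexpected character at index ${i}`);
      }
    }

    if (ch === '"') {
      inString = true;
    } else if (ch === "{" || ch === "[") {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === "}" || ch === "]") {
      depth--;
      if (depth === 0) {
        // A complete top-level value: hand it to the built-in parser.
        messages.push(JSON.parse(input.slice(start, i + 1)));
      }
    }
  }

  if (depth !== 0 || inString) {
    throw new SyntaxError("Incomplete message at end of input");
  }
  return messages;
}

// Zero or more whitespace characters between messages are all accepted.
const parsed = splitMessages('{"a":1}\n\t{"b":2}  {"c":3}{"d":4}');
console.log(parsed.length); // 4
```

A streaming parser would need to do the same bookkeeping incrementally across chunks; the sketch only shows which characters are treated as discardable and when.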