Suggestion: sort object to accelerate FindMember method #1978
Comments
There was an analysis of this a long time ago, see #102. |
This kind of sounds like a post-process routine that indexes the data. That doesn't seem like it's something In my past experience doing optimizations, string comparisons are kind of your enemy. I would fear even with |
Yes, sorting an object can be considered a kind of post-processing, and it could easily live in a separate utility library. At first I just tried to write free functions, something like:
namespace jsonutil
{
void SortObject(rapidjson::Value& json);
const rapidjson::Value* FindMemberBinary(const rapidjson::Value& json, const std::string& name);
}
But with this, it cannot be used in an OOP manner. I think it would be better if we had official support for this, then can write as When compare key name string, I also have another opinion. Since we always have I also noticed there is reserved space for hash code in string type |
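To make the two free functions concrete, here is a minimal sketch. It assumes a hypothetical stand-in for the member array (a `std::vector` of key/value pairs), not rapidjson's real `Value` type, so the names `Member` and `Object` are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a JSON object's member array (not rapidjson's type).
using Member = std::pair<std::string, int>;
using Object = std::vector<Member>;

namespace jsonutil {

// Sort members by key name in place: O(N log N) comparisons.
inline void SortObject(Object& obj) {
    std::sort(obj.begin(), obj.end(),
              [](const Member& a, const Member& b) { return a.first < b.first; });
}

// Binary search by key: O(log N) comparisons.
// Precondition: obj has been sorted with SortObject and not modified since.
// Returns nullptr when the key is absent.
inline const int* FindMemberBinary(const Object& obj, const std::string& name) {
    auto it = std::lower_bound(
        obj.begin(), obj.end(), name,
        [](const Member& m, const std::string& key) { return m.first < key; });
    return (it != obj.end() && it->first == name) ? &it->second : nullptr;
}

}  // namespace jsonutil
```

The precondition on `FindMemberBinary` is exactly the part a caller can forget, which is why the lookup is kept separate from the ordinary linear one here.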
My primary concern with a sort and binary search of the JSON object is more the difficulty in knowing when the linear search will have better performance characteristics. Having to move objects around in the layout isn't going to be cheap in some scenarios and may require some space to hold elements while performing swaps. Additionally, since there's reliance on a precondition, it would be very easy for a person to misuse the API and forget the sort call (and that's ignoring a modifiable object where additional items being added won't guarantee the sort order is maintained).
My curiosity had me looking at other libraries' implementations of key lookups (simdjson, ArduinoJSON). They also do a linear search through the elements and have their own string compare methods (which are quite close to the rapidjson approach, believe it or not). That makes me wonder whether optimizing key lookups in a JSON object is actually a high priority for most users of JSON libraries. Either the key lookup is "fast enough" to get the values to the places where they are accessed a lot (meaning the key lookup isn't in the performance-critical path), or the key lookup isn't used at all (users iterate through the members themselves, so the string compare happens once per element and there is no lookup).
It might be better to provide users the option of a more optimized key lookup at the cost of insertion overhead (meaning it would impact parse times). In that case, the values could enforce sorted order always or provide some other mechanism to enable optimized key lookups. I don't know how simple that would be to provide in the current architecture. |
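A minimal sketch of the "always sorted" option, assuming a hypothetical `SortedObject` container over a `std::vector` of key/value pairs (this is not rapidjson's architecture, only an illustration of the trade-off between insertion cost and lookup cost):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Member = std::pair<std::string, int>;

// Hypothetical container that keeps its members sorted by key at all times.
class SortedObject {
public:
    // O(log N) to find the slot, plus O(N) to shift the members after it.
    void AddMember(const std::string& name, int value) {
        auto pos = std::lower_bound(members_.begin(), members_.end(), name, KeyLess);
        members_.insert(pos, Member(name, value));
    }

    // O(log N) lookup; valid because AddMember never breaks the order.
    const int* FindMember(const std::string& name) const {
        auto it = std::lower_bound(members_.begin(), members_.end(), name, KeyLess);
        return (it != members_.end() && it->first == name) ? &it->second : nullptr;
    }

    const std::vector<Member>& members() const { return members_; }

private:
    static bool KeyLess(const Member& m, const std::string& key) { return m.first < key; }
    std::vector<Member> members_;
};
```

Because every insertion may shift existing members, building an object of N keys this way costs O(N^2) element moves in the worst case, which is the parse-time impact mentioned above.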
Like some other users, I am concerned about the linear complexity, O(n), of the FindMember() methods.
I also noticed there is a macro that enables std::multimap to represent an object, but I don't think that solution is ideal, as it is not memory friendly and the layout is not compatible with the existing one.
So I propose another approach, as follows:
- A SortObject() method for the GenericValue type, which sorts an object by key name, typically in O(N log N).
- A FindMemberBinary() method to find a member in an already sorted object, in O(log N).
- GenericValue::Flag, which is in the last two bytes, still has an unused bit. We can use one bit to mark the object as sorted, so that the FindMember() method knows when a binary search is suitable and when it should use the old linear search.
I also think I understand why rapidjson uses linear search by default, besides simplicity of implementation. When an object has few keys, there is no significant difference between linear search and binary search (or a map lookup). In many practical projects (for example, HTTP API bodies), a JSON object has only several keys, so rapidjson's default behavior works well in such cases.
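The flag-based dispatch can be sketched as follows. This is an illustration under stated assumptions: `Object`, `kSortedFlag`, and the bit value `0x8000` are hypothetical, and the real GenericValue flag layout is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using Member = std::pair<std::string, int>;

// Hypothetical flag bit; chosen arbitrarily for this sketch.
constexpr std::uint16_t kSortedFlag = 0x8000;

struct Object {
    std::vector<Member> members;
    std::uint16_t flags = 0;

    // Sort by key and record that fact in the flag word.
    void SortObject() {
        std::sort(members.begin(), members.end(),
                  [](const Member& a, const Member& b) { return a.first < b.first; });
        flags |= kSortedFlag;
    }

    // Binary search when the object is marked sorted, linear search otherwise.
    const int* FindMember(const std::string& name) const {
        if (flags & kSortedFlag) {
            auto it = std::lower_bound(
                members.begin(), members.end(), name,
                [](const Member& m, const std::string& key) { return m.first < key; });
            return (it != members.end() && it->first == name) ? &it->second : nullptr;
        }
        for (const Member& m : members) {
            if (m.first == name) return &m.second;
        }
        return nullptr;
    }
};
```

With the flag checked inside `FindMember`, callers keep a single lookup entry point and cannot run a binary search on an unsorted object by mistake.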
The idea is to give the choice to the programmer, who can decide whether the object needs to be sorted before (many repeated) FindMember calls.
Sorting an object requires no extra memory, but after any AddMember() the object may no longer be fully sorted. So this basic approach is especially suitable for a read-only JSON document/tree. We could further extend GenericValue::Flag to mark that an object must stay sorted across AddMember(), at the cost of O(N) complexity per insertion.
We can call the read-only sorted flag "Weak Sorted" and the writable, order-preserving flag "Strong Sorted"; however, I think the latter flag may be less useful than the former.
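One way to read the two flags, as a rough sketch: the `SortMode` enum and the `Object` type below are hypothetical, and the choice to drop the weak flag on every append is an assumption of this sketch rather than something specified above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Member = std::pair<std::string, int>;

// Hypothetical modes standing in for the proposed flag bits.
enum class SortMode { Unsorted, WeakSorted, StrongSorted };

struct Object {
    std::vector<Member> members;
    SortMode mode = SortMode::Unsorted;

    // Sort once; keepSorted selects "Strong Sorted" instead of "Weak Sorted".
    void SortObject(bool keepSorted) {
        std::sort(members.begin(), members.end(),
                  [](const Member& a, const Member& b) { return a.first < b.first; });
        mode = keepSorted ? SortMode::StrongSorted : SortMode::WeakSorted;
    }

    void AddMember(const std::string& name, int value) {
        if (mode == SortMode::StrongSorted) {
            // Strong: insert at the ordered position, O(N) element shifts.
            auto pos = std::lower_bound(
                members.begin(), members.end(), name,
                [](const Member& m, const std::string& key) { return m.first < key; });
            members.insert(pos, Member(name, value));
            return;
        }
        // Weak or unsorted: append in amortized O(1).
        members.emplace_back(name, value);
        // Assumption: a weak flag is simply dropped, since order is no longer guaranteed.
        if (mode == SortMode::WeakSorted) mode = SortMode::Unsorted;
    }

    bool IsSorted() const { return mode != SortMode::Unsorted; }
};
```

Under this reading, a weakly sorted object silently falls back to linear lookups after a write until it is sorted again, while a strongly sorted one pays for the order on every insertion.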