-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Member query performance #102
Labels
Comments
I think that you can use hash table with a linked list to achieve most functions here. Just make HashTable Node something like:
You can choose to use LinkList or HashTable when implement different functions. Only cost is the memory space to save the pointers. |
Closed
why not just only hash for GenericMember.name. Not necessary for each string. |
the commit 71f0fa7 added a map (so a sorted array) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
From the beginning of RapidJSON, performance of
FindMember()
isO(n)
. I personally also jot this down a long time ago.https://code.google.com/p/rapidjson/issues/detail?id=5&can=1&q=performance
However, today I got e-mail complaining on this again.
I cannot think of a good solution for this. But I would like to write down the thoughts here for public discussion.
String equality test
When querying member by a key, it must involve string equality test operation.
Currently the string equality test is implemented as:
There are three condition checks:
An idea for improvement is to hash the string into a hash code. I noticed that it is possible to add 32-bit hash code without additional memory overhead, and have considered this in the design of the DOM:
The hashcode can be evaluated during parsing with an new option. And then a new variation of
SetString()
or constructor can evaluate the hash code of a string. Finally, use that string to query member. We may initializehashcode = 0
to represent an invalid hash code. Then the code may become:Although the worst case of the test is still O(m), the test can be finished in O(1) when two strings are unequal (thus their hash codes are unequal).
For member query, most string equal tests should be false, so this may improve performance. However, there will be additional O(m) costs for evaluating hash code of each string initially.
Associative Array
A JSON object is basically an associative array, which maps string keys to JSON values.
Currently members are stored in a
std::vector
like manner. Order of members are controlled by user, which depends on the sequence ofAddMember()
or the JSON being parsed. This is actually an important feature when user want to maintain the order of members.However, of course, the query time is O(n) without additional data structure.
There are other possibilities for representing associative array:
Sorted array.
After parsing an object, the members are sorted by their keys by O(n log n). Binary search can be used in query, improving query time to O(log n). Adding new member needs sorting again later (amortized O(log n)). No additional space overhead.
Besides, note that this requires comparison of string lexicographically (not equality), which is an O(m) operation for m equals to minimal length of two strings.
Hash table.
The simple way is using open addressing, so that it will still be a single buffer. Hash code can be computed as in the last section. Insert, query and remove is O(1). When adding new member and the load ratio (member count divided by capacity) is over a constant, the hash table need to be rebuild by O(n). In addition, iterating members will be linear to the capacity, not the count.
If we want to support either or all of these data structures, we have two options: (1) use template parameter to specify which to use in compile-time; or (2) use flags to specify in run-time with overheads in all related operations.
The text was updated successfully, but these errors were encountered: