-
Notifications
You must be signed in to change notification settings - Fork 403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Iox #859 implement prefix tree #1057
Iox #859 implement prefix tree #1057
Conversation
0dc4e56
to
20802f5
Compare
/// @param key key whose values we want to find | ||
/// @return pointers to values matching the key | ||
/// @note returned pointers allow modification of corresponding values in the tree | ||
cxx::vector<Value*, Capacity> find(const Key& key) const noexcept; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interface is suitable for currently intended use cases (string search in service registry) but can of course be extended later as needed. Returning values can be done in addition, but should not be the default.
The important part is to support efficient and (somewhat) convenient insertion, removal and look-up (for reading and modification) of values.
Since it has multi-map semantics (required for our use case), I opted against operator[]
for value access (there can be more values per key and we need to support this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would vote here to use as return value a cxx::vector<std::reference_wrapper<Value>, Capacity>
.
In our safety environment you force a nullptr
check onto the user since in theory a pointer could be always null but a reference is always non null. And this nullptr
check can be expensive if one iterates over a large vector.
Using this in a vector is also straight forward:
myVector.emplace_back(std::ref(myValue));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh and I saw that you are breaking the const correctness here. A user could acquire such a vector and is able to change the underlying values despite that this method is const and the tree should not be modifiable in such a scope. When further thinking about this, this can be even more hazardous.
Lets assume someone acquires such a result and is moving this result into a separate thread and now another thread is changing the prefix tree. One can run into race conditions extremely easy here since one is not aware that the return value contains references to the underlying values which can change.
With this in mind I would really prefer a copy here which would be just safer. But I have to admit that this would be quiet inefficient here I would propose that you implement the callback find interface idea you mentioned earlier. Something like:
void find(const Key & key, const cxx::function_ref<void(const Value& value)> &callback) const noexcept;
and the callback is called for every fitting key.
7ddd378
to
9d2a0d2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please create a small design document for this feature
@elBoberido Yes, I will create a design document. It shall contain the requirements and high level ideas but I will avoid code there. I usually distinguish between abstract/high-level design for the ideas and concepts and detailed design for e.g. interfaces and implementation details (should be used sparingly since they are subject to change). I will create a high-level design document. Note that I am first and foremost interested in whether we can live with the public interface for now. The functionality will not change much I think, but the interface is up to discussion (to a degree, we will not get the "perfect" now). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed only until line 105 in prefix_tree.inl (insert).
/// @brief Remove a key and all its associated values from the tree. | ||
/// @param key key to be removed | ||
/// @return true if was removed, false otherwise (key did not exist) | ||
bool remove(const Key& key) noexcept; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove the bool return value here. If you call remove
the contract states that in either case the key is removed independent of the return value.
But when you read something like
if ( myTree.remove("someKey") == false )
// ...
One may easily interpret this as some kind of error and implements some excessive error handling around it - and this has to be tested etc. It is not like it already happened before in the service registry.
If one would really know if something is present one could always use find
or maybe it even makes sense to implement a method like bool hasKey(const Key &key)
/// and to access values for modification | ||
|
||
/// @todo value versions of find and similar functions? - copies values out | ||
/// @todo callback interface of find and similar functions? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty nice idea!
/// @param key key whose values we want to find | ||
/// @return pointers to values matching the key | ||
/// @note returned pointers allow modification of corresponding values in the tree | ||
cxx::vector<Value*, Capacity> find(const Key& key) const noexcept; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would vote here to use as return value a cxx::vector<std::reference_wrapper<Value>, Capacity>
.
In our safety environment you force a nullptr
check onto the user since in theory a pointer could be always null but a reference is always non null. And this nullptr
check can be expensive if one iterates over a large vector.
Using this in a vector is also straight forward:
myVector.emplace_back(std::ref(myValue));
/// @return pointers to values matching a key containing the prefix | ||
/// @note returned pointers allow modification of corresponding values in the tree | ||
/// @note exact matches are included if there is a key equal to prefix | ||
cxx::vector<Value*, Capacity> findPrefix(const Key& prefix) const noexcept; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like I mentioned above, please use cxx::vector<std::reference_wrapper<Value>, Capacity>
as return value.
/// @param key key whose values we want to find | ||
/// @return pointers to values matching the key | ||
/// @note returned pointers allow modification of corresponding values in the tree | ||
cxx::vector<Value*, Capacity> find(const Key& key) const noexcept; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh and I saw that you are breaking the const correctness here. A user could acquire such a vector and is able to change the underlying values despite that this method is const and the tree should not be modifiable in such a scope. When further thinking about this, this can be even more hazardous.
Lets assume someone acquires such a result and is moving this result into a separate thread and now another thread is changing the prefix tree. One can run into race conditions extremely easy here since one is not aware that the return value contains references to the underlying values which can change.
With this in mind I would really prefer a copy here which would be just safer. But I have to admit that this would be quiet inefficient here I would propose that you implement the callback find interface idea you mentioned earlier. Something like:
void find(const Key & key, const cxx::function_ref<void(const Value& value)> &callback) const noexcept;
and the callback is called for every fitting key.
// prefixLength will contain the length of the existing maximum prefix of letters | ||
// already in the tree (could be all letters) | ||
// This information is needed to locate existing entries efficiently. | ||
Node* findPrefix(const char* letters, uint32_t length, uint32_t& prefixLength) const noexcept; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can adjust the API here so that you use const Key & key
instead of const char* letters, uint32_t length
. In all your uses you get the letters and length always from the key and then put them into this function.
Furthermore I would introduce a short struct called something like
struct NodeProperties { // I know, the name is not perfect ;)
Node& node;
uint32_t prefixLength;
};
and return it here. This would make the API more readable and we avoid discussions in the code like uninitialized variables. The final version could then look like:
NodeProperties findPrefix(const Key & key) const noexcept;
// we need to create the suffix path beyond node and then add our value to the final node | ||
|
||
const char* suffix = &letters[prefixLength]; | ||
uint32_t suffix_len = length - prefixLength; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use lowerCamelCase
. When you activate clang-tidy
in your IDE and use our .clang-tidy
rules you will get a hint directly in the IDE.
// prefixLength will contain the length of the existing maximum prefix of letters | ||
// already in the tree (could be all letters) | ||
// This information is needed to locate existing entries efficiently. | ||
Node* findPrefix(const char* letters, uint32_t length, uint32_t& prefixLength) const noexcept; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method has the same name as the method in the public API. This is a little bit confusing and since they are performing different task please rename one of them.
|
||
removeNodes(key); | ||
|
||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reviewing this method I do not really see why we have to return a boolean to the user at all. And strictly speaking we are also somehow lying to them, the key still exists but does not contain any data.
Node* child = node->child; | ||
while (child) | ||
{ | ||
getValuesFromSubTree(child, result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think recursions are forbidden in code we want to certify.
@MatthiasKillat what would be extremely helpful is a class diagram and a diagram similar to what you had in your C++ talk, showing the interaction of the Nodes and how a simple lookup will be performed. This would make it much easier to build a mental model for the code to review |
@elBoberido Class diagram makes sense. Not sure what kind of diagram you had in mind for he lookup. Since it relies on proper positioning it may be hard to generate with PlantUML. Sketching it in ascii will work of course. I am on it, please stop reviewing until then. |
@MatthiasKillat I meant sketching in ascii. Using PlantUML for this would be horrible |
@elBoberido added a design document which hopefully explains the intended behavior, use case and design decisions and helps building the mental model needed to understand the underlying algorithms. I avoided code there and have no class diagram yet. Class interfaces and interaction are of little concern here, the node structure is. Diagrams can be added though, if needed. Boils down to the interfaces and some "has_a" relations though, do we want that? |
|
||
### Node | ||
|
||
We use a [de la briandais tree](https://www.scirp.org/%28S%28lz5mqp453edsnp55rrgjct55%29%29/reference/referencespapers.aspx?referenceid=1742945) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not find find a good source for this, but this is the original paper. It is also not terribly important as it is concerned with the pointer structure of the nodes which I explain below.
@MatthiasKillat I'm not sure if we need a simple class diagram. Maybe it is necessary for certification. |
a501062
to
f2de328
Compare
|
||
### Brute-force | ||
|
||
* arrays work well enough up to some point (estimate in 100-300 strings), depending on the searches |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also in the 1000s, but 10000s are probably too much for brute force search (depending on application). Search time in the tree is basically constant if the key length is limited.
@MatthiasKillat you once mentioned you wanted to split this up into multiple PR with preparation work being merged first. Is this still your plan or are you waiting for review? |
@elBoberido Yes, I want to split at least the memory allocator part as I would like to have it as a clean abstraction for the future (could also be used in posh, maybe). I need to think how to continue with the tree, there is no immediate need. Ideally we would have some of thse data structures for shared memory usage and I would also like to benchmark them. But this has lower priority. |
Signed-off-by: Matthias Killat <matthias.killat@apex.ai>
Signed-off-by: Matthias Killat <matthias.killat@apex.ai>
Signed-off-by: Matthias Killat <matthias.killat@apex.ai>
Signed-off-by: Matthias Killat <matthias.killat@apex.ai>
…dynamic allocation Signed-off-by: Matthias Killat <matthias.killat@apex.ai>
Signed-off-by: Matthias Killat <matthias.killat@apex.ai>
Signed-off-by: Matthias Killat <matthias.killat@apex.ai>
Signed-off-by: Matthias Killat <matthias.killat@apex.ai>
Signed-off-by: Matthias Killat <matthias.killat@apex.ai>
Signed-off-by: Matthias Killat <matthias.killat@apex.ai>
Signed-off-by: Matthias Killat <matthias.killat@apex.ai>
Signed-off-by: Matthias Killat <matthias.killat@apex.ai>
Signed-off-by: Matthias Killat <matthias.killat@apex.ai>
Signed-off-by: Matthias Killat <matthias.killat@apex.ai>
Signed-off-by: Matthias Killat <matthias.killat@apex.ai>
Signed-off-by: Matthias Killat <matthias.killat@apex.ai>
Signed-off-by: Matthias Killat <matthias.killat@apex.ai>
Signed-off-by: Matthias Killat <matthias.killat@apex.ai>
Signed-off-by: Matthias Killat <matthias.killat@apex.ai>
Signed-off-by: Matthias Killat <matthias.killat@apex.ai>
f2de328
to
ac14626
Compare
@elBoberido I still have the plan to move the allocator from this PR to another PR and then probably close this one. However, I had no time yet due to other priorities. It is also not urgent. We can close this one regardless. Still, I consider part of the work important for future data types (just not now with current priorities). |
Currently it is not needed, has diverged considerably and is relatively complex to review. The basic ideas will be revisited for strategies to create tree structures for iceoryx and as shared memory types. The data structure could be useful for string indexing internally, and more specifically in a service discovery refactoring. |
Pre-Review Checklist for the PR Author
iox-#123-this-is-a-branch
)iox-#123 commit text
)git commit -s
)task-list-completed
)Notes for Reviewer
Checklist for the PR Reviewer
Post-review Checklist for the PR Author
Review notes
TypedAllocator
as a building block for the prefix tree (but is more generally useful), should be reviewed firstPrefixTree
, how it works may require some more explanation but I am hesitant to create a design document since it is a known data structure (with additional complications due to our static memory restrictions)References