Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lookup for vocab terms without hashmap in RDF reader #99

Conversation

filippodebortoli
Copy link
Collaborator

Leverages the vocab::Vocab enum to avoid building and passing around a Hashmap when deserializing named nodes in rdf::reader.

@phillord
Copy link
Owner

phillord commented Jun 5, 2024

If I have understood the code correctly, the lookup from str to vocab enum happens through the vocab module. But, this ultimately backs onto the find_map iterator method. If this is correct, it is going to make the lookup linear wrt to the size of the vocab, rather than constant time which the HashMap lookup is, although avoiding the cost of building the HashMap in the first place.

It's nice that the code is cleaner, but I would worry that this would impact on performance.

@filippodebortoli
Copy link
Collaborator Author

If I have understood the code correctly, the lookup from str to vocab enum happens through the vocab module. But, this ultimately backs onto the find_map iterator method. If this is correct, it is going to make the lookup linear wrt to the size of the vocab, rather than constant time which the HashMap lookup is, although avoiding the cost of building the HashMap in the first place.

It's nice that the code is cleaner, but I would worry that this would impact on performance.

That is right.
Right now, Vocab::FromStr works as you described.
There is nothing preventing us from changing it to use a HashMap lookup.
This can be done e.g. by invoking Vocab::all() and collect it into a HashMap.
Or by using a more clever way to instantiate the HashMap - we could use std::sync::OnceLock (successor of lazy_static!) to create it similarly to what is done in enum_meta, except that the keys of the map are not discriminants, but strs.

Overall, I think that cleaning up the code is an important first step, and that changes can be done in vocab to ensure that this lookup does not impact too deeply on the parsing runtime. What do you think?

@filippodebortoli filippodebortoli force-pushed the refactor/no-vocab-lookup-in-rdf-reader branch 3 times, most recently from 716b995 to 63bb9a8 Compare June 5, 2024 12:45
@phillord
Copy link
Owner

This looks good now? It is ready for merge?

@filippodebortoli
Copy link
Collaborator Author

This looks good now? It is ready for merge?

Yes, it is. I will merge now.

@filippodebortoli filippodebortoli merged commit 0724e88 into phillord:devel Jun 11, 2024
1 check passed
@filippodebortoli filippodebortoli deleted the refactor/no-vocab-lookup-in-rdf-reader branch July 1, 2024 11:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RDF Parser considered harmful (or at least the iteration part is really ugly)
2 participants