start migrate Field to &str #1772
Codecov Report
@@            Coverage Diff             @@
##             main    #1772      +/-   ##
==========================================
+ Coverage   94.12%   94.13%   +0.01%
==========================================
  Files         281      282       +1
  Lines       52824    52944     +120
==========================================
+ Hits        49719    49839     +120
  Misses       3105     3105
Switching to draft. This kind of change is a one-way street, so it requires a lot more reflection.
I would also be interested in whether this has any performance impact.
I also fear that this makes API usage much more awkward, as downstream code might be forced to manage
This is doubly worrying, as neither the commit message nor the PR cover letter gives any rationale for the change beyond "in preparation of columnar", so it is hard to understand whether this is a fundamental requirement or whether there are options or alternatives that could be discussed.
6ae2858 to 64201a2
Thanks for the feedback. The preparation that was mentioned is for adding fastfield support for JSON fields. As for the performance impact, a single lookup in a HashMap won't show up in any performance profile. Even if there are hundreds of fastfields, the lookups will likely be dwarfed by the operations on the fastfields themselves.
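To make the "single lookup in a HashMap" point concrete, here is a minimal sketch of resolving a field name to a handle through a map, with the lookup returning a `Result` as the commit message describes for `get_field`. Everything in it (`ToySchema`, `FieldId`, `FieldNotFound`, and the method bodies) is hypothetical and written only for illustration; it is not tantivy's actual API or implementation.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical numeric handle, standing in for a `Field`-like id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FieldId(pub u32);

/// Hypothetical error returned when a field name is unknown.
#[derive(Debug, PartialEq, Eq)]
pub struct FieldNotFound(pub String);

impl fmt::Display for FieldNotFound {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "field not found: {}", self.0)
    }
}

impl std::error::Error for FieldNotFound {}

/// Toy schema (an assumption, not tantivy's `Schema`): field names in
/// declaration order plus a name-to-handle index.
#[derive(Default)]
pub struct ToySchema {
    names: Vec<String>,
    by_name: HashMap<String, FieldId>,
}

impl ToySchema {
    /// Registers a field name and returns its handle.
    /// Re-adding an existing name returns the existing handle.
    pub fn add_field(&mut self, name: &str) -> FieldId {
        if let Some(&id) = self.by_name.get(name) {
            return id;
        }
        let id = FieldId(self.names.len() as u32);
        self.names.push(name.to_string());
        self.by_name.insert(name.to_string(), id);
        id
    }

    /// One hash lookup per call; an unknown name becomes an `Err`
    /// rather than a panic or an `Option`.
    pub fn get_field(&self, name: &str) -> Result<FieldId, FieldNotFound> {
        self.by_name
            .get(name)
            .copied()
            .ok_or_else(|| FieldNotFound(name.to_string()))
    }

    /// Reverse direction: handle back to name, by plain indexing.
    pub fn field_name(&self, id: FieldId) -> Option<&str> {
        self.names.get(id.0 as usize).map(String::as_str)
    }
}
```

In this sketch a caller would resolve the name once, for example `let title = schema.get_field("title")?;`, and reuse the handle afterwards. Whether the per-call hashing cost matters in practice is exactly what the thread debates, and the sketch does not settle it.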
The change has a lot of other benefits. It will eventually make it possible to modify tantivy's schema.
If I understand you correctly, this is basically a prerequisite for schema-less operation? Otherwise, the schema could for example contain paths into JSON fields instead of just names, couldn't it?
I think the main ergonomic downside (admittedly very Rust-specific) is that borrowing
Looking at this from an efficiency angle again: since the number of distinct field names is low in schema-bound operation, I would probably end up deduplicating those strings using something
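The deduplication idea above is cut off in this excerpt, so the mechanism the commenter had in mind is unknown. One possible approach, shown here purely as an assumption, is string interning: each distinct field name is allocated once and handed out as a cheaply clonable `Arc<str>`, which also sidesteps lifetime ties to the schema because the handle is owned. The `FieldNameInterner` type is hypothetical and uses only the standard library.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical interner: hands out shared, owned handles to field names.
#[derive(Default)]
pub struct FieldNameInterner {
    names: HashSet<Arc<str>>,
}

impl FieldNameInterner {
    /// Returns the shared copy of `name`, allocating only the first
    /// time a given name is seen.
    pub fn intern(&mut self, name: &str) -> Arc<str> {
        // `Arc<str>` implements `Borrow<str>`, so the set can be
        // queried with a plain `&str` without allocating.
        if let Some(existing) = self.names.get(name) {
            return Arc::clone(existing);
        }
        let shared: Arc<str> = Arc::from(name);
        self.names.insert(Arc::clone(&shared));
        shared
    }

    /// Number of distinct names stored so far.
    pub fn len(&self) -> usize {
        self.names.len()
    }
}
```

Cloning an interned name is then a reference-count increment rather than a string copy. A lookup keyed by the name still hashes the string contents, so this addresses allocation and borrowing concerns rather than the lookup cost itself.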
In principle, I agree. There is such a thing as "distributed fat", though: a cost that is hard to measure in any individual place, yet can still add up to a measurable cost globally. For example, a hash table look-up might increase instruction cache pressure, which can result in slowdowns elsewhere. But of course, without benchmarks this is just FUD. And in the end, you are not accountable to me in any way, and I do not want to hold back any changes necessary for Tantivy to grow. I would be glad if big changes like this one came with detailed rationales, so that an outsider like me can understand that they are either necessary to move forward at all or will have additional benefits that are worth the costs.
start migrate Field to &str in preparation of columnar return Result for get_field