start migrate Field to &str #1772

PSeitz · 2023-01-10T07:52:13Z

start migrate Field to &str in preparation of columnar
return Result for get_field

codecov-commenter · 2023-01-10T08:11:56Z

Codecov Report

Merging #1772 (b3a4367) into main (7c6cc81) will increase coverage by 0.01%.
The diff coverage is 95.97%.

@@            Coverage Diff             @@
##             main    #1772      +/-   ##
==========================================
+ Coverage   94.12%   94.13%   +0.01%     
==========================================
  Files         281      282       +1     
  Lines       52824    52944     +120     
==========================================
+ Hits        49719    49839     +120     
  Misses       3105     3105

Impacted Files	Coverage Δ
src/query/query_parser/logical_ast.rs	`80.00% <ø> (ø)`
src/indexer/index_writer.rs	`96.77% <90.00%> (ø)`
src/query/range_query/range_query.rs	`93.41% <92.45%> (+2.05%)`	⬆️
src/fastfield/readers.rs	`90.27% <94.28%> (+0.79%)`	⬆️
src/aggregation/agg_req_with_accessor.rs	`95.48% <100.00%> (-0.20%)`	⬇️
src/aggregation/bucket/histogram/histogram.rs	`99.56% <100.00%> (-0.01%)`	⬇️
src/aggregation/intermediate_agg_result.rs	`98.42% <100.00%> (-0.01%)`	⬇️
src/collector/filter_collector_wrapper.rs	`84.84% <100.00%> (ø)`
src/collector/tests.rs	`98.18% <100.00%> (ø)`
src/collector/top_score_collector.rs	`96.85% <100.00%> (ø)`
... and 23 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

fulmicoton · 2023-01-16T05:02:12Z

Switching as draft. This kind of change is a one way street, so this requires a lot of more reflexion.

adamreichold · 2023-01-16T08:12:51Z

I would also be interested if this has any performance impact &str is more costly to search/compare/store than u64 (which Field currently wraps).

I also fear that this makes API usage much more awkward as downstream code might be forced to manage Cow<'static, str> instead of Field. ATM, I take care to convert field names as early as possible into Field at my module boundaries so that I can easily pass/store/clone the Field values in its implementation.

This is doubly worrying as neither the commit message nor the PR cover letter give any rationale as to why this change is made beyond "in preparation of columnar" so that it is hard to understand whether this is a fundamental requirement or whether there are options/alternatives that could be discussed.

PSeitz · 2023-01-16T08:58:35Z

I would also be interested if this has any performance impact &str is more costly to search/compare/store than u64 (which Field currently wraps).

I also fear that this makes API usage much more awkward as downstream code might be forced to manage Cow<'static, str> instead of Field. ATM, I take care to convert field names as early as possible into Field at my module boundaries so that I can easily pass/store/clone the Field values in its implementation.

This is doubly worrying as neither the commit message nor the PR cover letter give any rationale as to why this change is made beyond "in preparation of columnar" so that it is hard to understand whether this is a fundamental requirement or whether there are options/alternatives that could be discussed.

Thanks for the feedback. The preparation which was mentioned is adding fastfield support for JSON fields.
We can't use Field for that use case and we want to keep the API consistent between JSON and non-JSON fastfields. The name of the field is a better identifier that can cover both use cases.
The typical workflow as of involves &str anyway. Schema::get_field("name") => load fastfield via Field

As of performance impact, a single lookup in a Hashmap won't show in any performance profile. Even if there are hundreds of fastfields, the lookups will likely be dwarfed by operations on the fastfield.

fulmicoton · 2023-01-16T09:16:07Z

The change has a lot of other benefits. It will evenutally make it possible to modify tantivy's schema

adamreichold · 2023-01-16T09:31:32Z

The preparation which was mentioned is adding fastfield support for JSON fields.

The change has a lot of other benefits. It will evenutally make it possible to modify tantivy's schema

If I understand you correctly, this is basically a prerequisite for schema-less operation? Otherwise, the schema could for example contain paths into JSON fields instead of just names, couldn't it?

The typical workflow as of involves &str anyway. Schema::get_field("name") => load fastfield via Field

I think the main ergonomic downside (that is admittedly very Rust-specific) is that borrowing &str "infects" code with lifetimes (Cow<'static, str> often being the compromise used to avoid that) whereas with Field(u64), passing by value is even cheaper than passing by reference.

Looking at this from an efficiency angle again, since the number of distinct field names is low in schema-bound operation, I would probably end up deduplicating those strings using something string_cache which is just a generic and hence less efficient variant of what Field enables today. (For example, during indexing I currently do the above &str to Field conversion exactly once during start up.)

As of performance impact, a single lookup in a Hashmap won't show in any performance profile. Even if there are hundreds of fastfields, the lookups will likely be dwarfed by operations on the fastfield.

In principle, I agree. There is such a thing "distributed fat" though, which individually is hard to measure, yet it doesn't preclude that it has a measurable cost globally. For example, a hash table look-up might increase instruction cache pressure compared which can result in slowdowns elsewhere. But of course, without benchmarks this is just FUD.

And in the end, you are not accountable to me in any way and I also do not want to hold back any changes necessary for Tantivy to grow. I would be glad if big changes like this one would come with detailed rationales so that an outsider like me can understand that this is either necessary to move forward at all or will have additional benefits that are worth the costs.

start migrate Field to &str in preparation of columnar return Result for get_field

PSeitz force-pushed the use_columnar branch from 680e531 to b3a4367 Compare January 10, 2023 07:54

PSeitz force-pushed the use_columnar branch from b3a4367 to ccb574c Compare January 10, 2023 09:07

PSeitz requested a review from fulmicoton January 10, 2023 09:25

fulmicoton marked this pull request as draft January 16, 2023 05:01

fulmicoton force-pushed the use_columnar branch from 6ae2858 to 64201a2 Compare January 16, 2023 08:30

start migrate Field to &str

e2222eb

start migrate Field to &str in preparation of columnar return Result for get_field

PSeitz force-pushed the use_columnar branch from 64201a2 to e2222eb Compare January 17, 2023 05:34

PSeitz marked this pull request as ready for review January 17, 2023 06:33

fulmicoton merged commit f687b3a into main Jan 18, 2023

fulmicoton deleted the use_columnar branch January 18, 2023 07:12

Hodkinson pushed a commit to Hodkinson/tantivy that referenced this pull request Jan 30, 2023

start migrate Field to &str (quickwit-oss#1772)

6b0d1a6

start migrate Field to &str in preparation of columnar return Result for get_field

PSeitz mentioned this pull request Jan 31, 2023

update lz4 flex PSeitz/tantivy#33

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

start migrate Field to &str #1772

start migrate Field to &str #1772

PSeitz commented Jan 10, 2023

codecov-commenter commented Jan 10, 2023

fulmicoton commented Jan 16, 2023

adamreichold commented Jan 16, 2023

PSeitz commented Jan 16, 2023

fulmicoton commented Jan 16, 2023

adamreichold commented Jan 16, 2023

start migrate Field to &str #1772

start migrate Field to &str #1772

Conversation

PSeitz commented Jan 10, 2023

codecov-commenter commented Jan 10, 2023

Codecov Report

fulmicoton commented Jan 16, 2023

adamreichold commented Jan 16, 2023

PSeitz commented Jan 16, 2023

fulmicoton commented Jan 16, 2023

adamreichold commented Jan 16, 2023