Revisit record parsing hacks in codegen #659

dfrg · 2023-10-10T03:32:59Z

We have longstanding hacks for ValueRecord in codegen and some additional ones have been added for SbitLineMetrics (along with use of size one arrays for embedding BigGlyphMetrics). Are there ways to better handle these types?

traversal: [read-fonts] parse embedded bitmap tables #637 (comment)
embedded record: [read-fonts] parse embedded bitmap tables #637 (comment)
derive Copy for SbitLineMetrics: [read-fonts] parse embedded bitmap tables #637 (comment)
- Should we just derive Copy for all records?

The text was updated successfully, but these errors were encountered:

cmyr · 2023-10-10T16:34:52Z

An annoying thing here is that we now have two different special-cased records that are special in different ways.

I'm fine with deriving Copy for all records that can be parsed without any additional arguments (which means that they are fixed size) but I'd defer to @rsheeter on that; he'd initially wanted to explicitly avoid Copy as a bound to ensure we weren't, well, copying too much.
embedded record logic is not quite right, although it would be if all copy-compatible records were Copy (see https://github.com/googlefonts/fontations/pull/637/files#r1352873376)
traversal: the hack of treating SbitLineMetrics like ValueRecord seems... no worse than the other options. ValueRecord does require special logic here that SbitLineMetrics doesn't, but since SbitLineMetrics is the only 'normal' record that is ever a field on another record, the alternative would be to have a separate branch to handle it (which would at least just its generated traverse method, without needing to hand-write a method to match what we do with ValueRecord.

So all-in-all, I'm not sure how much better than this we can do. ValueRecord is going to require special casing in any case, and so our other choice is between special-casing SbitLineMetrics as well, or adding general support for the 'record in record' pattern, where that would only ever be used for SbitLineMetrics.

I think until we have another case of record-in-record I would prefer to keep it special-cased, because it's a smaller change and reduces the chances that some subtle change we make to support the general case breaks some other invariant.

rsheeter · 2023-10-10T17:44:20Z

I'd definitely like to avoid having "if some specific type" in codegen code, that's ... at the very least a yellow flag.

Remind me why ValueRecord is impossible to deal with?

Naively, what happens if we delete traversal and derive copy where possible?

cmyr · 2023-10-10T18:52:18Z

I'm not going to say 'impossible to deal with', but ValueRecord has a unique combination of qualities:

it cannot be parsed naively, because you need to know the appropriate ValueFormat
it is not amenable to codegen because the location of the fields jumps around based on the ValueFormat
it is a 'record' that is embedded in other records (which was unique until we encountered SbitLineMetrics

That said, ValueRecord generally interacts fine with codegen (in the context of being included in other tables & records.)

traversal & Copy are unrelated, so I'll address those separately:

Traversal: if we delete traversal, we lose the ability to implement Debug in a general way for things that include offsets (a specific problem is tables that includes arrays of records where the records include offsets) because we lose the ability to pass in the correct data that can be used to resolve the offset.

Maybe this is fine? But it is certainly valuable to be able to recursively debug print tables.

We would also lose the otexplorer tool, which I do find useful (although infrequently) for debugging.

Copy:

I think that deriving Copy for records in read-fonts (where possible) is totally reasonable. The previous objection had been that this might make it too easy to do unnecessary memcpying, but I am unsure if this is a real performance risk in practice.

rsheeter · 2023-10-16T16:58:02Z

I think we should disregard the concern about copy; if we screw up and it matters it'll show up in profiling.

I think codegen should support record-in-record unless there is a reason this is Very Hard.

I accept ValueRecord having hand-written parsing [and writing I would guess?], but I think we should handle it with the annotation mechanism, conceptually #IWantToHandWriteIt, rather than with "if name is valuerecord".

dfrg · 2023-10-17T01:13:25Z

+1 to copy.

Rather than add another annotation, I propose we add extern dyn record T syntax to signify records that require extra processing and inhibit zerocopy when they’re present as a field. All other records will be embeddable.

rsheeter · 2023-10-17T04:32:38Z

I propose we add extern dyn record T syntax

Add ... where? If the answer is codegen input then ... why is this better than an annotation? - I would not look at that and go "Oh, he wants to hand-write the read/write code"

dfrg · 2023-10-17T14:56:14Z

It reads as simple and direct to me. We already declare extern record ValueRecord (

fontations/resources/codegen_inputs/gpos.rs

Line 4 in ca4fea8

extern record ValueRecord;

) so the syntax change is just allowing the dyn keyword.

It’s a preference but not a strong one and an attribute is semantically equivalent so I’m fine either way.

rsheeter · 2023-10-19T17:28:12Z

It reads as simple and direct to me

How would you know dyn means I want to hand-write some of the implementation? - I definitely would not get that from seeing dyn in a codegen input. Also it's a step backwards from having a general purpose schema to introduce more Rust syntax.

That is, if a record has a fixed size we will derive the Copy trait. This will be necessary if we want to use bytemuck, and should let us simplify some other things. - See #659 for previous discussion

dfrg mentioned this issue Oct 10, 2023

[read-fonts] parse embedded bitmap tables #637

Merged

cmyr added the codegen label Dec 1, 2023

cmyr mentioned this issue Mar 5, 2024

[codegen] Derive Copy for simple records #813

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revisit record parsing hacks in codegen #659

Revisit record parsing hacks in codegen #659

dfrg commented Oct 10, 2023

cmyr commented Oct 10, 2023

rsheeter commented Oct 10, 2023

cmyr commented Oct 10, 2023

rsheeter commented Oct 16, 2023

dfrg commented Oct 17, 2023

rsheeter commented Oct 17, 2023

dfrg commented Oct 17, 2023

rsheeter commented Oct 19, 2023

Revisit record parsing hacks in codegen #659

Revisit record parsing hacks in codegen #659

Comments

dfrg commented Oct 10, 2023

cmyr commented Oct 10, 2023

rsheeter commented Oct 10, 2023

cmyr commented Oct 10, 2023

rsheeter commented Oct 16, 2023

dfrg commented Oct 17, 2023

rsheeter commented Oct 17, 2023

dfrg commented Oct 17, 2023

rsheeter commented Oct 19, 2023