Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revisit record parsing hacks in codegen #659

Open
dfrg opened this issue Oct 10, 2023 · 8 comments
Open

Revisit record parsing hacks in codegen #659

dfrg opened this issue Oct 10, 2023 · 8 comments
Labels

Comments

@dfrg
Copy link
Member

dfrg commented Oct 10, 2023

We have longstanding hacks for ValueRecord in codegen and some additional ones have been added for SbitLineMetrics (along with use of size one arrays for embedding BigGlyphMetrics). Are there ways to better handle these types?

@cmyr
Copy link
Member

cmyr commented Oct 10, 2023

An annoying thing here is that we now have two different special-cased records that are special in different ways.

  • I'm fine with deriving Copy for all records that can be parsed without any additional arguments (which means that they are fixed size) but I'd defer to @rsheeter on that; he'd initially wanted to explicitly avoid Copy as a bound to ensure we weren't, well, copying too much.
  • embedded record logic is not quite right, although it would be if all copy-compatible records were Copy (see https://github.com/googlefonts/fontations/pull/637/files#r1352873376)
  • traversal: the hack of treating SbitLineMetrics like ValueRecord seems... no worse than the other options. ValueRecord does require special logic here that SbitLineMetrics doesn't, but since SbitLineMetrics is the only 'normal' record that is ever a field on another record, the alternative would be to have a separate branch to handle it (which would at least just its generated traverse method, without needing to hand-write a method to match what we do with ValueRecord.

So all-in-all, I'm not sure how much better than this we can do. ValueRecord is going to require special casing in any case, and so our other choice is between special-casing SbitLineMetrics as well, or adding general support for the 'record in record' pattern, where that would only ever be used for SbitLineMetrics.

I think until we have another case of record-in-record I would prefer to keep it special-cased, because it's a smaller change and reduces the chances that some subtle change we make to support the general case breaks some other invariant.

@rsheeter
Copy link
Collaborator

I'd definitely like to avoid having "if some specific type" in codegen code, that's ... at the very least a yellow flag.

Remind me why ValueRecord is impossible to deal with?

Naively, what happens if we delete traversal and derive copy where possible?

@cmyr
Copy link
Member

cmyr commented Oct 10, 2023

I'm not going to say 'impossible to deal with', but ValueRecord has a unique combination of qualities:

  • it cannot be parsed naively, because you need to know the appropriate ValueFormat
  • it is not amenable to codegen because the location of the fields jumps around based on the ValueFormat
  • it is a 'record' that is embedded in other records (which was unique until we encountered SbitLineMetrics

That said, ValueRecord generally interacts fine with codegen (in the context of being included in other tables & records.)

traversal & Copy are unrelated, so I'll address those separately:

Traversal: if we delete traversal, we lose the ability to implement Debug in a general way for things that include offsets (a specific problem is tables that includes arrays of records where the records include offsets) because we lose the ability to pass in the correct data that can be used to resolve the offset.

Maybe this is fine? But it is certainly valuable to be able to recursively debug print tables.

We would also lose the otexplorer tool, which I do find useful (although infrequently) for debugging.

Copy:

I think that deriving Copy for records in read-fonts (where possible) is totally reasonable. The previous objection had been that this might make it too easy to do unnecessary memcpying, but I am unsure if this is a real performance risk in practice.

@rsheeter
Copy link
Collaborator

I think we should disregard the concern about copy; if we screw up and it matters it'll show up in profiling.

I think codegen should support record-in-record unless there is a reason this is Very Hard.

I accept ValueRecord having hand-written parsing [and writing I would guess?], but I think we should handle it with the annotation mechanism, conceptually #IWantToHandWriteIt, rather than with "if name is valuerecord".

@dfrg
Copy link
Member Author

dfrg commented Oct 17, 2023

+1 to copy.

Rather than add another annotation, I propose we add extern dyn record T syntax to signify records that require extra processing and inhibit zerocopy when they’re present as a field. All other records will be embeddable.

@rsheeter
Copy link
Collaborator

I propose we add extern dyn record T syntax

Add ... where? If the answer is codegen input then ... why is this better than an annotation? - I would not look at that and go "Oh, he wants to hand-write the read/write code"

@dfrg
Copy link
Member Author

dfrg commented Oct 17, 2023

It reads as simple and direct to me. We already declare extern record ValueRecord (

extern record ValueRecord;
) so the syntax change is just allowing the dyn keyword.

It’s a preference but not a strong one and an attribute is semantically equivalent so I’m fine either way.

@rsheeter
Copy link
Collaborator

It reads as simple and direct to me

How would you know dyn means I want to hand-write some of the implementation? - I definitely would not get that from seeing dyn in a codegen input. Also it's a step backwards from having a general purpose schema to introduce more Rust syntax.

@cmyr cmyr added the codegen label Dec 1, 2023
cmyr added a commit that referenced this issue Mar 5, 2024
That is, if a record has a fixed size we will derive the Copy trait.
This will be necessary if we want to use bytemuck, and should let us
simplify some other things.

- See #659 for previous
  discussion
cmyr added a commit that referenced this issue Mar 5, 2024
That is, if a record has a fixed size we will derive the Copy trait.
This will be necessary if we want to use bytemuck, and should let us
simplify some other things.

- See #659 for previous
  discussion
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants