[proposal] XML representation of feature format #106
Comments
I think it's interesting to try this approach, but I would suggest to focus on the data structures and the implementation, not the format (XML), and make a working model that is either independent from UFO, or that can be stored as UFO lib items. Once it has been proven that this is a good approach forward, we can start considering developing a version of UFO that supports it. |
Good point. I've started working on some code which translates between a TTFont (i.e. the on-disk representation), fontTools' feaLib AST (and from there to feature file format), and this proposed "designer friendly but machine readable" data structure. One nice side-effect of this is that you get an otf-to-fea converter for free. :-) Here is the data structure I'm working with at the moment:
|
Nice, I started looking into something like that for my font format before (OTL as data structure, I mean). Certainly I agree that existing formats to represent OTL aren't satisfactory. Just one question, why do you need multiple match tags, for the f i ligature for example, can't you do:
|
I agree that it may be easier to work with a bottom element of |
I haven't added much to this recently, but I have been working on it, and the fontFeatures library is now able to ingest different OpenType Layout formats into a simple internal representation. I'll add an XML input/output backend to that representation, and then we will have more to talk about. |
@simoncozens Just in case you missed the link go by on twitter, there is a UFO spec meeting coming up. |
Yep, I'm there, which is why this issue is suddenly live again. :-) |
(I'd like to add to the XML vs other formats thing: consider that XML requires custom de/serialization code someone has to write and maintain while you can e.g. ser/de JSON automatically with libraries like serde-json and https://pypi.org/project/cattrs/ -- those libraries make bring-up and maintenance so much easier!) |
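To illustrate the claim above, here is a minimal sketch using only the Python standard library rather than cattrs or serde. The `Substitution` record and the XML element names are hypothetical, invented for this example; they are not part of UFO or of any proposal in this thread.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict


@dataclass
class Substitution:
    """Hypothetical rule record; not part of any UFO spec."""
    inputs: list[str]
    replacement: list[str]


rule = Substitution(inputs=["f", "i"], replacement=["f_i"])

# JSON: the dict/list/string model maps onto the dataclass directly.
as_json = json.dumps(asdict(rule))
from_json = Substitution(**json.loads(as_json))


# XML: element and attribute names have to be chosen, and both
# directions have to be written and kept in sync by hand.
def to_xml(r: Substitution) -> str:
    elem = ET.Element("substitution")
    for name in r.inputs:
        ET.SubElement(elem, "input", glyph=name)
    for name in r.replacement:
        ET.SubElement(elem, "replacement", glyph=name)
    return ET.tostring(elem, encoding="unicode")


def from_xml(text: str) -> Substitution:
    elem = ET.fromstring(text)
    return Substitution(
        inputs=[e.get("glyph") for e in elem.findall("input")],
        replacement=[e.get("glyph") for e in elem.findall("replacement")],
    )
```

Both paths round-trip the record. The difference is only in how much mapping code has to be written and maintained, and both still go through a library.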
I thought XML was part of the deal with UFO... |
JSON is just an example for a trivially ser/de-able format. Could be TOML or YAML or something like it, too. The same goes for Plist, which is why I think Just's proposal would be just as well 😉 |
FWIW, when UFO was first developed, XML seemed "the thing to use". If json or yaml had been around, I bet we would have used it instead of the dreaded plist. The dicts/lists/strings/numbers model is fantastic in its simplicity, and I wished newer formats like designspace had used it instead of creating yet another custom xml-based format. |
JSON is a minefield. YAML is actually a superset of JSON and is an even bigger minefield. Of all the mentioned formats TOML is almost certainly hiding the fewest buried explosives. As @justvanrossum mentioned, XML was conceived of and for many years billed as a panacea. In practice it's more of a Pandora's box. None of these are native anywhere, and all of them have libraries for dealing with de/serialization. Which ones are fastest / most robust varies a bit between languages. I disagree with @madig that JSON is somehow magic/automatic while XML requires custom code. I think that's a false dichotomy: all these formats will typically use some library for de/serializing the on-disk format into native language structures. All that being said, I personally love YAML and think it would be well suited for the purpose, but only if the entire spec called for it across the board. It even has a number of features such as includes and references that could be very useful to design a format around. But mixing and matching is the worst option. As long as the principal elements that make up a UFO file are XML encoded, all of them should be. If we want to talk about an all new format I'd propose skirting the minefield by defining a subset of allowed YAML features. But that's a bigger kettle of fish than this issue can support. For now, as long as the principal format is XML based, extensions to it should keep following suit. |
I think what he means to say is the mapping between UFO XML and the corresponding data structures isn't straightforward, there needs to be some parsing and rearranging of elements. Compare that with a serialized dump of the data structures themselves, which can be trivially de/serialized. A potential drawback of that approach is a change in the data structures necessarily implies a change in the file format. |
Returning to the topic, after some experience with fontFeatures I think I know what a platform-independent OTL representation should look like; and serialising/deserialising is not really a relevant concern as you will always need to convert between the high-level representation I want to define and whatever structures your font editor uses to represent OTL internally. So I’m not sure this requires so much bikeshedding, and if UFO uses XML for everything else, let’s just use XML and not needlessly multiply parsers. |
Making features using fea file syntax (which already has a parser and AST objects) is easier than using a new set of objects. I think any format that is going to store OpenType features related to the GPOS, GSUB, and GDEF tables could address some major issues:

Readability: AFDKO feature file syntax is hard to read and diagnose when it comes to contextual rules. But in most cases it's already easier to read an AFDKO syntax file than an XML file. Even if someone writes a parser to convert the XML data to a feature file, the source XML is what the user would need to diagnose, and not everyone can manipulate this data using scripting. Maybe another syntax is needed that would make the data model simpler compared to

Transferring: We are entering an era in which fonts are becoming larger and designers work separately on character sets as part of a font family. Features can be written separately and then merged into one file, or compiled separately, as the final user might need different character sets. These fonts need to be merged or subsetted, and maybe diagnosed before shipping. How would this new data model facilitate this?

Tools: How could a developer create a tool that enables them to add rules to glyphs/fonts without concerning themselves with technicalities? How can I check if a kerning pair or mark anchor with a certain context exists? How can I separate the kerning into different script sets or lookup flags without having to learn a complex library?

Feedback: There is a time gap between when a user makes an adjustment and when s/he will see a result, depending on the complexity of the font. Compile time matters when it comes to creating UI; OpenType features can be sluggish to compile. I'm not sure this is what the format should be concerned with, but it's still something to consider while trying to come up with a data model and compilers.

Sorry if this sounds demanding; I'm not asking anyone to solve these! I'm just sharing what concerns me with making OpenType features, and I believe we need real examples that solve some of these issues before making a huge library. |
There are some issues with fea data in context of UFO:
A good exchange format IMHO should
I like feature files, I write them all day long, but it is an authoring syntax not an exchange format. An exchange format does not need to be writable by hand, just as no one designs glyphs by handwriting .glyf files. I can’t currently do any complex OpenType fonts using UFO without heavy project specific customization and heavily tying it to a single tool, otherwise it can easily become a complete mess. |
I wanted to fill in the context of why the Adobe FEA syntax was chosen 16 years ago. At the time there wasn't a fully featured, documented feature syntax that had adoption and familiarity among type designers other than the Adobe FEA syntax. I believe @typesupply worked for a bit on something but couldn't come up with something that was as easy to write/use as FEA, so that was what was chosen for the UFO with the understanding that it wasn't perfect. @khaledhosny enumerates well where it falls down: all 100% valid complaints and some good ground rules for an exchange format (I do like to edit .glyf files by hand, but yes, I don't draw outlines by writing xml). In the end, I would say that there is a conflict there: what type designers need in their workflow (a place to write features in a syntax most know well and are unafraid to write) and an interchange/production workflow. As this conversation continues, it's likely good to keep that in mind. |
I still think this is missing the point. The designer's view of their workflow is through their font editor, not through what is stored in the font file. I use Glyphs and I don't really care how features are represented in the .glyphs file; I don't tend to poke around inside it, because I don't need to. For a designer, that's the wrong layer of abstraction. Maybe the font editor should expose features in FEA syntax, maybe it shouldn't. What we're deciding is what gets stored on the disk; how the designer sees and edits that is a client issue. |
These are great questions and the spec very deliberately doesn't answer them. I realized that defining "if a glyph is removed from the glyph set,
Yes, but no, but yes, but no, but yes. 😄 I have very complex feelings on Identifiers and some other things in UFO 3 showed me that trying to introduce new/changed behavior into editors through format changes is not always welcome. I followed the "If you build something better, they'll implement it." model and was very disappointed at the time. So, a new format would have to cross that hurdle. I'd like to see explicit rules for going to/from any other major formats (.fea, VOLT, ?). Ideally there would be usable code to do this. Losslessness is going to be a key detail in all of this for designers. One thing that popped out of my memory late last night: back when I was trying to replace .fea circa 2004-9, I noticed that Adobe had patents on the conversion of high-level abstractions to GSUB/GPOS/GDEF. That made me nervous. I don't know if those are still applicable and I could also be very wrong about the existence of these since it was a loooong time ago. Perhaps a discussion with the friendly Adobe folks would clear that up. After saying all of that, I want to emphasize that I'm very optimistic about what you are working on. I'd love to see this replace not just Taking off my spec editor hat and putting on my feature developer hat, I'd like to note some limitations of .fea that I'd love to see addressed:
|
I think it's unrealistic to aim for a single format that 1. should be suitable for all, and 2. covers all of OTL. Small practical solutions in a subset of the problem space may allow us more progress than waiting for someone to come up with the ultimate grand design. Like how kerning ("a table") is stored differently and separately from anchors ("belongs to glyph data"), yet both are used to ultimately compile to GPOS data. In what form data is stored is often informed by how authoring applications interact with it: "kerning as a table" is easier to interact with than a blob of .fea code hidden between other feature definitions. There are different levels of abstractions that each have their place in people's workflows. There is no one size fits all. For many people, the single kerning file in UFO covers all their needs, for others it doesn't, as John Hudson explained so well during the meeting. For some people, .fea is a fine tool, for others it is horrible. Sometimes the best solution is to be as close to the metal as possible and use TTX. Sometimes a glyph naming convention is all you need to define some GSUB features. Perhaps the problem is that .fea is presented as the way to store OTL data within UFO. It should be merely a way. Perhaps our stance should be more extreme: I'm not convinced the UFO spec should contain a full definition for OTL-like features at all. UFO should facilitate a variety of workflows and data. The idea of "mini specifications" could perhaps close this gap. For example, fontmake currently supports the use of MTI files to define features. Fonttools contains a compiler for it. Yet there is no official way for a UFO to say "use this MTI file for features". Likewise, it is undocumented how some tools use anchors to produce GPOS mark features. 
(With this philosophy, we could decide to demote .fea to a "mini specification": the official In short: I would like to encourage people like Simon to focus on the things that .fea is bad at, and not worry about having to design something that can completely replace .fea. |
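As a rough sketch of the "glyph naming convention is all you need" case mentioned above: the function below derives single-substitution rules from glyph name suffixes and emits them as .fea text. The convention it assumes (that `a.smcp` is the alternate of `a`) and the function name `suffix_substitutions` are illustrative assumptions, not something the UFO spec defines.

```python
def suffix_substitutions(glyph_names, suffix, feature_tag):
    """Build a .fea feature block from glyphs that share a name suffix.

    Assumed convention (not defined by UFO): "a.smcp" is the alternate
    of "a" and belongs in the feature named by feature_tag.
    """
    available = set(glyph_names)
    rules = []
    for name in sorted(available):
        base, dot, rest = name.partition(".")
        # Only emit a rule when the unsuffixed base glyph exists too.
        if dot and rest == suffix and base in available:
            rules.append(f"    sub {base} by {name};")
    if not rules:
        return ""
    body = "\n".join(rules)
    return f"feature {feature_tag} {{\n{body}\n}} {feature_tag};\n"


glyphs = ["a", "b", "a.smcp", "b.smcp", "c.smcp", "f_i"]
fea = suffix_substitutions(glyphs, "smcp", "smcp")
print(fea)
```

Here `c.smcp` is skipped because there is no base glyph `c` in the set. A tool working this way never needs to store the feature at all, which is the kind of small, partial solution the comment argues for.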
The more I think about mini-specifications the less enthusiastic I am. It’s a deliberate fragmenting of the file format. With mini-specifications providing several options of representation, a font editor needing to read an arbitrary UFO needs to support all of the different flavours. |
The more people depend on UFO, the less likely we can come to a consensus as to how to move a monolithic format forward. I think the deliberate fragmentation is essential for progress. A font editor only needs to serve its audience. It is not required to support everything. Not having mini specifications encourages people to define private data and not document it (this has already happened and is a problem), and therefore to reinvent the wheel. With mini specifications people can build on each other's experience, and community usage will prove which parts are successful and which will die out. There will not be infinite ways to define OTL. |
This is kind of the same debate as with Designspace in relation to where it fits into the puzzle. We talked about it in the meeting quite a bit (and see #86) but there doesn't seem to be a clear direction here yet. Indeed it seems to me some aspects of the theorized UFS format are already being put into UFO and the major points of disagreement about what to put in the spec would mostly better fit in the scope of UFS if it existed. How do all the extra attributes (features, kerning, axis interpolation, etc.,) that go into building a font family relate to the base collection of glyphs' outlines? Is UFO attempting to represent all the data that would be in a VCS repository needed to build a font? Or is it just a subset of that data –for example the shapes– meant to be matched with other bits and assembled into a greater whole? Right now there seem to be aspects of both these approaches that have already made their way into both the spec & usage. I can see merit to multiple sides to the debate on what to do with features, and how you evaluate that debate (i.e. whether to define a single format for including feature data in UFO, to allow several possible formats via mini-specs, to keep feature data outside the UFO entirely and just include some kind of build description saying where the feature data should come from) seems to depend on where UFO is visualized in the bigger picture. |
The Grand De-unification of the Unified Font Object... This deserves a broader discussion and shouldn't be hidden in this issue about feature representation. I'd like to see a collection of formats that are designed to work together but are not tightly coupled to each other. "UFO" could be an umbrella for such a collection. (The |
Well, true. But it does influence whether feature representation is one thing or many! One of the major pain points of OpenType is that it is really two font formats pretending to be one font format. I think there is much to be learnt from that. |
For many users,
It's way more than two: SVG, various pixel formats, OTL being separate from outline formats, etc. etc. Sure, together that's an ugly mess, but without the individual parts and how they integrated with existing systems at the time there would be nothing. Things need to be able to grow organically, because you can't predict the future. In hindsight it seems obvious how everything should have been done instead, but hindsight is hindsight. Let UFO be an environment where people can express ideas, and not be a monolithic specification of how things must be done. |
Actually, I just had a shower and inspiration struck. :-) The fact that we're being pulled in two different directions here suggests that there are two problems we need to solve:
Mini-specifications tries to solve both of these problems with the same solution, but leads to fragmentation: now all editors need to implement all representations in order to read an arbitrary UFO. But if we solve them separately, we can avoid the fragmentation and avoid pushing onerous requirements on editor implementors. Instead:
|
I fundamentally disagree that "all editors" should support the superset of anything and everything, really. That has never been the case for UFO, and is so by design. Your idea sounds clever, and perhaps it's a solution, but your format, whichever way it will turn out, will have to prove itself independently of UFO, and can be developed independently of UFO. Just like |
I will soften my previous comment somewhat, because there are obvious ufo connections that are necessary, such as: how will a new feature representation integrate with kerning tables and glyph anchors? |
Keep in mind that that is the state of things today. The mini specs would just formalize it. The UFO format is so bare bones that people come up with their own solutions, which are by definition non-interoperable unless you teach every single other tool about them. |
To build on what @madig said: nothing is lost if a tool doesn't support a mini-spec; the data is preserved. Mini-specs are there to formalize the current informal situation, and to build consensus on things that work well and should be moved up into the full UFO spec. Much better to trial things and see how they work than to make a guess and then have to live with the decision if it turns out to either not work well or not be popular/useful. |
(To build² on that: everyone using custom data and no other tool knowing about it will also by definition lead to data de-synchronization and potentially loss. The same problem happens if you have UFO 3.1 with some more official lib keys and stuff and some apps not knowing what to do with them. We see this all the time with glyphsLib data that only means something to Glyphs and ends up in the wrong places when someone works on the font in FL5. I don't think anyone can win this one, unless consensus is eventually built and the custom data christened as public.) |
@madig by design, the spec says that things a tool doesn't know about should be left and not touched. Desynchronization is definitely possible, as something that doesn't know about a private key can't update it if things change, of course. So, the issue you're having is the tool not doing what it should be, as far as I understand |
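A small sketch of the "leave what you don't know about" behaviour, using Python's standard `plistlib`. The private key `com.example.otherTool.masterId` is made up for the example, and `update_lib` is a hypothetical helper, not an API of any existing UFO library.

```python
import plistlib


def update_lib(lib_bytes, key, value):
    """Set one key in a lib.plist payload and leave all other keys untouched."""
    lib = plistlib.loads(lib_bytes)
    lib[key] = value
    return plistlib.dumps(lib)


original = plistlib.dumps({
    "public.glyphOrder": ["a", "b"],
    "com.example.otherTool.masterId": "m01",  # hypothetical private key
})

updated = plistlib.loads(
    update_lib(original, "public.glyphOrder", ["a", "b", "c"])
)
```

The unknown key survives the edit, which is what the spec asks for. It also shows where desynchronization comes from: the tool has no way of knowing whether `m01` is still correct after its own changes.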
The issue is more that every piece of added data that a tool must know about leads to an exponential amount of work on all tools to make them do the right thing. Take e.g. Glyphs master IDs. UFO doesn't know anything about master IDs, so glyphsLib puts them in a UFO's lib. People then copy-paste the UFO in the file explorer with the lib inside to start a new master in another program, leaving the ID the same. Going back to Glyphs will then overwrite one master with the other and generate a support request, which I then have to deal with :) Arguably, glyphsLib needs a good whacking, but these are the kind of subtle issues you have to deal with when using custom data. Having a format that does not need the custom data as a workaround to round-trip perfectly would alleviate the problem because there is only one official way to do things, and if any of you have been wondering, that was indeed the motivation behind a lot of change requests to the UFO format we filed in the name of Dalton Maag. |
Could we move this discussion to the appropriate issue, please? It'll be easier to track that way. 😄 |
Custom data is not a workaround. |
I also believe mini specifications are a great way to move forward. Not every solution is going to be perfect for everyone. I'm making my own feature format and I would like to know how I can write a specification for it that would make it possible to add it to UFO one day as a mini specification. Maybe a new issue on how a mini-specification should be written would be a great start. |
@typesupply is right, this is derailing feature format talk. Mini-spec is here: #118 |
My understanding of the need is that we want a file format for behaviour description that lends itself to the following use cases:
My experience with FEA suggests that it would be possible to extend FEA to meet these needs. I've done quite a bit of work with FEA for our more complex fonts. We at SIL, like many others, keep our behavioural description out of the UFO for a couple of reasons:
Details of how we have extended FEA can be found here. A simple example is here which simply references a few magic mark and base attachment classes. At the other end of complexity, there is an example here involving complex macros basically to get rid of constants due to sharing a single description across multiple fonts in a family. Of course we could have done it by having font specific include files. I do not consider what we have done in extending FEA to be sufficient to meet the needs initially listed, but I do think it is a step in the right direction. There are a few additions that a font editor would probably want of FEA:
I can imagine something along the lines of:
Which an editor can parse and identify which blocks of the fea go where and come from where. It also enables different tools to handle the fea according to their capabilities. This would require some standardisation of the capabilities and extension functions. I realise this isn't mini specifications, or perhaps it is via a different route. But I suggest it might just work, with a whole lot of effort. |
Hi Martin; this is a good idea, but I'm not sure it's the scope for what we're talking about in this issue. We will be having a discussion of alternative feature format syntax soon off the back of the UFO spec meeting - please subscribe to adobe-type-tools/afdko#1202 and I will send out information about the meeting. What I'm proposing here is a feature representation that is primarily machine manipulated, with the editor mediating that to the user. |
I'd like to propose an alternative to creating an XML syntax for the whole of There are even precedents already in the form of |
An XML syntax for the whole of features.fea would certainly be an engineering achievement. It would have to be able to roundtrip all existing feature files, without losing anything. I'd like to see that work in a separate project first before suggesting it should be a requirement in the UFO spec. |
This is an interesting idea, but, like Erik, I would want to see it develop outside of UFO to work out the edge cases. An approach that could work would be to store your XML files in the data directory and then have a precompiler step that converts your XML to the .fea format for handoff to the compiler or a post-compiler step that creates the tables directly in the binary. |
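The precompiler step suggested above could be sketched roughly as follows. The XML schema in `SOURCE` is entirely hypothetical (it is not the format proposed in this issue, whose example is not reproduced here), and the sketch handles only plain substitutions inside a feature.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML, as it might sit in the UFO's data directory.
SOURCE = """
<features>
  <feature tag="liga">
    <substitution>
      <input glyph="f"/>
      <input glyph="i"/>
      <replacement glyph="f_i"/>
    </substitution>
  </feature>
</features>
"""


def xml_to_fea(text):
    """Convert the hypothetical XML above into .fea text for the compiler."""
    root = ET.fromstring(text)
    blocks = []
    for feature in root.findall("feature"):
        tag = feature.get("tag")
        lines = [f"feature {tag} {{"]
        for rule in feature.findall("substitution"):
            inputs = " ".join(e.get("glyph") for e in rule.findall("input"))
            outputs = " ".join(e.get("glyph") for e in rule.findall("replacement"))
            lines.append(f"    sub {inputs} by {outputs};")
        lines.append(f"}} {tag};")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


print(xml_to_fea(SOURCE))
```

The output can be handed to any existing .fea compiler, so the XML can be trialled outside the UFO spec without changing what compilers consume.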
Related:
I had a thought this morning about how features are represented. Currently we have a flat text file in AFDKO format, which has the advantage of being familiar and well supported with tools, but it has the disadvantage of not being particularly easy for editors to generate, manipulate, parse and reason about. If you want to programmatically copy a lookup from one feature to another, or between languages, or copy a feature between fonts, it's a pain in the head.
What I am suggesting is a new XML format which can be translated to and from AFDKO, and also to and from GSUB/GPOS representations of ttx. The format would remain "designer-centric", in terms of representing the rules at a high level, rather than replicating OTL data structures (i.e. starting with features and lookups, not GPOS/GSUB->script->language->feature). As with AFDKO, rule types would be implicit based on structure, rather than explicit. In other words it would be a half-way house between the textual AFDKO representation and the file-format-specific ttx.
Here is an example of how it might look:
If there's interest, I'm happy to write parsing code to translate to and from AFDKO.
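To make the "implicit rule types" and "copy a lookup between features" points concrete, here is a minimal in-memory sketch. The class names (`Rule`, `Lookup`, `Feature`) and the `copy_lookup` helper are assumptions for illustration only; they are not the proposed XML format and not the fontFeatures API.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class Rule:
    inputs: list[str]
    outputs: list[str]

    @property
    def kind(self) -> str:
        """Infer the rule type from its shape instead of storing it."""
        if len(self.inputs) == 1 and len(self.outputs) == 1:
            return "single"
        if len(self.inputs) > 1 and len(self.outputs) == 1:
            return "ligature"
        if len(self.inputs) == 1 and len(self.outputs) > 1:
            return "multiple"
        return "unsupported"


@dataclass
class Lookup:
    name: str
    rules: list[Rule] = field(default_factory=list)


@dataclass
class Feature:
    tag: str
    lookups: list[Lookup] = field(default_factory=list)


def copy_lookup(source: Feature, target: Feature, name: str) -> None:
    """Append an independent copy of the named lookup to another feature."""
    for lookup in source.lookups:
        if lookup.name == name:
            target.lookups.append(deepcopy(lookup))
            return
    raise KeyError(name)


liga = Feature("liga", [Lookup("basic_ligatures", [Rule(["f", "i"], ["f_i"])])])
dlig = Feature("dlig")
copy_lookup(liga, dlig, "basic_ligatures")
```

With a structure like this, moving a lookup is a list operation rather than text surgery on a .fea file, and the copied rule still reports itself as a ligature without any explicit type tag.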