[API change] Modify Attr, add it to relevant types #79

despresc · 2020-09-08T19:11:54Z

Resolves jgm/pandoc#684 in this library. The new Attr type is

data Attr = Attr Text [Text] (Map Text Text)

as I mentioned in that issue. Everything that could reasonably have an Attr has one now.

I haven't added

data ListItem = ListItem Attr [Block]
data ListLabel = ListLabel Attr [Inline]

for the lists yet, but I can do that if it's reasonable.

despresc · 2020-09-08T19:31:48Z

I think I would adapt pandoc to this request first, then the Format one (if both are good), since this one will involve more extensive modifications to the writer tests: the native input in the writer tests will have to be adapted to the IR type changes. To allow normal development in both packages, I can rebase the other request (once it's done) on top of this one, then modify pandoc in two requests, keeping the branches up to date with master.

I'm also interested in the Figure and Ref changes (since these four changes are essentially all the things in the list of projects that require a change to the IR), but I'm not sure how much appetite there is for so many changes.

srid · 2020-09-08T19:54:46Z

Is there a reason why this cannot just be:

type Attr = Map Text Text

i.e., why not put both id and class in the attribute Map?

despresc · 2020-09-09T16:37:08Z

Both are used pretty frequently in pandoc, internally, so one reason is that it's more convenient to have them separated so that a simple pattern-match can get to them. The id attribute could still go in the map, in principle, but class is better kept separate and as [Text], rather than keeping it encoded in a single Text value.

jgm · 2020-09-10T16:22:13Z

If we're making a large scale change of this type, it might be worth considering a different alternative (though this would be an even more radical change).

Something like:

data Block = Block Attr BlockT
data BlockT = Header Int [Inline] | Para [Inline] | ...
data Inline = Inline Attr InlineT
data InlineT = Str Text | Emph [Inline] | ...

This gives attributes to everything in a uniform way. Granted, it would be a much more painful change to make to the code base, though maybe PatternSynonyms could make it less painful. Hence, probably not worth considering at the time. But I thought it should be mentioned.

despresc · 2020-09-10T20:03:48Z

That was my thinking as well.

I should mention that in the Semigroup instance for Inlines, two adjacent Emph elements are only merged if they have null attributes (and similarly for the other relevant inlines). That could be too strict of a requirement.

despresc · 2020-09-10T20:09:03Z

On that subject, the new Attr doesn't have a Semigroup instance defined. Would this:

(Attr i1 c1 k1) <> (Attr i2 c2 k2) = Attr i' (c1 <> c2) (k1 <> k2)
  where
    i' = if Text.null i1 then i2 else i1

be appropriate? The left/right priorities for the components could be different.

despresc · 2020-09-10T20:24:13Z

I should also say that factoring out the Attr like you mentioned is a little more generous than what's done in this pull request. A few of the Inline and Block branches don't have an Attr component: all of Plain, Null, Str, Space, SoftBreak, and LineBreak, to be specific.

jgm · 2020-09-16T16:34:34Z

General note: Thanks for all of these concrete proposals for pandoc-types. Because of other things lined up, it may be a few weeks before I can really think seriously about them.

On Attr: were you saying that you'd prefer the approach I sketched of separating out the Attr from the base block types?

On Semigroup: the instance you propose seems as good as any.

On Semigroup for Inlines: I would think that a good general principle would be that adjacent elements of the same type can merge if they have identical Attr. That would be more generous than "if they have null Attr." Is there something to be said against it?

On the Attr type: there is a drawback to using Map. (I don't know if this was discussed before.) The drawback is that you don't get deterministic output with Show (or with ToJSON, etc.). This could perhaps be dealt with by defining Show explicitly (similarly ToJSON, etc.). But my reason for using a list originally was to allow order information to be preserved deterministically. I'm not sure what to think about this. In many ways Map is more correct (disallowing multiple attributes with the same key) as well as more performant.

despresc · 2020-09-16T19:21:58Z

You're welcome. I'll still be free for a while to adapt pandoc to any changes, so there isn't much rush (though I'd prefer working sooner rather than later, of course).

I think I like separating out attributes. That's a common approach for compilers that need to annotate every node of their AST with additional information, and a uniform treatment of attributes would be nice. Downsides (maybe small) that I can think of, other than it being a larger change: pattern matches will get a little more complex, at least in the short term, everything becoming Block attr (Constructor ...) instead of Constructor attr ...; the AST will be larger in native/json/lua output, since Str et al. get attributes; and things will get attributes that don't have them in any output format (though the attributes can simply be ignored in those cases).

Another design is this:

data WithAttr a = WithAttr Attr a
type Block = WithAttr BlockT
type Inline = WithAttr InlineT

for uniformity, but I'm not sure if it makes too much difference. Having separate types for other attribute-carrying elements (like RowT or CellT, or however they're named) is also possible; that would be more necessary in the WithAttr representation than yours, to keep the representation of attribute-carrying things more uniform.

I can't think of any objections to that method of combining Inlines.

I think the output of Show for Map is deterministic, since it converts the map to a list with toAscList before showing it. You're right that the ToJSON instance isn't, so that would need to be defined separately to preserve determinism. Actually, one possibility is to give everything an explicit toEncoding implementation. That way the Attr instance can use the toEncoding for Map, which folds over the map directly and so outputs an object with sorted keys, from what I can tell. I think that would also have the effect of making encode a bit faster, and could be used to make the output for Meta deterministic as well.

I can't really comment on the need to preserve key ordering or duplicate keys. I don't think that pandoc ever relies on that property, at least. There might be users or people in the wider pandoc ecosystem that do, but I don't have any examples (mostly from not being familiar with a lot of filters).

despresc · 2020-09-16T20:07:12Z

Maybe the downsides for factoring out Attr are bigger than I think? There will be a certain amount of annoyance having to deal with an extra layer on top of the existing Block and Inline. That's harder to avoid in Haskell, but the json (and maybe the lua) serialization could be adapted. The data Block = Block Attr BlockT type could be serialized as

{"t":"BlockTConstructor","a":"attributes","c":"content"}

and the lua output might be similarly adjustable. That might be an improvement.

tarleb · 2020-09-17T07:04:23Z

Different general note: I've started work on ast-migrator. The goal is to convert from one AST version to another. That should give us more room to do drastic changes without impacting library (and filter) users too much. Maybe it's enough to prevent a Python 2/3 kind of situation.

I'll move the project to the pandoc organization if it is deemed useful.

jgm · 2020-09-17T16:56:14Z

I'm convinced that we can handle the switch to a Map rather than a list.

It's definitely true that going general with the Attr would be much more of a pain in the short term, since the code modifications required are much more intrusive. That may argue for a more gradual approach, even if it's less elegant.

The new Attr type, in addition to being a data type, changes the key-value pairs representation to a Map. An Attr component has been added to every relevant branch of Inline and Block. It has also been added to Caption. New Attr conversion functions have been added to Definition, and new builders for the types that received an Attr component have been added to Builder. The representation of Attr in JSON is different: the key-value pairs are now an object. Additionally, the t_table test was a bit too large to inspect for correctness directly. It has been simplified, and new tests for the top-level components of Table have been added.

The semigroup operation is left-biased for identifiers and attributes, and concatenates the classes.

The semigroup instance for Inlines will now fuse together appropriate inline elements when they have equal attributes, not just null attributes.

tarleb · 2020-10-08T15:31:04Z

An alternative to using a Map could be to have something like Swift's Ordered Dictionary. The datatype combines an array with a map. My understanding is that JavaScript's Map does something similar.

despresc · 2020-10-09T14:36:10Z

Something like Vector (Text, Text)? Though it could be [(Text, Text)] still (maybe under a newtype), and just be sorted and have its duplicate keys removed.

tarleb · 2020-10-09T19:27:41Z

I checked the Swift implementation, it is basically

data OrderedDictionary k v = OrderedDictionary
  { orderedKeys :: Vector k
  , keysToValues :: Map k v
  }

Which seems like overkill here.
Given our use-case, and the size of a typical attribute collection, a simple newtype might be a good idea.

srid · 2020-10-09T19:35:16Z

Data.Map.Ordered from ordered-containers?

An OMap behaves much like a Map, with mostly the same asymptotics, but also remembers the order that keys were inserted. All operations whose asymptotics are worse than Map have documentation saying so.

jgm · 2020-10-09T20:13:13Z

That does fit, but it's always a liability to take on an additional library dependency. One more thing to break if upstream maintainers don't keep it up to date.

despresc force-pushed the attrs-on-all branch from 5c3e093 to 546af89 Compare September 9, 2020 17:29

cjjdespres added 3 commits September 18, 2020 17:35

Add a Semigroup and Monoid instance for Attr

0e7f808

The semigroup operation is left-biased for identifiers and attributes, and concatenates the classes.

Allow Inlines to fuse elements in more situations

6a00685

The semigroup instance for Inlines will now fuse together appropriate inline elements when they have equal attributes, not just null attributes.

despresc force-pushed the attrs-on-all branch from d7322bb to 6a00685 Compare September 18, 2020 21:39

Remove duplicated table component tests

9f70337

srid mentioned this pull request Mar 26, 2022

Walkable instance (and newtype) for Attributes #88

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[API change] Modify Attr, add it to relevant types #79

[API change] Modify Attr, add it to relevant types #79

despresc commented Sep 8, 2020

despresc commented Sep 8, 2020

srid commented Sep 8, 2020

despresc commented Sep 9, 2020

jgm commented Sep 10, 2020

despresc commented Sep 10, 2020

despresc commented Sep 10, 2020

despresc commented Sep 10, 2020

jgm commented Sep 16, 2020

despresc commented Sep 16, 2020

despresc commented Sep 16, 2020

tarleb commented Sep 17, 2020

jgm commented Sep 17, 2020

tarleb commented Oct 8, 2020 •

edited

Loading

despresc commented Oct 9, 2020

tarleb commented Oct 9, 2020

srid commented Oct 9, 2020

jgm commented Oct 9, 2020

[API change] Modify Attr, add it to relevant types #79

Are you sure you want to change the base?

[API change] Modify Attr, add it to relevant types #79

Conversation

despresc commented Sep 8, 2020

despresc commented Sep 8, 2020

srid commented Sep 8, 2020

despresc commented Sep 9, 2020

jgm commented Sep 10, 2020

despresc commented Sep 10, 2020

despresc commented Sep 10, 2020

despresc commented Sep 10, 2020

jgm commented Sep 16, 2020

despresc commented Sep 16, 2020

despresc commented Sep 16, 2020

tarleb commented Sep 17, 2020

jgm commented Sep 17, 2020

tarleb commented Oct 8, 2020 • edited Loading

despresc commented Oct 9, 2020

tarleb commented Oct 9, 2020

srid commented Oct 9, 2020

jgm commented Oct 9, 2020

tarleb commented Oct 8, 2020 •

edited

Loading