Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AST metadata -> always quadruples, %{comment: true} #337

Closed
wojtekmach opened this issue Apr 18, 2020 · 16 comments · Fixed by #354
Closed

AST metadata -> always quadruples, %{comment: true} #337

wojtekmach opened this issue Apr 18, 2020 · 16 comments · Fixed by #354

Comments

@wojtekmach
Copy link
Contributor

wojtekmach commented Apr 18, 2020

First some background, in ExDoc currently for this module:

# lib/foo.ex
defmodule Foo do
  @moduledoc """

  `String.upcase/9`

  """
end

when generating docs, we are emitting the following warning which is useful to find and fix broken references:

warning: documentation references function String.upcase/9 (...) (parsing Foo docs)

I'd like to make the warning more precise and mention file:line like this:

warning: documentation references function String.upcase/9 (...)
  lib/foo.ex:4

To do that, I need to know where the warning happened within the markdown document.

Today, when parsing markdown into AST we get:

iex> {:ok, ast, []} = Earmark.as_ast("foo\n`bar`"); ast
[{"p", [], ["foo\n", {"code", [{"class", "inline"}], ["bar"]}]}]

and I believe the best option to generate the warning above is to put the line on the ast node (perhaps with :lines option or something)

iex> {:ok, ast, []} = Earmark.as_ast("foo\n`bar`", lines: true); ast
[
  {"p", [{"_line", "1"}],
   ["foo\n", {"code", [{"_line", "2"}, {"class", "inline"}], ["bar"]}]}
]

We already use line for Earmark errors, so it doesn't seem like a huge leap to add it to nodes.

The problem is of course when we transform ast back into HTML, we probably don't want to keep the line attribute around so I thought maybe prefixing it with _ will signal it should be ignored. And even though the line number is an integer, to be consistent with the rest of AST, it probably makes sense to write it down as a string like above.

Another option, which is much bigger change, is to represent the AST as a 4-element tuple, {tag, metadata, attributes, ast}:

[
  {"p", [line: 1], [],
   ["foo\n", {"code", [line: 2], [{"class", "inline"}], ["bar"]}]}
]

and here we have a clear distinction between the node contents and metadata.

Besides which line, I think another useful piece of information would be which column the given node starts at. But imho this is definitely lower priority.

What do you think? Perhaps there's an easier way to support my ExDoc use case?

cc @RobertDober @josevalim

@josevalim
Copy link
Contributor

It feels like ExDoc should roll its own AST to HTML, which should be straight-forward, and we can prune any node that we add. Especially because it is likely that we will make the AST more tailored to ExDoc as more time passes.

@wojtekmach
Copy link
Contributor Author

due to other requirements, in ExDoc we already have our own way of transforming ast->html. This is mostly about markdown->ast, we need to get the extra information out of Earmark and put it into ast somehow.

@josevalim
Copy link
Contributor

Ah, I see!

@RobertDober
Copy link
Collaborator

RobertDober commented Apr 18, 2020

I would advice against adding pseudo attributes to the AST as we have annotations for this

https://github.com/pragdave/earmark#earmarktransformtransform2

so when parsing with the :lnb option I would rather spit out the following AST:

[
  {"p", [],
   ["foo\n", {"code", [{"class", "inline"}], ["bar"], %{lnb: 2}}]
   %{lnb: 1}}
]

It might even be a good idea to default the :lnb option to true.

 Earmark.as_ast("`hello` **strong**")
{:ok,
 [
   {"p", [],
    [
      {"code", [{"class", "inline"}], ["hello"], %{lnb: 2}},
      " ", 
      {"strong", [], ["strong"], %{lnb: 2}}
    ], %{lnb: 1}} 
 ], []}

@wojtekmach
Copy link
Contributor Author

wojtekmach commented Apr 18, 2020

as we have annotations for this

fantastic!

My only suggestion would be to consider not returning different AST shape based on whether an option is set or not; instead always return the same shape (4-tuple), just with empty annotations.

:lnb

as in, Line NumBer? :) nitpick: unless you need to maintain backwards compat, I'd consider switching this to :line since that's what's in Elixir AST:

iex> Code.string_to_quoted("foo\nbar")
{:ok, {:__block__, [], [{:foo, [line: 1], nil}, {:bar, [line: 2], nil}]}}

but at the same time it doesn't matter how the key is called, I'm really happy there's already infrastructure for this!

In any case, I'm happy to look into returning :lnb in the ast.

@RobertDober
Copy link
Collaborator

RobertDober commented Apr 19, 2020

ad Naming

oh I was just lazy please pick a name you would like, as long as it reads line_number ;)
Seriously now, I feel line is a missnamer, lnb is too short, so either line_number or line_nb

BTW Earmark also uses line for line_number, but I'd prefer to expose a correct name.

ad 4 tuples

Adding always an empty map, why not, however @josevalim had made a point concerning this when we talked about comments, that not having the same shape might also be an advantage sometimes.

Adding an empty map might induce a different kind of pattern matching than matching on the shape.
However I do not have a strong opinion on that, so if there is a consensus on what is better for ex_doc I'll definitely go with that.

@wojtekmach
Copy link
Contributor Author

wojtekmach commented May 15, 2020

@RobertDober if 3 vs 4 element tuple is useful to differentiate between comment and non-comment nodes, I believe the best way to go is to standardise on 4 element tuples and put %{comment: true} for comments. What do you think?

@RobertDober
Copy link
Collaborator

RobertDober commented May 15, 2020

Ok with me to have always 4 element tuples, for comments I am not sure but it is not important, whatever serves ex_doc best...

That all said I must emphasise that right now I can only spend minimal time on Earmark as I have found a job again and I have no idea of the ETA of 1.5. Sorry about that (who would not be sorry to work with legacy Rails code instead of Elixir, right?)

@RobertDober RobertDober changed the title AST metadata AST metadata -> always quadruples, %{meta: %{comment: true}} May 27, 2020
@RobertDober RobertDober self-assigned this May 27, 2020
@RobertDober RobertDober added this to the 1.4.5 milestone May 27, 2020
@RobertDober RobertDober changed the title AST metadata -> always quadruples, %{meta: %{comment: true}} AST metadata -> always quadruples, %{comment: true} May 27, 2020
@wojtekmach
Copy link
Contributor Author

wojtekmach commented May 27, 2020 via email

@RobertDober
Copy link
Collaborator

I apologize, you were to fast for me...
I realized that I will not be responible for the AST metadata any transformer should know what they are doing, I will get rid of meta: at all, the contract will be that the generated AST will contain %{comment: true} for comments and %{verbatim: true} for HTML blocks and that is that.

Sorry for having wasted your time.

@wojtekmach
Copy link
Contributor Author

Sounds good to me!

@RobertDober
Copy link
Collaborator

Pushing into 1.4.6 as #342 needs a hotfix

@RobertDober RobertDober modified the milestones: 1.4.5, 1.4.6 Jun 1, 2020
@RobertDober
Copy link
Collaborator

Finally got time to work on this @josevalim @wojtekmach is it ok for you to maintain that the comment node starts with the symbol :comment or would you prefer the string "comment" now that the fourth element will contain %{comment: true}

I did not overlook that we had discussed this already, or did I?

I can change that quickly before release so no stress with that, I guess you will be quite busy right now :)

1 similar comment
@RobertDober
Copy link
Collaborator

Finally got time to work on this @josevalim @wojtekmach is it ok for you to maintain that the comment node starts with the symbol :comment or would you prefer the string "comment" now that the fourth element will contain %{comment: true}

I did not overlook that we had discussed this already, or did I?

I can change that quickly before release so no stress with that, I guess you will be quite busy right now :)

@wojtekmach
Copy link
Contributor Author

I don’t have an opinion and either sounds good to me, I will be most likely pattern matching on just comment: true anyway!

@RobertDober
Copy link
Collaborator

so I'll keep the symbol

@RobertDober RobertDober linked a pull request Jun 21, 2020 that will close this issue
Merged
RobertDober added a commit that referenced this issue Jun 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants