Better code blocks #17
This commit adds some new functionality to code blocks.

Firstly, it allows arbitrary delimiters, meaning that code containing the usual code block terminator can now be parsed correctly. The syntax for this is:

    {delimiter@ocaml[ ... ]delimiter}

The delimiter can contain the chars `[ a-z A-Z 0-9 _ - ]`, the same as the language tag that comes after the '@' symbol. Note that there's no way to have a delimited code block without a language tag.

The second piece of functionality is that code blocks can now have associated output:

    {@ocaml[ ... ocaml code ... ][ ... odoc formatted output ... ]}

This syntax also supports the delimiters as above. The delimiters only apply to the _code_ block, not the output:

    {delim@ocaml[ ... ocaml code containing ][ or ]} ... ]delim[ ... odoc formatted output ... ]}

The idea is that the odoc formatted output should be well formed, and thus any escaping is done in the usual way. The output can then contain, for example, error blocks produced by mdx:

    {delim@ocaml[
    let x = "]}
    ]delim[
    {err@mdx-error[
    Line 1, characters 9-10:
    Error: String literal not terminated]err}
    ]}

In addition, this also allows code blocks to produce rich output, i.e., markup such as tables, headings, images and so on, in such a way that the output is associated with the code block and hence can be manipulated by a 'test-promote' workflow.
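To make the terminator rule concrete, here is a minimal sketch of how a delimited block could be split into its parts. It is not the lexer from this PR: `find_sub`, `block` and `parse_code_block` are hypothetical names, the input is assumed to be exactly one block, and the output section is assumed to run to the final `]}` of the string.

```ocaml
(* Hypothetical helper, not odoc's lexer: split "{delim@lang[code]delim}",
   optionally followed by "[output]}", into its parts. *)

(* Index of the first occurrence of [sub] in [s] at or after [from]. *)
let find_sub s sub from =
  let n = String.length s and m = String.length sub in
  let rec go i =
    if i + m > n then None
    else if String.sub s i m = sub then Some i
    else go (i + 1)
  in
  go from

type block = {
  delim : string;
  lang : string;
  code : string;
  output : string option;
}

let parse_code_block s =
  let len = String.length s in
  if len = 0 || s.[0] <> '{' then None
  else
    match String.index_opt s '@' with
    | None -> None
    | Some at -> (
        match String.index_from_opt s at '[' with
        | None -> None
        | Some lb ->
            let delim = String.sub s 1 (at - 1) in
            let lang = String.sub s (at + 1) (lb - at - 1) in
            (* The code ends at the first "]delim" that is followed by
               "}" (no output) or "[" (output follows). *)
            let closer = "]" ^ delim in
            let clen = String.length closer in
            let rec scan from =
              match find_sub s closer from with
              | None -> None
              | Some i ->
                  let after = i + clen in
                  let code = String.sub s (lb + 1) (i - lb - 1) in
                  if after < len && s.[after] = '}' then
                    Some { delim; lang; code; output = None }
                  else if after < len && s.[after] = '[' then
                    (* Assumption: the output runs to the final "]}". *)
                    if len >= after + 3 && String.sub s (len - 2) 2 = "]}"
                    then
                      let out = String.sub s (after + 1) (len - after - 3) in
                      Some { delim; lang; code; output = Some out }
                    else None
                  else scan (i + 1)
            in
            scan (lb + 1))
```

With the delimiter `d`, the `]}` inside the string literal of `{d@ocaml[let x = "]}"]d}` is skipped, because only `]d` followed by `}` or `[` closes the code section.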
Review comment from group review in Cambridge: Could we perhaps make the delimiting generic, e.g.:
While I think having delimiters elsewhere is a worthy goal, we can do that in another PR.
The syntax looks reasonable, no need to block for more generic delimiters (which would probably still look different on code blocks).
| In_explicit_list -> (List.rev acc, next_token, where_in_line)
| In_tag -> (List.rev acc, next_token, where_in_line)
| In_table_cell -> (List.rev acc, next_token, where_in_line)
| In_code_results -> (List.rev acc, next_token, where_in_line))
Most cases have the same type and could perhaps be written with an or-pattern?
Annoyingly not (the return types are different in each case). See the comment immediately below.
Indeed.
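For readers following the or-pattern exchange, here is a minimal sketch of the situation, assuming the context type is a GADT whose index determines the result type. The `context` type and `finish` function below are hypothetical stand-ins, not the parser's actual definitions.

```ocaml
(* Hypothetical stand-in for a context GADT: each constructor fixes a
   different result type through the type index. *)
type _ context =
  | In_list : int list context
  | In_tag : string list context

(* Each branch has the same text but a different type: in the first
   branch [a] is [int list], in the second it is [string list]. *)
let finish : type a. a context -> a -> a =
 fun ctx acc ->
  match ctx with
  | In_list -> List.rev acc
  | In_tag -> List.rev acc

(* Merging the branches as [In_list | In_tag -> List.rev acc] is rejected
   by the type checker: the type equations introduced by GADT constructors
   are not available under an or-pattern, so [acc] is not known to be a
   list there. *)
```

Usage, under the same assumptions: `finish In_list [3; 2; 1]` evaluates to `[1; 2; 3]`, and `finish In_tag ["b"; "a"]` to `["a"; "b"]`.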
@@ -219,18 +217,19 @@ let emit_verbatim input start_offset buffer =
  let t = trim_trailing_blank_lines t in
  emit input (`Verbatim t) ~start_offset

let emit_code_block ~start_offset input metadata c =
  let c = trim_trailing_blank_lines c in
let emit_code_block ~start_offset content_offset input metadata delim terminator c has_results =
The location calculations in this function are quite complicated and deserve some comments.
Why is content_offset needed?
I've added a comment.
Thanks!
I have a remark on delim_char but otherwise I think it's ready to merge.
The CI failures are not related, everything is OK.
src/lexer.mll
Outdated
@@ -267,6 +267,9 @@ let raw_markup_target =
let language_tag_char =
  ['a'-'z' 'A'-'Z' '0'-'9' '_' '-' ]

let delim_char =
  ['a'-'z' 'A'-'Z' '0'-'9' '_' '-' ]
Without `-`? This might interfere with the {- ...} syntax with a different parser engine.
Huh, I thought I had removed that, as per your previous comment. Will fix, thanks!
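The concern can be sketched in plain OCaml rather than ocamllex. The predicates below are hypothetical helpers, not code from this PR; they assume the fix is simply the language tag character set minus `-`.

```ocaml
(* Characters allowed in a language tag, as in the hunk above. *)
let is_language_tag_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' | '-' -> true
  | _ -> false

(* Assumed shape of the fix: the same set without '-'. *)
let is_delim_char c = c <> '-' && is_language_tag_char c

(* A delimiter is valid when every character is a delimiter character. *)
let is_valid_delim s =
  let rec go i =
    i >= String.length s || (is_delim_char s.[i] && go (i + 1))
  in
  go 0

(* True when the text opens with "{-", which is also how the {- ...}
   syntax begins. *)
let looks_like_list_item s =
  String.length s >= 2 && s.[0] = '{' && s.[1] = '-'
```

A block such as `{-x@ocaml[ 1 ]-x}` shows the overlap: it starts with `{-`, so a parser that decides on the first two characters could take it for the `{- ...}` syntax. Excluding `-` from delimiters removes that case.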