How should the final transformed parse structure look like? #31
Comments
On the other hand, a flat list of headlines and sections seems to be very pragmatic. It makes it easier to change headline level and order.
But how do we keep a headline and its section together in a flat list? Separating them is not pragmatic; the section should belong to the headline. How about this?
|
A couple of months ago, I had a look with @munen at what organice expects as a data structure. From what I recall and based on the discussion in #27, I want to suggest the following (at least for depth 1 and 2):
{;; "In-buffer Settings", see https://orgmode.org/manual/In_002dbuffer-Settings.html
 :settings ...
 ;; Let's call text before the first headline the preamble. As each headline introduces a
 ;; new section the content before the first headline is a section that does not belong
 ;; to any headline.
 :preamble
 {:section {:raw ...
            :ast ...}}
 ;; a flat list of headlines with their associated sections
 :headlines
 [{:headline {:level 1
              :title "hello world"
              ...}
   :section
   {:raw "this is the first section\nthis line has *bold text*\n"
    :ast [[:text [:text-normal "this is the first section"]]
          [:text
           [:text-normal "this line has "]
           [:text-styled
            [:text-sty-bold [:text-inside-sty-normal "bold text"]]]]]}}
  ...]} |
branch14 and I just double-checked this suggestion. It looks fine to me. As for the flat vs. nested structure of headlines: we think that a flat list is easier for the consumer to work with. For those who need or want a nested structure, transforming from flat to nested is a simple reduce, so it shouldn't make a big difference which of the two org-parser actually provides. |
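To illustrate the "simple reduce" mentioned above, here is a minimal sketch, assuming the flat entries look like the `{:headline {:level n ...} :section ...}` maps suggested earlier in this thread. The function names and the `:children` key are hypothetical and not part of org-parser.

```clojure
(defn- level-of
  "Level of a flat entry; the artificial root (an empty map) counts as level 0."
  [entry]
  (get-in entry [:headline :level] 0))

(defn- attach-top
  "Pops the top of the stack and appends it to :children of the entry below it."
  [stack]
  (let [child  (peek stack)
        stack  (pop stack)
        parent (peek stack)]
    (conj (pop stack) (update parent :children (fnil conj []) child))))

(defn- close-to
  "Attaches finished entries to their parents until the top of the stack
  is shallower than `level`."
  [stack level]
  (if (>= (level-of (peek stack)) level)
    (recur (attach-top stack) level)
    stack))

(defn nest-headlines
  "Turns a flat, ordered vector of headline entries into a tree.
  Children are stored under the (hypothetical) :children key."
  [flat]
  (let [stack (reduce (fn [stack entry]
                        (conj (close-to stack (level-of entry)) entry))
                      [{}] ; the stack starts with the artificial root
                      flat)]
    (get (first (close-to stack 1)) :children [])))
```

Calling `(nest-headlines headlines)` on the `:headlines` vector would return only the top-level entries, each carrying its sub-headlines under `:children`. A headline that skips a level (for example level 3 directly under level 1) simply becomes a child of the nearest shallower headline in this sketch.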
I agree. A flat list for headlines is fine. And having the same structure (
It's possible that, for some element transformations, we should keep both the transformed "sub-ast" and a "sub-raw" form. Or maybe better, a pair of indexes pointing to the position in the section's raw string? Anyway, it might make sense in some cases to allow re-export without discarding whitespace. |
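The index-pair idea can be sketched as follows. This is only an illustration under assumptions: `find-span` and `raw-at` are hypothetical helper names, and in a real parser the indexes would come from the parse itself rather than from a string search.

```clojure
(require '[clojure.string :as str])

;; The :raw string from the example structure earlier in this thread.
(def section-raw "this is the first section\nthis line has *bold text*\n")

(defn find-span
  "Returns [start end] of the first occurrence of `fragment` in `raw`, or nil.
  Stands in for position information that the parser would provide."
  [raw fragment]
  (when-let [start (str/index-of raw fragment)]
    [start (+ start (count fragment))]))

(defn raw-at
  "Recovers the original text of an element from the section's raw string."
  [raw [start end]]
  (subs raw start end))
```

With this, an element would only need to carry its `[start end]` pair, and `(raw-at section-raw span)` would give back the exact original text, whitespace included, without storing a second copy of it.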
In order to preserve whitespace we should either (a) include whitespace in the parsed text or (b) retain whitespace in the AST as we do with empty lines. (a) is how it is currently done. Example input: Example ast (a): Example ast (b): @schoettl Do you have examples for "discarding" whitespace? Passing raw for some elements is IMHO a convenience for consumers that cannot handle all elements, but it will be tricky to balance, as we cannot account for future use cases. |
If you search in EBNF for regex
I've gone through it and I think that only the
Maybe the instaparse meta information about position/span can still be used in the resulting transformed structure? Then we don't need any additional raw values and can still provide all the original information. |
Hi @branch14 @munen ,
According to the worg spec, the (transformed) parse tree should look like this:
As far as I remember, this is different from the Organice parser:
I suggest that we stick to the orgmode spec, i.e. allowing a section above the first headline and keeping a hierarchical structure of headlines.
document would then be our S symbol. That could be implemented in the transformers in PR #27.
It will be more work later to implement org-parser in Organice, but we get a general orgmode parser :-)