[wip] Add: first draft of parsing basic structure elements #7
Looks like a great start! 👍🙏🏻 |
I haven't had a lot of time to work on this recently, but I was getting stuck on how to properly define the nested structure of an org file with instaparse. Should it be possible to describe this with EBNF? I assumed it could be, given this part of the syntax specification:
But that may assume we just take the headlines and count stars, and perhaps process the data structure into a tree after parsing the document. I was having trouble figuring out how to determine if we're jumping up just one level when the next headline comes or if we're coming out multiple levels of nesting, for example:
My original attempt worked to go from one to two and back, but adding a third level subheading showed me it was naive. The parser just goes up one level. I don't want to invest too much time into it if it isn't going to work, so if anyone here has thoughts I'd love to hear them :) Thank you! |
Hi @gcentauri, see http://xahlee.info/clojure/clojure_instaparse.html -> Function: transform. I haven't tried it yet, but it seems like the map argument to this function is the place where we transform the very basic parsed structure into a higher-level structure. E.g. from
to
Currently I'm working on the timestamps PR. I probably need to transform them to a higher-level structure, too. |
@schoettl - thanks for the insight! i was getting that feeling too, i'm just very new to parsing stuff. so it seems like indeed the line-based approach might be the first pass, and then we have another pass to take the structure generated by that and turn it into the proper tree structure? i'd like to get back to this soon. i just felt stuck. |
I think you're right about that second transforming pass on the parse tree. I opened #8 to discuss how far this transformation is in scope for this project. Anyway, I think the plain parsing to a flat list of headers is a very important first step! |
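The two-pass idea discussed above (a flat, line-based first pass, then a transformation into a tree) can be sketched roughly as follows. This is only an illustration in Python, not the project's Clojure code: the names `parse_headlines` and `nest_headlines`, the regular expression, and the `(level, title)` tuple shape are all assumptions made for the example. The stack is what answers the "one level up or several?" question: on each new headline, every entry at the same or a deeper level is popped, however many that turns out to be.

```python
import re

# Assumed, simplified headline shape: one or more stars, a space, then the title.
HEADLINE = re.compile(r"^(\*+) (.*)$")


def parse_headlines(text):
    """First pass: collect a flat list of (level, title) pairs, line by line."""
    flat = []
    for line in text.splitlines():
        match = HEADLINE.match(line)
        if match:
            flat.append((len(match.group(1)), match.group(2)))
    return flat


def nest_headlines(flat):
    """Second pass: turn the flat list into a tree using a stack of open headlines."""
    root = {"level": 0, "title": None, "children": []}
    stack = [root]  # path from the root to the most recently added headline
    for level, title in flat:
        node = {"level": level, "title": title, "children": []}
        # Pop every headline at the same or a deeper level; this may climb
        # several levels at once. The root (level 0) is never popped.
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


text = "* one\n** two\n*** three\n* back at top\n"
flat = parse_headlines(text)
tree = nest_headlines(flat)
```

Here the jump from `*** three` straight back to `* back at top` closes three levels in one step, which is the case a grammar that only goes up one level at a time gets wrong.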
@gcentauri, @schoettl First of all, let me thank you for your interest in and work on this project. Your thoughts and efforts are greatly appreciated! In my first attempts I actually did follow the idea of identifying semantic blocks rather than "only" lines. But it turned out to get tricky rather quickly. While I didn't encounter any formal reason not to continue with semantic blocks, I felt that it would make it really hard for others to contribute. Hence I decided to proceed with the much simpler line-based approach (or as I put it in 4a4563f, "the sane way"). Org-mode is a line-based format where greater blocks and other semantic units are made up of lines, after all. I expect that following this observation when building a parser will keep things simple. As the parse tree that results from a line-based approach does not yield a data structure that resembles the document nicely (i.e. the structure one would like to work with), a second "transform" step will be required, much like @schoettl pointed out. While Instaparse's Having said that and looking at the progress you made with #11, I'm totally open to other approaches. The PR reminds me of my attempts just before I gave up on the idea of having the grammar do the heavy lifting and decided to go with the simpler line-based approach and a subsequent transformation. So I wonder where you're at. |
Here I laid out what the code for the transformation could look like: #15 |
I think that it's good to combine both approaches: parsing of semantic blocks where possible, and line-based parsing where it gets messy with EBNF. I already wrote EBNF for On the other hand, if we do that stuff in the transformation step, it's much more coding with conditionals, map/reduce, ... Similar to what is implemented in organice or other Org mode parser libraries. But I agree, using EBNF can get messy or impractical. One example are So I'd vote for putting as much "syntax comprehension" in the EBNF as can be expressed cleanly. The rest can be done in the transformation. |
I discussed this with branch14 and we both agree. This is a sane and pragmatic approach. Let's continue like this. It's also very nice that there are good examples for both options now^^ |
I'd like to get back to this sometime :) been busy with the crazy year that has been 2020. But Lisp keeps coming back to me and org mode has always been a love of mine too. i'll keep watching the repo and see where i can help. it was probably a bit impetuous of me to think i could figure out how to do the top down parsing over the line-based approach already begun :) |
You triggered a very good discussion, @gcentauri :) A lot has happened since then. I'll get back to #11 as soon as I can. It probably makes sense to build on that one to prevent conflicts. |
Thank you for your contribution, @gcentauri! All the best to you and your family 🙏 |
Hey @gcentauri, I suggest we close this PR. A lot has changed since last year. We have now laid out a structure for the parse result (#31). I've also implemented parsing of some block-like elements as semantic units (instead of line-based parsing). This semantic parsing has to be enabled step by step (#32). I'll start with that after the other open PRs are merged. |
🙏 |
I realized that the existing org.ebnf file is just aiming at parsing individual lines. I'm sure this will be useful, but as I started looking at property drawers I decided to try working top-down based on the org specification. This is just a start, but it is one step on the way to building the tree structure. Currently, it does not properly switch heading levels while reading the tree. I kept everything in separate files for the time being, as it was easier for me to deal with while learning how everything works.