-
CSTs keep all the information about the original source, including whitespace. With a CST, it is possible to round-trip like An example would be the Python
Notice the irrelevant information of the extra space after the |
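To make the round-trip point concrete, here is a minimal sketch using only the standard library (Python 3.9+ for `ast.unparse`). The `to_concrete_tokens` and `unparse` helpers are hypothetical names made up for illustration, and the toy tokenizer is not a real CST library. It only shows the idea that nothing from the source is discarded, whereas the AST path normalizes the formatting.

```python
import ast
import re

SOURCE = "x = ( 1+2 )"

# An AST drops formatting details: unparsing gives normalized code,
# not the text that was originally written.
normalized = ast.unparse(ast.parse(SOURCE))

# Toy "concrete" representation (hypothetical): every character is kept,
# and whitespace is stored as its own token instead of being thrown away.
TOKEN_RE = re.compile(r"\s+|\w+|[^\w\s]")


def to_concrete_tokens(source: str) -> list[str]:
    """Split the source into tokens without discarding any characters."""
    return TOKEN_RE.findall(source)


def unparse(tokens: list[str]) -> str:
    """Rebuild the source by concatenating the tokens in order."""
    return "".join(tokens)


tokens = to_concrete_tokens(SOURCE)
print(repr(normalized))       # formatting differs from SOURCE
print(repr(unparse(tokens)))  # identical to SOURCE
```

A real CST would group these tokens into nodes, but the round-trip property comes from the same rule: whitespace and punctuation stay attached to the tree rather than being dropped.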
-
Can you show example code of what this looks like? I think I may have mentioned this in an earlier conversation, but for sphinx-math-dollar, we extend RST to allow using dollar signs for math (like $\sin(x)$). But there is a problem with absolute values, because docutils preprocesses |
-
Yes, I am aware, and I also contributed fixes to numpydoc. I'm currently using half numpydoc and half custom parsing. Aaron pointed out what a CST is, and yes, mostly I want to be able to get back to the original source for two reasons:
Personally, I would also appreciate having an AST/CST that can be exported reliably to JSON, for potential processing in another language or for pickup by a different process later on. That should not be a problem as long as we don't rely on shared instances later, which I can see happening with replacements and references.
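One way to avoid relying on shared instances is to give every node an integer id and store references as ids rather than as object pointers. The sketch below assumes that design. The `Node` class, its field names, and the node kinds are all hypothetical, not taken from docutils or any existing tool.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node; references are stored as ids, not objects."""
    node_id: int
    kind: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)
    refers_to: Optional[int] = None  # id of the target node, if any


def to_json(root: Node) -> str:
    """Serialize the whole tree to a JSON string."""
    return json.dumps(asdict(root), indent=2)


def from_json(payload: str) -> Node:
    """Rebuild the tree from JSON produced by to_json."""
    def build(data: dict) -> Node:
        return Node(
            node_id=data["node_id"],
            kind=data["kind"],
            text=data["text"],
            children=[build(child) for child in data["children"]],
            refers_to=data["refers_to"],
        )
    return build(json.loads(payload))


# A substitution definition and a reference that points at it by id.
root = Node(0, "document", children=[
    Node(1, "substitution_definition", text="|name| replace:: value"),
    Node(2, "substitution_reference", text="|name|", refers_to=1),
])
restored = from_json(to_json(root))
```

Because the reference is just the integer `1`, the JSON can be consumed from another language or a later process without needing object identity to survive serialization.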
-
In that case you'd need something even more sophisticated than a CST. You need something that can handle incomplete/incorrect syntax. Usually you can only handle this sort of thing by looking at the token stream, which is often too low-level to work with effectively. Perhaps something similar to parso could be built that can process "error nodes" without breaking the rest of the parsing.
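As a rough illustration of the "error nodes" idea, here is a small sketch of a line-based parser for RST-style field lines (`:name: body`). It is not how parso works internally. The `Field` and `ErrorNode` classes and the `parse_fields` function are hypothetical. The point is only that a malformed line is wrapped in an error node with its raw text kept, so the rest of the input still parses and the source can still be reproduced.

```python
import re
from dataclasses import dataclass

# Matches lines such as ":param x: the input".
FIELD_RE = re.compile(r"^:(?P<name>[^:]+):(?P<body>.*)$")


@dataclass
class Field:
    name: str
    body: str
    raw: str  # original line, kept so the source can be reproduced


@dataclass
class ErrorNode:
    raw: str  # unparseable line, kept verbatim


def parse_fields(source: str) -> list:
    """Parse each line; malformed lines become ErrorNode instead of failing."""
    nodes = []
    for line in source.split("\n"):
        match = FIELD_RE.match(line)
        if match:
            nodes.append(Field(match["name"].strip(), match["body"].strip(), line))
        else:
            nodes.append(ErrorNode(line))
    return nodes


def unparse(nodes: list) -> str:
    """Concatenate the raw lines to get back the original source."""
    return "\n".join(node.raw for node in nodes)


source = ":param x: the input\n:returns the output\n:raises ValueError: on bad input"
nodes = parse_fields(source)
for node in nodes:
    print(type(node).__name__, repr(node.raw))
```

The second line is missing its closing colon, so it ends up as an `ErrorNode`, while the lines before and after it are still recognized as fields.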
-
Well, RST tends to be relatively forgiving, and so far I've found that most of the trees I get are correct, except that the wrong nodes are detected due to issues with spacing or a missing underscore in the right places. Worst case, if I have a CST, I can unparse just a section and run heuristics on it specifically.
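A heuristic of that kind could look roughly like the sketch below, which scans the unparsed text of one section for something shaped like an external hyperlink reference but lacking its trailing underscore. The function name and the regular expression are assumptions made for illustration. The pattern is deliberately narrow and skips role syntax such as :ref:`label <target>`, where no underscore is expected.

```python
import re

# `text <target>` NOT followed by "_" and NOT preceded by ":" (a role)
# or by another backtick (an inline literal).
MISSING_UNDERSCORE_RE = re.compile(r"(?<![:`])`[^`<]+<[^`>]+>`(?!_)")


def find_links_missing_underscore(section_text: str) -> list[str]:
    """Return substrings that look like hyperlinks without the trailing '_'."""
    return [match.group(0) for match in MISSING_UNDERSCORE_RE.finditer(section_text)]


section = (
    "See `docs <https://example.org>` and `ok <https://example.org>`_, "
    "or :ref:`label <target>`."
)
suspects = find_links_missing_underscore(section)
print(suspects)
```

Running this on a single unparsed section keeps the heuristic local, so a false positive in one place does not affect how the rest of the document is parsed.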
-
In response to #1 (comment) by @Carreau:
I assume you are aware of napoleon, which can parse NumPy docstrings?
I'm not familiar with the differences between CSTs and ASTs (note to self: read Abstract vs. Concrete Syntax Trees), but I recently also ran into the need to parse reStructuredText as-is.
docutils performs some transformations during parsing on the syntax tree. Some of these transformations can be disabled, but others are hard-coded. This makes it impossible to reconstruct the original reStructuredText for some elements (e.g. the role and contents directives). So far, I have been able to work around this by monkey-patching docutils.
It would of course be much better if docutils first parsed the reStructuredText into a tree that represents the source exactly. This would be a good candidate for cooperation.