Add keywords and tokens line attributes #312
I have a few comments.
|
I believe

```
PORT1 : in std_logic; -- Comment IN port list
```

...the line would resolve to:

```
line          = 'PORT1 : in std_logic; -- Comment IN port list'
lineNoComment = 'PORT1 : in std_logic; --'
tokens        = ['PORT1', ' ', ':', ' ', 'in', ' ', 'std_logic', ';', ' ', '-- Comment IN port list']
```

Well... this was supposed to be an example of why we would need to return a list without comments. I do not think we need it at all. In this example, we would search all the tokens for anything that matches `if 'in' in oLine.tokens:`. This would exclude the comment, because the comment token will never equal `'in'`.

As for the case variations:

```python
if 'in' in oLine.tokens:
if 'In' in oLine.tokens:
if 'IN' in oLine.tokens:
if 'iN' in oLine.tokens:
```

Typically a case check is two steps. Do we have a lower case token attribute?

```
line          = 'PORT1 : in std_logic; -- Comment IN port list'
lineNoComment = 'PORT1 : in std_logic; --'
tokens        = ['PORT1', ' ', ':', ' ', 'in', ' ', 'std_logic', ';', ' ', '-- Comment IN port list']
tokensLower   = ['port1', ' ', ':', ' ', 'in', ' ', 'std_logic', ';', ' ', '-- comment in port list']
```

...or is there an easy way to lowercase as we check:

```python
for iIndex, sToken in enumerate([x.lower() for x in oLine.tokens]):
    if sToken == 'in':                        # Step 1: case-insensitive match
        if not oLine.tokens[iIndex] == 'in':  # Step 2: exact-case check
```
Maybe there is an easier way to handle this. |
As for using … |
Do you have any thoughts on how to handle spacing/indenting without keeping them as tokens? I can't think of anything at the moment. |
Exactly, so let's make 2 functions. We can have … and we can stay with … |
It makes sense to use the …

Scenario: word exists in line
Scenario: word does not exist in line

I like your idea to have the spaces separate. I was also thinking comments are not tokens either. Maybe the comment could be in its own attribute. So the following line:

```
PORT1 : in std_logic; -- Comment IN port list
```

would resolve to these attributes:

```
separators  = ['', ' ', ' ', ' ', '']
tokens      = ['PORT1', ':', 'in', 'std_logic', ';']
tokensLower = ['port1', ':', 'in', 'std_logic', ';']
comment     = [' ', '-- Comment IN port list']
```

We could then interleave the separators and tokens lists, extend with the comment list, and finally use join to create a string. Maybe move the spaces before the comment to the separators list?

```
separators  = ['', ' ', ' ', ' ', '', ' ']
tokens      = ['PORT1', ':', 'in', 'std_logic', ';']
tokensLower = ['port1', ':', 'in', 'std_logic', ';']
comment     = '-- Comment IN port list'
```

Then we append the comment to the tokens list, interleave it with the separators, and use join to create a string, as sketched below.
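A minimal sketch of that reconstruction, assuming the second layout above (each separator precedes the token at the same index):

```python
from itertools import chain, zip_longest

separators = ['', ' ', ' ', ' ', '', ' ']
tokens = ['PORT1', ':', 'in', 'std_logic', ';']
comment = '-- Comment IN port list'

# Append the comment, pair each separator with the token it precedes, then join.
parts = tokens + [comment]
line = ''.join(chain.from_iterable(zip_longest(separators, parts, fillvalue='')))
assert line == 'PORT1 : in std_logic; -- Comment IN port list'
```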
With either method, we do not need … It feels like switching to using lists would make writing rules easier than using regular expressions. For example, the rule …
If there is a violation, then the fix would follow these steps: …
Even if it is not easier, I would hope the code would be more robust and easier to read. Functions like … |
According to the standard, the comment is a token. For example, this is a single token: … `for i, token in enumerate(tokens[:-1]):` … Hmm, I really do not like breaking consistency, but if there is a significant gain from putting the comment into a separate attribute I can accept this. I can't think of any gain right now. At first I thought … |
Let's try to keep consistency and see if we want to change it later. I will be pushing some code that creates the token and separator lists. |
Adding code to support tokenizing the line:
1) Added token attribute to the line object
2) Added token.py file
3) Adding tests
Currently the tokenizer handles consecutive spaces, comments and characters. NOTE: Future commits will handle VHDL symbols.
Fixed issue if comment was at the beginning of the line.
Tokenizer now handles commas.
Added open and close parenthesis tokens.
Added tokens for single and double quotes.
Refactored single character symbol parsing. Added + to tokens.
Added all operators and assignment symbols.
Added semicolon to single character symbols. Added test with multiple symbols.
Added another multisymbol test.
Added separators attribute to line. 1) Updated tests to check both tokens and separators lists
Initial attempt at a token and separator list has been pushed. I'm sure it still needs some work. |
Can you undo the merge to master, make a pull request, and add me as a reviewer? |
The low-level Tokenizer (pass 1) is working without known errors in pyVHDLParser. In the next processing step (pass 2), tokens are rewritten into specific tokens like keywords and identifiers, and assembled into blocks. I would rate this step as 75% done. I'll try to fix the issue reported in Paebbels/pyVHDLParser#6 |
Hi @jeremiah-c-leary @m-kru @nfrancque! Please, let me step in. I'm not a lead developer/maintainer of those projects, but I'm the second contributor to ghdl/ghdl (in volume) and I have write access to VUnit/vunit. Moreover, I've been quite inquisitive with all three main developers (Tristan, Patrick and Kraigher). Apart from private conversations, I tried to get them to publicly discuss the similarities/differences between those projects. Overall, you all seem to be very well informed, but I'd like to add some bits of info, should it be useful for you.
This is not really true... Kraigher decided to use Rust because he considered that it was/is the best language for the task when he decided to start the project from scratch. Actually, he did so because GHDL was/is written in Ada. If GHDL was written in a different language, or if GHDL internals had been better documented at that time, Kraigher would probably not have started rust_hdl.

Note that these assertions are my own opinion. I got to think that after a discussion we had in GHDL's gitter channel some months ago. The point is that rust_hdl does not include semantic analysis yet, but GHDL does. The conversation between Kraigher and Tristan was about how rust_hdl could be a frontend (parser) to be plugged into GHDL. Or, conversely, GHDL could be a backend for rust_hdl. They even talked about rewriting GHDL in Rust... Should you be curious: October 17, 2019 9:34 PM (it went until October 20).

This is to say that the use cases of rust_hdl are limited because it is written in a compiled language, not because it is "for Rust". Hence, the limitations are the same as (or fewer than) those of using GHDL. libghdl-py depends on libghdl, which depends on a working installation of GHDL (it does not require the binary itself, but many assets). Therefore, it might be easier (from a distribution and usage point of view) to get a suitable JSON output from rust_hdl than to use libghdl-py.

A few days ago, Tristan pushed some info about the internals of GHDL to the docs, and Rust is mentioned there: https://ghdl.readthedocs.io/en/latest/internals/AST.html#dealing-with-ownership. At the same time, the development of rust_hdl has been quite active lately. Hence, we might expect some integration between both tools in the following months. This would be interesting as a reference, since it would be the first example of a tight integration with GHDL (the Rust scripts/bindings being roughly equivalent to libghdl-py).
It is, but the required "technical infrastructure" is the most similar to this use case. Both projects:
Hence, the discussions about the usability of pyVHDLParser, or how to deal with the complexity of requiring a compiled tool, are useful for both projects.
Using a parser engine such as ANTLR4 is considered to be a dead end for complete support of recent VHDL versions. rust_hdl, pyVHDLParser and GHDL all use custom parsers. Precisely, pyVHDLParser and GHDL are roughly similar. I wonder how hdlConvertor deals with tricky sources.
The main stopper for using pyVHDLParser is that Patrick was very busy last year and the project was on hold. However, once it is brought to a usable status, I agree that it would be the best option.
I'm the one that discussed with Tristan how to reorganise Python packages and the one who edited those readmes accordingly. Let me know if I can help you. |
Thank you @eine for the excellent writeup! @kraigher and @LarsAsplund may want to weigh in as well. I think the needs of at least VUnit and VSG are very similar here, and some collaboration between the two would be really useful. A unified parser library for all open source VHDL projects would be the holy grail, and it sounds like you're saying rust_hdl is the most mature option so far. Maybe one question to ask: do you guys know if any of these options throw away things like whitespace information and other niceties before VSG could pull it in? It is a little different from VUnit in that respect. It would be great if everyone could decide on something and get that to a good working state. |
You are welcome! It's always a pleasure to share knowledge in this very small community we have ;)
Not so... rust_hdl is the most mature option in the sense that Kraigher focused on parsing only, and he brought it to a stable status. Then he focused on the language server, and brought that to a stable status too. However, further features are yet to be implemented. Conversely, both GHDL and pyVHDLParser do include parts for many more features, but those are kind of "permanently unstable". They are stable in practice, but there is no commitment to not breaking backwards compatibility as strong as Kraigher's.

At the same time, I know that Patrick and Tristan have had discussions about "parser architectures". Recent documentation enhancements in both projects show that they are using very similar approaches, but with different languages. Honestly, I don't know how similar/different Kraigher's approach is. Hence, from my ignorance, I feel that it might be easier in the future to make pyVHDLParser and GHDL kind of complementary or interchangeable in terms of the internal AST. However, both Kraigher and Tristan seem independently committed to doing something. Ideally, we would have a unified AST and multiple bindings (Ada, Python, Rust). This reminds me of f4pga/ideas#19.
I forgot to make it explicit before, but this is the main reason you want/need to use pyVHDLParser for this project. The main motivation for Patrick has always been automatic documentation generation for VHDL sources/projects. Thus, preserving comments and whitespace (not touching the sources) has always been his primary focus. Conversely, GHDL's documentation of the internals explicitly says that comments and such are ignored. It also says that it would not be difficult to keep them, but it is not a priority for GHDL. Once again, I don't know this detail about rust_hdl.
Given the constraints, I think that all three of them are doing a good job at trying to collaborate. However, they develop these projects in their leisure time; hence, many implementation details are not clear until they actually get to work on them. It's not like they can plan to work on them for several consecutive days. |
That would be nifty. As a side note: with Symbiflow, I believe this thread has now referenced every FPGA-related open source project in existence :)
Glad I asked about the whitespace. So most likely the only option here would be pyVHDLParser? Unless Kraigher tells us differently.
Yes, I did not mean to understate the awesome work everyone has done! All of these projects are pretty incredible. The fact that Symbiflow actually works blows my mind. |
Hey @eine, @Paebbels, @nfrancque and @m-kru, this is a great discussion and it improved my understanding of the pros and cons of each option with respect to VSG's goals. Given everything that was discussed, I agree with @m-kru and @eine that pyVHDLParser is worth investigating before proceeding with our own tokenizer. I can see using the output of the first phase, but maybe not the second phase. I would like to thank everyone for their input. It was very enlightening. |
I saw that Rust HDL was mentioned in this thread, so let me share my goals for the project as well as its main advantages. Rust HDL is an umbrella project with the goal of providing modular software component blocks for creating HDL tools. There are currently two sub-projects: `vhdl_lang` (the VHDL language frontend) and `vhdl_ls` (the VHDL language server).
I have focused on the language server as the initial sub-project driving the development of `vhdl_lang`. Some examples of how lossless parsing has shaped `vhdl_lang`: …
As I write in my project readme, using Rust as the implementation language for these tools is the ideal choice because of: …
I do not consider C/C++, Python or Ada as living up to all of the above advantages the way Rust does. Deciding to write it in Rust has been a very good choice that I am very satisfied with.

The state of vhdl_lang / vhdl_ls is progressing well: parsing is very complete and semantic analysis is well under way. I expect it to do full semantic analysis of VHDL before the summer. It is already good enough to support goto-declaration and find-all-references for non-overloaded names. Thus you can find all references of design units, signals, types, constants etc., as well as go to their declarations. It supports multi-threaded parsing and analysis, and can parse and analyze a project with 444 files and 190,000 lines of code in 212 ms on my quad-core laptop. Since I support incremental analysis, when editing code with the language server as a backend it runs in under 10 ms for every keystroke, even in huge projects, giving instant feedback as you type. The user does not need to specify any analysis order, as it is determined automatically, including detection of circular dependencies; this is also a requirement for language server applications. It could emit a dependency tree for use by other tools; today I just added the printing of it.

It is very well tested at different levels, which, combined with the static typing, makes the code base a lot easier to modify for other contributors; thus I have got several third-party contributions even though I have not documented the internals at all yet. As I write in my project readme, the goal is to support other tools built on top of `vhdl_lang`. |
After looking more into what your project aims to achieve, it seems very aligned with Rust HDL's goals. You will need full parsing to avoid reaching a dead end, which vhdl_lang already provides for all of VHDL-2008. Also, Python is really not the best language to inspect and pattern-match complex data structures, as there is no static typing (unless you use mypy) and not even a case statement. There will be a lot of bugs when you refactor without strict use of pylint/mypy in code analysis applications where the data structure is nested and complex. Performance will also be a problem with Python, where it will be very sluggish for larger code bases. Most of what you have today could probably be written by traversing the AST that vhdl_lang creates, and then you can just focus on adding value with the rules and not have to reinvent VHDL parsing, which is complex. The goal of Rust HDL is to enable more tools by sharing the important components. |
I have read the discussion on gitter, all the comments here, and some other issues, and it is obvious that we all share the same goal: to make VHDL development easier and cheaper. Despite the fact that all of us would like to have a single official/unofficial VHDL parser, it looks like we will still work separately. For example, take the objective to check the formatting style of … I do not believe that rewriting GHDL from language … Of course the first, and probably the biggest, issue is the human factor. Some tools would need to be abandoned and some rewritten. People somehow find it hard to accept that the community chooses a tool with functionality … |
While I understand your frustration, that's the beauty and the pain of non-profit, free and open source software... Although style/formatting is mentioned in all four projects, only one of them is focused on it. GHDL is a simulator (soon a synthesiser). pyVHDLParser is the foundation of a fancy documentation generator. rust_hdl is a toolbox. vhdl-style-guide is the only one which is primarily meant for it. Were we talking about commercial products, or projects backed by companies (such as VSCode), the request to support style/formatting would probably have been ignored, put in the backlog, or just considered of lowest priority in three of those projects. Instead, because these are open source and open projects, we get "this is technically possible, would love to have it some day". At the end of the day, time is money, and the time that a handful of people can donate is very limited.
I believe that the language server, pretty-printing, etc. in GHDL are more toy/concept projects than actual features, for now. This is true for any non-simulation, non-synthesis feature, and even for code coverage. Actually, I am convinced that the reason for all those conceptual implementations to exist is that, for a decade, there was no VHDL parser that could be compared to GHDL. Hence, Tristan tried to provide hints about how it could be used for purposes complementary to simulation, which he received many questions about but which he could not implement due to time constraints.

A language server is not listed as a use case of pyVHDLParser. Although some features are similar, pyVHDLParser seems not to be focused on providing features such as snippets/templates or completion. Furthermore, for performance reasons, this Python-based implementation is better suited for one-time executions (say, CI jobs) rather than tight loops. Precisely, this is the niche (easy installation, long pipelines) where pyVHDLParser fits (and where it is not redundant) compared to other, compiled tools.

Therefore, I'd say that there is a single stable and actively maintained language server ATM: Kraigher's. The only reason to use GHDL's LS now is that it's already there for all of us who use GHDL and VUnit (or any other Python tool). Since libghdl is built by default, enabling the language server is a matter of installing/copying two tiny Python packages.

Related to language servers, we must account for clients too. Currently, there is a VSCode extension for Kraigher's project and another one for GHDL. As you say, I thought it was nonsense to have two of them, so I commented on it a couple of times. IIRC, @GlenNicholls tried to achieve it, and he spent several days analysing how to merge both extensions so that a single one would work with any of the servers. In the end, he gave up.
I do think that it should be stated as "migrating from an old-fashioned environment to a modern framework". A pain with GHDL is that there is no standard tool/procedure for documentation generation/extraction or code architecture analysis. The software architecture of any of the tools we are talking about is very complex. Not being able to grasp a rough idea of the big picture without diving into the sources is a serious stopper. Each time I try to implement something in GHDL, I feel blind, stupid and frustrated. I think it is crazy to rewrite GHDL in Rust, and I was astonished when I saw Tristan comment on it. However, should it be done, I do believe it would be positive and would benefit contributions in the mid term. At least, it would be a foundation for a merge with rust_hdl.
Well, there is FOSSi already: https://fossi-foundation.org. However, if you felt that there was duplication, the list of projects can hurt: https://fossi-foundation.org/gsoc19-ideas
As said, I think we should focus on the features/resources that are not implemented yet and which can be shared in the future, rather than wishing for some utopia. One such feature is an AST format that the parsers in different languages can share. This would allow parsing/tokenization to be decoupled from other passes.

There is also a lot of room for improvement at the level just above these tools: project/source management. To name a few tools: VUnit, rust_hdl, GHDL, ghdl-language-server, edalize, fusesoc, tsfpga, duh, hdl-component-manager, Precision, Vivado... All of these require users to provide a list of sources mapped to a set of libraries. Yet there is no standard JSON/YAML/TOML format to feed all of them. Precisely, I talked about …

For me, as a user, it is positive that Kraigher, Tristan and Patrick develop three projects with a few overlapping features, as long as it is not painful to switch between them. I'd use GHDL for simulation, rust_hdl for LS (server and client) and pyVHDLParser for Sphinx integration. The pain now is to learn/configure everything three times. |
Yeah, I once wanted to extend some PSL support. After 1 day of code analysis I gave up.
It sounds crazy, however the longer I think about it the more benefits I see. There is no doubt that GHDL is a one-man show. If there were more people involved in the rewrite process besides @kraigher and @tgingold, the knowledge would get spread. Maybe it would be nice to give this idea at least a proof of concept. For example, find 3-4 volunteers who would try to rewrite some part of GHDL to check how fast the work would go. |
So after a long time thinking about tokens and parsing VHDL, I think I have come up with a partial solution. In VSG, the parsing and the rule analysis are tightly coupled. While this worked initially, I have come to the point where I need to abstract the parsing from the rules, with the goal being that I could replace the current parsing code with another parser and then use the abstraction layer to provide the information a rule needs to operate. This will require defining an API to the parser. For example, for rules that deal with entities I could request all the entities in the file:

```python
entities = parser.get_entities()
```

Then if I wanted the entity identifier:

```python
entities = parser.get_entities()
for entity in entities:
    identifiers = entity.get_identifier()
```
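A hypothetical sketch of what that abstraction layer could look like (class and method names here are illustrative, not VSG's actual API):

```python
class Entity:
    """Parser-agnostic view of an entity declaration."""
    def __init__(self, identifier):
        self._identifier = identifier

    def get_identifier(self):
        return self._identifier


class Parser:
    """Abstraction layer: rules only see this API, never the raw parse tree."""
    def __init__(self, entities):
        self._entities = entities

    def get_entities(self):
        return list(self._entities)


# A rule written against the API does not care which parser produced the data.
parser = Parser([Entity('fifo'), Entity('uart')])
for entity in parser.get_entities():
    print(entity.get_identifier())
```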
I think this will give me the most flexibility to change parsers without having to change the rules. Unfortunately, it will require me to re-write all my rules, but I think long term it will be a better solution. I plan to eventually write new rules with this method and re-write old ones as time goes on. |
That makes sense. Would you like to use this issue to discuss what that API would look like? Also unsure how this would work for new rules without breaking old ones. |
Checklist_ruleset, DO-254_ruleset, RMM_ruleset, Safety-Critical_ruleset, Verification_ruleset, Essentials_ruleset, RTLC_ruleset |
@cuijialang, you triggered three notifications for @jeremiah-c-leary in less than 10h, regarding the same issue. Please, be polite and patient. The feature you are requesting is unlikely to be feasible in the short term. |
Hey @cuijialang , |
Here is an update: I evaluated pyVHDLParser, but found it did not provide what I really wanted from a parser. I could have taken the output of pyVHDLParser and turned it into what I wanted, but at that point it seemed a better alternative to just write my own. So after two false starts I finally have some semblance of a parser. I used the 2008 LRM and based all my tokens on the production names. I am currently re-writing my 443 rules to use the new parser. I tried to keep both the old method and the new parser working at the same time, but at some point I broke the old one. Oh well, I was going to toss it anyway when I am done re-writing the rules. I will probably close this issue in about a week. Regards, --Jeremy |
There are several line attributes dealing with the contents of the line:

```
line          # Current line
lineLower     # Current line lower cased
lineNoComment # Current line with comments removed
```
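For example, given the line used throughout this issue, these attributes would contain:

```
line          = 'PORT1 : in std_logic; -- Comment IN port list'
lineLower     = 'port1 : in std_logic; -- comment in port list'
lineNoComment = 'PORT1 : in std_logic; --'
```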
There are several `utils` functions that parse through the line, searching for words and removing words. A lot of these functions use regular expressions to do the searching and replacing. I am proposing adding a `token` attribute. This attribute would contain a list of tokens in the line, and would also include spaces. For example, the line:

```
PORT1 : in std_logic; -- Comment IN port list
```

would have the following attribute values:

```
tokens = ['PORT1', ' ', ':', ' ', 'in', ' ', 'std_logic', ';', ' ', '-- Comment IN port list']
```
In fact, the `tokens` attribute could be the "base" attribute that all the rules reference. I would think changes would be easier to make, since everything becomes a list operation instead of regular expressions. If we need to remove a word, we just find it and remove it from the list. Dealing with spaces could be a little tricky, but manageable.

Checks could be easier also. For example, to check if a single space exists before a parenthesis, we just get the indexes of all the open parentheses and then check that the token before each one contains only a single space.
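A small sketch of that check, assuming consecutive spaces are collected into a single token (function and variable names are illustrative):

```python
def open_parens_missing_single_space(tokens):
    """Return indexes of '(' tokens not preceded by exactly one space."""
    violations = []
    for index, token in enumerate(tokens):
        if token == '(' and (index == 0 or tokens[index - 1] != ' '):
            violations.append(index)
    return violations

tokens = ['port_map', ' ', '(', 'CLK', ')']
print(open_parens_missing_single_space(tokens))  # [] -> no violations
```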
The original line can be reconstructed by concatenating the list together.
This concept is similar to a lexer, but in this case we keep the spaces. Creating the tokenizer should not be that hard. We could just work the line string from left to right and collect tokens: all consecutive spaces are placed in one token; consecutive letters, numbers and underscores are a token; each parenthesis is a token; commas are tokens; semicolons are tokens; math operators are tokens; '<=' is a token. A minimal sketch is shown below.
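A minimal regex-based sketch of such a tokenizer (an illustration of the idea, not VSG's actual code); the final assert demonstrates that the original line can be reconstructed by concatenating the list:

```python
import re

TOKEN_RE = re.compile(
    r'--.*'        # a comment runs from -- to the end of the line
    r'|\s+'        # consecutive spaces become a single token
    r'|\w+'        # runs of letters, numbers and underscores
    r'|<=|:=|=>'   # multi-character symbols, checked before single ones
    r'|.'          # any remaining single character: ( ) , ; + - etc.
)

def tokenize(line):
    return TOKEN_RE.findall(line)

line = 'PORT1 : in std_logic; -- Comment IN port list'
tokens = tokenize(line)
# ['PORT1', ' ', ':', ' ', 'in', ' ', 'std_logic', ';', ' ',
#  '-- Comment IN port list']
assert ''.join(tokens) == line
```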
We could also keep a list of keywords, but I am not sure if that would be as useful.