
Add keywords and tokens line attributes #312

Closed
jeremiah-c-leary opened this issue Dec 30, 2019 · 35 comments

@jeremiah-c-leary
Owner

There are several line attributes dealing with the contents of the line:

line          # Current line
lineLower     # Current line lower cased
lineNoComment # Current line with comments removed

There are several utils functions that parse through the line, searching for words and removing them:

  • extract functions
  • get_word
  • get_first_word
  • change_word
  • clear_keyword_from_line
  • search_for_and_remove_keyword

Many of these functions use regular expressions to do the searching and replacing. I am proposing adding a tokens attribute.

This attribute would contain a list of the tokens in the line, including spaces. For example, the line:

    a <= b and (c or d); -- Do this thing

would have the following attribute values:

  line = 'a <= b and (c or d); -- Do this thing'
  lineLower = 'a <= b and (c or d); -- do this thing'
  lineNoComment = 'a <= b and (c or d); --'
  tokens = ['a', ' ', '<=', ' ', 'b', ' ', 'and', ' ', '(', 'c', ' ', 'or', ' ', 'd', ')', ';', ' ', '-- Do this thing']

In fact, the tokens attribute could be the "base" attribute that all the rules reference. I would expect changes to be easier to make, since everything becomes a list operation instead of a regular expression. If we need to remove a word, we just find it and remove it from the list. Dealing with spaces could be a little tricky, but manageable.

Checks could be easier also. For example, to check if a single space exists before a parenthesis, we just get the indexes of all open parentheses and then check that the element before each one contains only a single space.
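
A minimal sketch of that check (the helper name is illustrative, not existing VSG code), assuming the tokens list described above where runs of spaces are kept as single tokens:

    # Flag any open parenthesis that is not preceded by exactly one space.
    def check_single_space_before_paren(lTokens):
        lViolations = []
        for iIndex, sToken in enumerate(lTokens):
            if sToken == '(' and (iIndex == 0 or lTokens[iIndex - 1] != ' '):
                lViolations.append(iIndex)
        return lViolations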

The original line can be reconstructed by concatenating the list together.

This concept is similar to a lexer, but in this case we keep the spaces. Creating the tokenizer should not be that hard. We could just work the line string from left to right and collect tokens. All consecutive spaces are placed in one token; consecutive letters, numbers and underscores are a token; each parenthesis is a token; commas are tokens; semicolons are tokens; math operators are tokens; '<=' is a token.
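
A minimal left-to-right tokenizer along those lines might look like the sketch below (illustration only, not VSG's actual implementation; the two-character symbol list is deliberately incomplete):

    def tokenize(sLine):
        lTokens = []
        iStart = 0
        while iStart < len(sLine):
            if sLine.startswith('--', iStart):  # comment token runs to end of line
                lTokens.append(sLine[iStart:])
                break
            if sLine[iStart] == ' ':            # all consecutive spaces are one token
                iEnd = iStart
                while iEnd < len(sLine) and sLine[iEnd] == ' ':
                    iEnd += 1
            elif sLine[iStart].isalnum() or sLine[iStart] == '_':
                iEnd = iStart                   # letters, numbers, underscores
                while iEnd < len(sLine) and (sLine[iEnd].isalnum() or sLine[iEnd] == '_'):
                    iEnd += 1
            elif sLine[iStart:iStart + 2] in ('<=', '=>', ':=', '/=', '**'):
                iEnd = iStart + 2               # two-character symbols
            else:
                iEnd = iStart + 1               # every other character is its own token
            lTokens.append(sLine[iStart:iEnd])
            iStart = iEnd
        return lTokens

    # The original line is rebuilt by concatenating the list back together.
    sLine = 'a <= b and (c or d); -- Do this thing'
    assert ''.join(tokenize(sLine)) == sLine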

We could also keep a list of keywords, but I am not sure if that would be as useful.

@m-kru
Collaborator

m-kru commented Jan 3, 2020

I have a few comments.

  1. I think we should have as few attributes as possible. When we have a tokens list, I do not see any point in having lineLower and lineNoComment. Maybe for now they should stay, but all their usage can probably be replaced with tokens.

  2. I have read the VHDL standard today and:

  • In the VHDL standard there is not even a single occurrence of the word token. It uses the term lexical elements instead of tokens. Maybe it would be better to use the same nomenclature as the standard.

  • Spaces (' ') are not part of tokens/lexical elements. I am sure it is the same in the C language standard. In my opinion they should not be elements of the tokens/lexical elements list. The question is how to handle whitespace, alignment, and indent rules.

@jeremiah-c-leary
Owner Author

I believe lineNoComment could easily be replaced by the tokens attribute. The rules that do not deal with comments could request a token list minus comments. For example, if we wanted to control the case of a port direction in the following code...

  PORT1 : in std_logic;  -- Comment IN port list

...the in keyword is present in the comment also. When using regular expressions, we have to remove the comment to ensure we ignore anything in the comments. The line object would have the following attributes:

   line = 'PORT1 : in std_logic;  -- Comment IN port list'
   lineNoComment = 'PORT1 : in std_logic;  --'
   tokens = ['PORT1', ' ', ':', ' ', 'in', ' ', 'std_logic', ';', '  ', '-- Comment IN port list']

Well... this was supposed to be an example of why we would need to return a list without comments, but I do not think we need it at all. In this example, we would search all the tokens for anything that matches in:

  if 'in' in oLine.tokens:

This would exclude the comment, because the comment token will never equal in.

As for lineLower, how would we handle the above example, as there are four case possibilities for in:

  if 'in' in oLine.tokens:
  if 'In' in oLine.tokens:
  if 'IN' in oLine.tokens:
  if 'iN' in oLine.tokens:

Typically a case check is two steps:

  1. Find the keyword
  2. Check its case

Do we have a lower case token attribute?

   line = 'PORT1 : in std_logic;  -- Comment IN port list'
   lineNoComment = 'PORT1 : in std_logic;  --'
   tokens = ['PORT1', ' ', ':', ' ', 'in', ' ', 'std_logic', ';', '  ', '-- Comment IN port list']
   tokensLower = ['port1', ' ', ':', ' ', 'in', ' ', 'std_logic', ';', '  ', '-- comment in port list']

...or is there an easy way to lowercase as we check:

    for iIndex, sToken in enumerate([x.lower() for x in oLine.tokens]):
        if sToken == 'in':                        # Step 1: find the keyword
            if not oLine.tokens[iIndex] == 'in':  # Step 2: check its case
                pass                              # report a case violation here

Maybe there is an easier way to handle this.

@jeremiah-c-leary
Owner Author

As for using tokens versus lexical elements, I would prefer tokens. I have seen the term used when reading about lexers and parsers.

@jeremiah-c-leary
Owner Author

Do you have any thoughts on how to handle spacing/indenting without keeping them as tokens? I can't think of anything at the moment.

@m-kru
Collaborator

m-kru commented Jan 3, 2020

Typically a case check is two steps:

Find the keyword
Check its case

Exactly, so let's make two functions. We can have a tokensLower attribute, but maybe let's do lazy evaluation for it. You want to get the in keyword from the line? OK, use the extract_word function. On the first call for a given line, this function could evaluate the tokensLower attribute if it is None. We can save some time (I am not sure if the user will feel the difference; maybe with a lot of files and lines).
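
A sketch of that lazy evaluation, using a hypothetical Line class (attribute and method names are illustrative, not existing VSG code):

    class Line:

        def __init__(self, lTokens):
            self.tokens = lTokens
            self._tokensLower = None        # not computed until first needed

        @property
        def tokensLower(self):
            if self._tokensLower is None:   # evaluated once, on first access
                self._tokensLower = [sToken.lower() for sToken in self.tokens]
            return self._tokensLower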

We can stay with tokens instead of lexical elements, I do not care. However, I do care about putting spaces into the tokens list, as spaces are not tokens.
My idea: spaces are called separators in the standard. We could have two attributes, tokens and separators. These two lists would be interleaved at the end, when we want to produce the output line. Where there is no separator between tokens, we would insert an empty string '' into the separators list. Thanks to this, we would not iterate over spaces in most rules.

@jeremiah-c-leary
Owner Author

It makes sense to use the extract_word function. Would it return the index the word is located at, or the token from the token list? For example:

Scenario: word exists in line
1. Search tokensLower list for word
2. Store index of word
3. Return item located at index in tokens list

Scenario: word does not exist in line
1. Search tokensLower list for word
2. Return None
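
A sketch of extract_word following those scenarios (a hypothetical helper, assuming the tokensLower attribute exists):

    def extract_word(oLine, sWord):
        try:
            iIndex = oLine.tokensLower.index(sWord.lower())  # steps 1 and 2
        except ValueError:
            return None                                      # word not in line
        return oLine.tokens[iIndex]                          # step 3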

I like your idea to have the spaces separate. I was also thinking comments are not tokens either; maybe the comment could be in its own attribute. So the following line:

PORT1 : in std_logic;  -- Comment IN port list

would resolve to these attributes:

separators = ['', ' ', ' ', ' ', '']
tokens = ['PORT1', ':', 'in', 'std_logic', ';']
tokensLower = ['port1', ':', 'in', 'std_logic', ';']
comment = ['  ', '-- Comment IN port list']

We could then interleave the separators and tokens lists, extend the result with the comment list, and finally use join to create the string.

Maybe move the spaces before the comment to the separators list?

separators = ['', ' ', ' ', ' ', '', '  ']
tokens = ['PORT1', ':', 'in', 'std_logic', ';']
tokensLower = ['port1', ':', 'in', 'std_logic', ';']
comment = '-- Comment IN port list'

Then we append the comment to the tokens list, interleave that with the separators, and use join to create the string.

With either method, we do not need line, lineLower, or lineNoComment attributes.
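
A sketch of the reconstruction, using the second layout above (the helper name is illustrative; itertools interleaving is just one way to do it):

    from itertools import chain

    def render(lSeparators, lTokens, sComment):
        lAll = lTokens + [sComment]  # append the comment to the tokens
        return ''.join(chain.from_iterable(zip(lSeparators, lAll)))

    lSeparators = ['', ' ', ' ', ' ', '', '  ']
    lTokens = ['PORT1', ':', 'in', 'std_logic', ';']
    sComment = '-- Comment IN port list'
    assert render(lSeparators, lTokens, sComment) == \
        'PORT1 : in std_logic;  -- Comment IN port list'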

It feels like switching to lists would make writing rules easier than using regular expressions. For example, the rule port_005 checks if there is a single space after the :. The check would follow these steps:

  1. get the index of : in the tokens list
  2. check the number of spaces at index + 1 in the separators list

If there is a violation, then the fix would follow these steps:

  1. get the index of : in the tokens list
  2. change the element at index + 1 in the separators list to ' '

Even if it is not easier, I would hope the code would be more robust and easier to read. Functions like vsg.utils.change_word and vsg.check.are_there_spaces_after would be simplified.
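
A sketch of that check and fix (function names are illustrative), assuming the leading-separator layout where separators[i] precedes tokens[i], so the separator after tokens[i] is separators[i + 1]:

    def check_port_005(lTokens, lSeparators):
        iIndex = lTokens.index(':')    # step 1: find the colon
        return lSeparators[iIndex + 1] == ' '

    def fix_port_005(lTokens, lSeparators):
        iIndex = lTokens.index(':')    # step 1: find the colon
        lSeparators[iIndex + 1] = ' '  # step 2: force a single space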

@m-kru
Collaborator

m-kru commented Jan 4, 2020

According to the standard, a comment is a token. For example, -- My comment is a single token in a VHDL file. In my opinion, keeping the comment in the tokens list is more consistent, as it is a token. If there is no comment on the line, the last element of the tokens list would be the empty string ''. However, iterating would be a bit easier if there were no comments at the end. On the other hand, it is quite simple to iterate over the list without the last element:

for i, token in enumerate(tokens[:-1]):

Hmm, I really do not like breaking consistency, but if there is a significant gain from putting the comment into a separate attribute, I can accept this. I can't think of any gain right now.

At first I thought extract_word should return a word or a list of words; however, working with indexes opens new possibilities. We should think about whether there is any scenario where we want to return more than a single index. If there is, I think all functions should return a list of indexes, even if most functions always return a single index, to keep their API consistent.

@jeremiah-c-leary
Owner Author

Let's try to keep consistency and see if we want to change it later. I will be pushing some code that creates the token and separator lists.

jeremiah-c-leary added a commit that referenced this issue Jan 4, 2020
Adding code to support tokenizing the line:

  1)  Added token attribute to the line object
  2)  Added token.py file
  3)  Adding tests

Currently the tokenizer handles consecutive spaces, comments and characters.

NOTE:  Future commits will handle VHDL symbols.
jeremiah-c-leary added a commit that referenced this issue Jan 4, 2020
Fixed issue if comment was at the beginning of the line.
jeremiah-c-leary added a commit that referenced this issue Jan 4, 2020
Tokenizer now handles commas.
jeremiah-c-leary added a commit that referenced this issue Jan 4, 2020
Added open and close parenthesis tokens.
jeremiah-c-leary added a commit that referenced this issue Jan 4, 2020
Added tokens for single and double quotes.
jeremiah-c-leary added a commit that referenced this issue Jan 4, 2020
Refactored single character symbol parsing.
Added + to tokens.
jeremiah-c-leary added a commit that referenced this issue Jan 4, 2020
Added all operators and assignment symbols.
jeremiah-c-leary added a commit that referenced this issue Jan 4, 2020
Added semicolon to single character symbols.
Added test with multiple symbols.
jeremiah-c-leary added a commit that referenced this issue Jan 4, 2020
Added another multisymbol test.
jeremiah-c-leary added a commit that referenced this issue Jan 4, 2020
Added separators attribute to line.

  1)  Updated tests to check both tokens and separators lists
@jeremiah-c-leary
Owner Author

Initial attempt at a token and separator list has been pushed. I'm sure it still needs some work.

@m-kru
Collaborator

m-kru commented Jan 5, 2020

Can you undo the merge to master, make a pull request, and add me as a reviewer?

jeremiah-c-leary added several commits that referenced this issue Jan 7, 2020
@Paebbels

Paebbels commented Jan 8, 2020

You are right, I did not grasp that the core module is libghdl.

So, we have 3 possibilities:

  1. Our custom tokenizing.
  2. libghdl from ghdl.
  3. pyVHDLParser.

I think we should not hurry, and should get to know libghdl from ghdl and pyVHDLParser at least a little bit.

The low-level Tokenizer (pass 1) is working without known errors in pyVHDLParser.

In the next processing step (pass 2), tokens are rewritten into specific tokens like keywords and identifiers, and assembled into blocks. I would rate this step as 75% done.

I'll try to fix the issue reported in Paebbels/pyVHDLParser#6

@eine

eine commented Jan 9, 2020

Hi @jeremiah-c-leary @m-kru @nfrancque! Please let me step in. I'm not a lead developer/maintainer of those projects, but I'm the second contributor to ghdl/ghdl (in volume) and I have write access to VUnit/vunit. Moreover, I've been quite inquisitive with all three main developers (Tristan, Patrick and Kraigher). Apart from private conversations, I tried to get them to publicly discuss the similarities/differences between those projects. Overall, you all seem to be very well informed, but I'd like to add some bits of info, should they be useful for you.

rust_hdl is for Rust

This is not really true... Kraigher decided to use Rust because he considered that it was/is the best language for the task when he decided to start the project from scratch. Actually, he did so because GHDL was/is written in Ada. If GHDL had been written in a different language, or if GHDL's internals had been better documented at that time, Kraigher would probably not have started rust_hdl.

Note that the assertions above are my own opinion. I came to think that after a discussion we had in GHDL's gitter channel some months ago. The point is that rust_hdl does not include semantic analysis yet, but GHDL does. The conversation between Kraigher and Tristan was about how rust_hdl could be a frontend (parser) to be plugged into GHDL. Or, conversely, GHDL could be a backend for rust_hdl. They even talked about rewriting GHDL in Rust... Should you be curious: October 17, 2019 9:34 PM (it went until October 20).

This is to say that the use cases of rust_hdl are limited because it is written in a compiled language, but it is not "for Rust". Hence, the limitations are the same as (or fewer than) those of using GHDL. libghdl-py depends on libghdl, which depends on a working installation of GHDL (it does not require the binary itself, but many assets). Therefore, it might be easier (from a distribution and usage point of view) to get a suitable JSON output from rust_hdl than to use libghdl-py.

A few days ago, Tristan pushed some info about the internals of GHDL to the docs, and Rust is mentioned there: https://ghdl.readthedocs.io/en/latest/internals/AST.html#dealing-with-ownership. At the same time, the development of rust_hdl has been quite active lately. Hence, we might expect some integration between both tools in the following months. This would be interesting as a reference, since it would be the first example of a tight integration with GHDL (the Rust scripts/bindings being roughly equivalent to libghdl-py).

vunit is a tool with different goal in mind

It is, but the required "technical infrastructure" is the most similar to this use case. Both projects:

Hence, the discussions about the usability of pyVHDLParser, or how to deal with the complexity of requiring a compiled tool, are useful for both projects.

There is also hdlConvertor, which is a big tool and much more than what we need.

Using a parser engine such as ANTLR4 is considered to be a dead end for complete support of recent VHDL versions. rust_hdl, pyVHDLParser and GHDL all use custom parsers. Precisely, pyVHDLParser and GHDL are roughly similar. I wonder how hdlConvertor deals with tricky sources.

I think we should seriously consider pyVHDLParser.

The main stopper for using pyVHDLParser would be that Patrick was very busy last year and the project was on hold. However, once brought to a usable status, I agree that it would be the best option.

See setup.py depending on libghdl from PyPi and apparent guide for doing it.

I'm the one that discussed with Tristan how to reorganise Python packages and the one who edited those readmes accordingly. Let me know if I can help you.

@nfrancque
Collaborator

Thank you @eine for the excellent writeup! @kraigher and @LarsAsplund may want to weigh in as well.

I think that the needs of at least vunit and VSG are very similar here and some collaboration between the two would be really useful. A unified parser library for all open source VHDL projects would be the holy grail, and it sounds like you're saying rust_hdl is the most mature option so far.

Maybe one question to ask: do you guys know if any of these options throws away things like whitespace information and other niceties before VSG could pull it in? It is a little different from vunit in that respect.

Would be great if everyone could decide on something and get that to a good working state.

@eine

eine commented Jan 9, 2020

Thank you @eine for the excellent writeup!

You are welcome! It's always a pleasure to share knowledge in this very small community we have ;)

A unified parser library for all open source VHDL projects would be the holy grail, and it sounds like you're saying rust_hdl is the most mature option so far.

Not so... rust_hdl is the most mature option in the sense that Kraigher focused on parsing only, and he brought it to a stable status. Then, he focused on the language server, and he brought it to a stable status. However, further features are yet to be implemented. Conversely, both GHDL and pyVHDLParser do include parts for many more features, but those are kind of "permanently unstable". They are stable, in practice, but there is no commitment to not breaking backwards compatibility as strong as Kraigher's.

At the same time, I know that Patrick and Tristan had discussions about "parser architectures". Recent documentation enhancements in both projects show that they are using very similar approaches, but with different languages. Honestly, I don't know how similar/different Kraigher's approach is. Hence, from my ignorance, I feel that it might be easier in the future to make pyVHDLParser and GHDL kind of complementary or interchangeable in terms of the internal AST. However, both Kraigher and Tristan seem independently committed to doing something.

Ideally, we would have a unified AST, and multiple bindings (Ada, Python, Rust). This reminds me of f4pga/ideas#19.

Maybe one question to ask, do you guys know if any of these options throw away things like whitespace information and other niceties before VSG could pull it in? It is a little different from vunit in that respect.

I forgot to make it explicit before, but this is the main reason you want/need to use pyVHDLParser for this project. The main motivation for Patrick has always been automatic documentation generation for VHDL sources/projects. Thus, preserving comments and whitespaces (not touching the sources) has always been his primary focus. Conversely, GHDL documentation of the internals says explicitly that comments and such are ignored. It is also said that it would not be difficult to keep them, but it is not a priority for GHDL. Once again, I don't know this detail about rust_hdl.

Would be great if everyone could decide on something and get that to a good working state.

Given the constraints, I think that all three of them are doing a good job at trying to collaborate. However, they develop these projects in their leisure time; hence, many implementation details are not clear until they actually get to work on them. It's not like they can plan to work on several consecutive days.

@nfrancque
Collaborator

Ideally, we would have a unified AST, and multiple bindings (Ada, Python, Rust). This reminds me of f4pga/ideas#19.

That would be nifty. As a side note - with Symbiflow, I believe we've now referenced every FPGA-related open source project that exists in this thread :)

Thus, preserving comments and whitespaces (not touching the sources) has always been his primary focus. Conversely, GHDL documentation of the internals says explicitly that comments and such are ignored.

Glad I asked about the whitespace. So most likely the only option here would be pyVHDLParser? Unless Kraigher tells us differently.

Given the constraints, I think that all three of them are doing a good job at trying to collaborate.

Yes, I did not mean to understate the awesome work everyone has done! All of these projects are pretty incredible. The fact that Symbiflow actually works blows my mind.

@jeremiah-c-leary
Owner Author

Hey @eine, @Paebbels, @nfrancque and @m-kru,

This is a great discussion and it improved my understanding of the pros and cons of each option with respect to VSG's goals.

Given everything that was discussed I agree with @m-kru and @eine that pyVHDLParser is worth investigating before proceeding with our own tokenizer. I can see using the output of the first phase, but maybe not the second phase.

I would like to thank everyone for their input. It was very enlightening.

@kraigher

kraigher commented Jan 9, 2020

I saw that Rust HDL was mentioned in this thread, so let me share the goals of the project as well as its main advantages.

Rust HDL is an umbrella project with the goal of providing modular software components for creating HDL tools.
The focus is on VHDL, but I have not excluded also working on Verilog in the future if I have time; thus there is no V before HDL in the name.

There are currently two sub-projects:

  1. vhdl_lang - a VHDL language front-end that does parsing and analysis of VHDL code and allows queries into the structure of the code, such as finding all references of a symbol
  2. vhdl_ls - a VHDL language server implementation using vhdl_lang under the hood

I have focused on the language server as the initial sub-project driving the development of vhdl_lang, since it puts the toughest requirements on the architecture. However, the goal is for vhdl_lang to be the basis of other tools such as a VHDL documentation generator, VHDL code formatter, VHDL lint, VHDL simulator and VHDL synthesis. Actually, the language server protocol supports code formatting commands, so code formatting is already a requirement for a full language server. Code formatting and documentation generation require a lossless AST, which is thus a requirement for vhdl_lang. Thus my entire mindset when developing the code is modularity and a broad set of use cases, where we do not optimize for only one use case at the cost of all others.

Some examples of how lossless parsing has shaped vhdl_lang are:

  1. Comments are kept and attached to other tokens as either leading or trailing comments.
    Each comment can be associated with either the token before it or the token after it.
  2. Even though VHDL is case insensitive, I keep track of the original case of every identifier instead of just normalizing the case of everything, which would be simpler and faster.
  3. Every part of the AST has source location information attached.
  4. Parsing is a distinct step which produces a non-analyzed AST and can be run separately from analysis.
  5. Tokenization can be run as a separate step as well.

As I write in my project readme, using Rust as the implementation language for these tools is the ideal choice because of:

  1. Best-in-class compiled performance, comparable to C/C++
  2. Good support for bug-free multi-threading
  3. A strong static type system that lets the code base grow large without losing control of complexity, and that is excellent for traversing an AST and matching different structures in it in a way that is checked at compile time
  4. Good libraries and standardized tooling. Just install the Rust tool-chain and run cargo build or cargo test to get started. It will automatically load and compile all the dependencies and build a statically linked binary that you can just ship. There is no tricky autotools-like mess with dependencies on system libraries and dynamic linking. Excellent unit test libraries are included by default.
  5. Strong momentum and uptake as a programming language with a bright future

I do not consider C/C++, Python or Ada to live up to all of the above advantages the way Rust does. Deciding to write it in Rust has been a very good choice that I am very satisfied with.

The state of vhdl_lang / vhdl_ls is progressing well: parsing is very complete and semantic analysis is well under way. I expect it to do full semantic analysis of VHDL before the summer. It is already good enough to support goto-declaration and find-all-references for non-overloaded names. Thus you can find all references of design units, signals, types, constants etc. as well as go to their declarations.

It supports multi-threaded parsing and analysis and can parse and analyze a project with 444 files and 190000 lines of code in 212 ms on my quad core laptop. Since I support incremental analysis, when editing code in the editor with the language server as a backend it will run in under 10 ms per keystroke even in huge projects, giving instant feedback as you type.

The user does not need to specify any analysis order, as it is determined automatically, along with detecting circular dependencies. This is also a requirement for language server applications. It can emit a dependency tree for use by other tools today; I just added the printing of it.

It is very well tested at different levels, which, combined with the static typing, makes the code base a lot easier for other contributors to modify; thus I have received several third-party contributions even though I have not documented the internals at all yet.

As I write in my project readme, the goal is to support other tools built on top of vhdl_lang, and if someone shows interest in doing that and invests their time, I will try to support them. But of course time is limited, and even though I recognize a certain goal as valid for the project, that does not mean I can assign it the highest priority. Eventually, my goal is for the project to have several regular contributors and not rely on a single contributor.

@kraigher

kraigher commented Jan 9, 2020

After looking more into what your project aims to achieve, it seems very aligned with Rust HDL's goals. You will need full parsing to avoid reaching a dead end, which vhdl_lang already provides for all of VHDL-2008. Also, Python is really not the best language to inspect and pattern match complex data structures, as there is no static typing (unless you use mypy) and not even a case statement. There will be a lot of bugs when you refactor without strict use of pylint/mypy, especially for code analysis applications where the data structure is nested and complex. Performance will also be a problem with Python, where it will be very sluggish for larger code bases.

Most of what you have today could probably be written by traversing the AST that vhdl_lang creates; then you can just focus on adding value with the rules and not have to reinvent VHDL parsing, which is complex. The goal of Rust HDL is to enable more tools by sharing the important components.

@m-kru
Collaborator

m-kru commented Jan 9, 2020

I have read the discussion on gitter, all the comments here, and some other issues, and it is obvious that we all share the same goal: to make VHDL development easier and cheaper. Despite the fact that all of us would like to have a single 'official' unofficial VHDL parser, it looks like we will still work separately. For example, take the objective of checking the formatting style of .vhd files. It is literally mentioned in each of the following projects: vhdl-style-guide, pyVHDLParser, rust_hdl, GHDL.
The same goes for VHDL language servers; do we need more than one? I do not think so.

I do not believe that rewriting GHDL from language X to language Y can, by itself, attract more contributors. But imagine that there is one unofficial organization developing FOSS VHDL tools. There is only one parser that all tools depend on, and all applications use the same language Z. It sounds a bit like utopia, but I think it would be achievable. In my humble opinion, in the current situation it is more important to unify the people already contributing to VHDL tools (so that they do not implement the same things in different ways) than to attract new contributors. When they see how clean and well thought out everything is, they will probably come despite the programming language being used.

Of course the first, and probably the biggest, issue is the human factor. Some tools would need to be abandoned and some rewritten. People somehow find it hard to accept that the community chooses a tool with functionality F implemented by someone else in a slightly different way. So what do they do? They continue working on their own tool. And here we are, back at the starting point...

@eine

eine commented Jan 9, 2020

Despite the fact that all of us would like to have a single 'official' unofficial VHDL parser, it looks like we will still work separately. For example, take the objective of checking the formatting style of .vhd files. It is literally mentioned in each of the following projects: vhdl-style-guide, pyVHDLParser, rust_hdl, GHDL.

While I understand your frustration, that's the beauty and the pain of non-profit, free and open source software... Although style/formatting is mentioned in all four projects, only one of them is focused on it. GHDL is a simulator (soon a synthesiser). pyVHDLParser is the foundation of a fancy documentation generator. rust_hdl is a toolbox. vhdl-style-guide is the only one which is primarily meant for it.

Were we talking about commercial products, or projects backed by companies (such as VSCode), the request to support style/formatting would probably have been ignored, put in the backlog, or just considered of lowest priority in three of those projects. Instead, because those are open source and open projects, we get "this is technically possible, would love to have it some day". At the end of the day, time is money, and the time that a handful of people can donate is very limited.

The same with VHDL language servers, do we need more than one? I do not think so.

I believe that the language server, pretty-printing, etc. in GHDL are more toy/concept projects than actual features, for now. This is true for any non-simulation, non-synthesis feature, and even for code coverage. Actually, I am convinced that the reason all those conceptual implementations exist is that, for a decade, there was no VHDL parser that could be compared to GHDL. Hence, Tristan tried to provide hints about how it could be used for purposes complementary to simulation, which he received many questions about but which he could not implement due to time constraints.

A language server is not listed as a use case of pyVHDLParser. Although some features are similar, pyVHDLParser seems not to be focused on providing features such as snippets/templates or completion. Furthermore, for performance reasons, this Python-based implementation is better suited for one-time executions (say, CI jobs) rather than tight loops. Precisely, this is the niche (easy installation, long pipelines) where pyVHDLParser fits (and is not redundant), compared to other compiled tools.

Therefore, I'd say that there is a single stable and actively maintained language server ATM: Kraigher's. The only reason to use GHDL's LS now is that it's already there for all of us who use GHDL and VUnit (or any other Python tool). Since libghdl is built by default, enabling the language server is a matter of installing/copying two tiny Python packages.

Related to language servers, we must account for clients too. Currently, there is a VSCode extension for Kraigher's project and another one for GHDL. As you say, I thought it was nonsense to have two of them, so I mentioned it a couple of times. IIRC, @GlenNicholls tried to achieve it, and he spent several days analysing how to merge both extensions so that a single one would work with either of the servers. In the end, he gave up.

I do not believe, that rewriting GHDL from language X to language Y can potentially attract more contributors.

When they see how clean and well-thought everything is, they will probably come despite the programming language being used.

I do think that it should be stated as "migrating from an old-fashioned environment to a modern framework". A pain with GHDL is that there is no standard tool/procedure for documentation generation/extraction or code architecture analysis. The software architecture of any of the tools we are talking about is very complex. Not being able to grasp a rough idea of the big picture without diving into the sources is a serious stopper. Each time I try to implement something in GHDL, I feel blind, stupid and frustrated.

I think it is crazy to rewrite GHDL in Rust, and I was astonished when I saw Tristan comment on it. However, should it be done, I do believe it would be positive and would benefit contributions in the mid term. At least, it would be a foundation for a merge with rust_hdl.

But imagine, that there is one unofficial organization developing FOSS VHDL tools.

Well, there is FOSSi already: https://fossi-foundation.org. However, if you felt that there was duplication, this list of projects can hurt: https://fossi-foundation.org/gsoc19-ideas

There is only one parser that all tools depend on and all applications use the same language Z.

As said, I think we should focus on the features/resources that are not implemented yet and which can be shared in the future, rather than wishing for some utopia. One such feature is an AST format that parsers in different languages can share. This would allow parsing/tokenization to be decoupled from other passes.

There is also a lot of room for improvement at the level just above these tools: project/source management. To name a few tools: VUnit, rust_hdl, GHDL, ghdl-language-server, edalize, fusesoc, tsfpga, duh, hdl-component-manager, Precision, Vivado... All of those require users to provide a list of sources mapped to a set of libraries. Yet there is no standard JSON/YAML/TOML format to feed all of them.

Precisely, I talked about hdl-prj.json with Tristan, and he told me that he has not defined the format because he'd like it to be common to other tools and not exclusive to GHDL/ghdlsynth/ghdl-ls.

For me, as a user, it is positive that Kraigher, Tristan and Patrick develop three projects with a few overlapping features, as long as it is not painful to switch between them. I'd use GHDL for simulation, rust_hdl for the LS (server and client), and pyVHDLParser for Sphinx integration. The pain now is having to learn/configure everything three times.

@m-kru
Collaborator

m-kru commented Jan 9, 2020

Each time I try to implement something in GHDL, I feel blind, stupid and frustrated.

Yeah, I once wanted to extend some PSL support. After 1 day of code analysis I gave up.

I think it is crazy to rewrite GHDL in Rust, and I was astonished when I saw Tristan comment on it.

It sounds crazy, but the longer I think about it, the more benefits I see. There is no doubt that GHDL is a one-man show. If there were more people involved in the rewrite process than @kraigher and @tgingold, the knowledge would get spread. Maybe it would be nice to give this idea at least a proof of concept: for example, find 3-4 volunteers who would try to rewrite some part of GHDL to check how fast the work would go.

@jeremiah-c-leary
Owner Author

So after a long time thinking about tokens and parsing VHDL, I think I have come up with a partial solution.

In VSG, the parsing and the rule analysis are tightly coupled. While this worked initially, I have come to the point where I need to abstract the parsing from the rules, with the goal that I could replace the current parsing code with another parser and then use the abstraction layer to provide the information a rule needs to operate.

This will require defining an API to the parser. For example, for rules that deal with entities I could request all the entities in the file:

  entities = parser.get_entities()

Then if I wanted the entity identifier:

   entities = parser.get_entities()
   for entity in entities:
       identifier = entity.get_identifier()

I think this will give me the most flexibility to change parsers without having to change the rules. Unfortunately, this will require me to rewrite all my rules, but I think it will be a better solution long term. I plan to write new rules using this method and to rewrite old ones as time goes on.
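
A minimal sketch of how a rule might sit on top of such an abstraction layer. The parser.get_entities() and entity.get_identifier() calls mirror the snippets above; the rule class and its method are hypothetical:

    class EntityIdentifierRule:

        def analyze(self, parser):
            violations = []
            for entity in parser.get_entities():      # parser API from above
                identifier = entity.get_identifier()
                if not identifier == identifier.lower():  # e.g. enforce lower case
                    violations.append(identifier)
            return violations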

@nfrancque
Collaborator

That makes sense. Would you like to use this issue to discuss what that API would look like? I'm also unsure how this would work for new rules without breaking old ones.

@cuijialang

Checklist_ruleset, DO-254_ruleset, RMM_ruleset, Safety-Critical_ruleset, Verification_ruleset, Essentials_ruleset, RTLC_ruleset
RMM_ruleset.zip
These are the rulesets.
ruleset.zip

@eine

eine commented Aug 5, 2020

@cuijialang, you triggered three notifications for @jeremiah-c-leary in less than 10h, regarding the same issue. Please be polite and patient. The feature you are requesting is unlikely to be feasible in the short term.

@jeremiah-c-leary
Owner Author

Hey @cuijialang,
It looks like most of these rules are out of scope for VSG. There are rules on undriven ports, incomplete sensitivity lists, etc. VSG is focused on fixing style issues in the code.

@jeremiah-c-leary
Owner Author

Here is an update,

I evaluated pyVHDLParser, but found it did not provide what I really wanted from a parser. I could have taken the output of pyVHDLParser and turned it into what I wanted, but at that point it seemed a better alternative to just write my own. So, after two false starts, I finally have some semblance of a parser.

I used the 2008 LRM and based all my tokens on the production names. I am currently rewriting my 443 rules to use the new parser. I tried to keep both the old method and the new parser working at the same time, but at some point I broke the old one. Oh well, I was going to toss it anyway once I finished rewriting the rules.

I will probably close this issue in about a week.

Regards,

--Jeremy
