Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to output the ast even before import resolution #8739

Closed
chriseth opened this issue Apr 22, 2020 · 23 comments · Fixed by #9364
Closed

Option to output the ast even before import resolution #8739

chriseth opened this issue Apr 22, 2020 · 23 comments · Fixed by #9364

Comments

@chriseth
Copy link
Contributor

chriseth commented Apr 22, 2020

We should provide the options to abort after certain steps:

  • parsing
  • import resolution
  • symbol resolution
  • type checking

etc.

-> only implement a switch a la stop-after: parsing for now, and only allow parsing for now.

@axic
Copy link
Member

axic commented May 6, 2020

Could add a stage field to the standard json which signals up to what point to run the CompilerStack: parse, importResolution, analysis, compilation.

@axic axic added the feature label May 6, 2020
@ekpyron
Copy link
Member

ekpyron commented May 6, 2020

Alternative: more fine-grained output selection, i.e. being able to select "preImportResolutionAST" and nothing else would stop before import resolution.
Advantage: we would have to think about and specify what information actually is present for those output selections.

@alcuadrado
Copy link
Member

alcuadrado commented May 9, 2020

The main reason we are interested in this feature is to be able to extract imports reliably. Now, we are using the ANTLR grammar, which doesn't represent the exact same language, especially when a file fails/should fail to parse.

As I mentioned Chris in a call, we are ok with not getting an AST if the file fails to parse, but we'd love to get the same errors that solc would return.

I think anyone working on tools that process Solidity sources would benefit from this, so I'll tag a few of them: @iamdefinitelyahuman @cgewecke @fvictorio @frangio @jcaracciolo

@jcaracciolo
Copy link
Contributor

jcaracciolo commented May 11, 2020

I have played around with this idea, and it seems is not that hard to accomplish. Here is a fork where i manage to get some functionality to work for it with some considerations (jcaracciolo@bb4a0aa). I know it may probably break in most edge cases, but i try it out with a couple of contracts and worked out great.

There is an issue with the requirements to "analyze" the file when the AST JSON is generated. This is because the AST JSON contains much more data than just parsed information (such as scope, complex types and others) that require for more processing to be done. This will be trickier without the imports since some of the information required to "analyze" the file is present in other files, and the compiler will fail to execute if it cant find them.

There has to be some thought if all this extra information gathered after the parsing is required in the AST JSON when parsing only one file. If is not, then is trivial to do the change, just skip the this loop and double check the AST JSON generation wont throw a null pointer on missing data (which is basically the commit in my fork).

If in fact it is, then there are a lot more edge cases to take into account when trying to analyze a file without its imports.

@frangio
Copy link
Contributor

frangio commented May 12, 2020

The main reason we are interested in this feature is to be able to extract imports reliably.

I agree that this is much needed. We should consider changing the focus of this issue to import extraction exclusively. It might be interesting to get an AST with unresolved imports and symbols, but it might be best to discuss it as a separate feature.

Something interesting about import extraction is that it's recursive, so the user would need to request import extraction repeatedly. For this use case, requesting an entire AST is overkill and may have a performance impact.

@chriseth
Copy link
Contributor Author

By the way, when using javascript, this can be done "manually" using the "import file read callback" - and that is actually what it is meant to be used for.

@alcuadrado
Copy link
Member

I think the import file callback is somewhat limited, as it only works in js, and doesn't work when using a native/docker version of solc. As we discussed in #3168 and in private, we think using a native solc would improve the development speed in most cases.

The benefits of exporting the parser aren't just extracting imports. That's why I tagged other people that deal with solc's last or build/use their own parsers. Can you please comment here if this would help you?

@iamdefinitelyahuman
Copy link
Contributor

I have two main uses for accessing an AST prior to compilation:

  • I can more effectively determine if a contract has changed, and from that infer which source files to recompile. It's unfortunate to have to run the compiler on an entire project because the user added an extra line in their SafeMath library.
  • I can better determine dependencies and associations. Knowing imports is useful, but a single source file can contain multiple contracts and a user might forget to remove an import statement for a contract they no longer require. The AST lets me build an effective dependency chain so I know exactly which contracts to recompile when I detect a change.

I'm also in agreement with @alcuadrado that this is useful in the native solc.. I'm working in Python so solcjs isn't an option.

@cgewecke
Copy link

To concur with @iamdefinitelyahuman and @alcuadrado, an exported parser would be very helpful. There's always a gap between the JS parser's accuracy and solc as new syntax is introduced and it's the source of a large number of issues in both solidity-coverage and eth-gas-reporter.

These tools rely on fast parsing either to avoid double-compilation (because they modify contracts) or strong coupling with the user's compilation strategy (e.g requiring they use the correct solc output selections and knowing how they store their comp artifacts locally)

@jcaracciolo
Copy link
Contributor

Is there any update on where this issue is heading? I do agree there are much more cases in which an exported parser could be used for, other than extracting imports.

@chriseth
Copy link
Contributor Author

I still like @axic's suggestion most. Would we add it to settings or to the top level?

@franzihei
Copy link
Member

franzihei commented Jun 17, 2020

Hey everyone! We, the Solidity team, would like to discuss this topic in a little language design discussion call with you next week to come to a conclusion about the implementation.

When?

Thursday, 25th of June, 5-6PM CEST (see time in your timezone here)

What?

Discuss issue #8739 - "output the AST even before import resolution"

How?

I will share a link to join the meeting in this issue 10mins before the start. You can also get a calendar invite with all information, if you send me your e-mail address as a DM on Gitter (@franzihei).

Please let me know if you're interested in participating. :)

@chriseth
Copy link
Contributor Author

chriseth commented Jun 24, 2020

During the call we would like to know:

  • what the participants would use the feature for
  • what exactly is needed as output
  • if other stages to stop compilation would be useful

@franzihei
Copy link
Member

Join the call in 15 mins here --> https://meet.google.com/mga-marp-oid.

@franzihei
Copy link
Member

Thanks everybody for joining the call!

Really rough notes (please feel free to correct or add to it):

Attendees: Alex B, Aniket, Ben H, Chris R, Francisco G, Franco V, Franco Z, Franziska H, Nick D, Patricio P, Yann L.

Questions for tool builders:

  • what the participants would use the feature for
  • what exactly is needed as output
  • if other stages to stop compilation would be useful

Buidler: need relative path as it is written in Solidity source code

Truffle: we look to see which Sol files have been changed; min set of files that need to be compiled at a certain time; we don’t need absolute paths either, we just need relative paths upfront, list of sources is only used for the “profiler”

Solhint: needs to parse files to do static analysis, using JS parser, for prettier I just need the syntax, no need for types, for solhint, only syntax - works on file level (no cross file analysis), if we had a type AST that could probably be useful too (but not sure how yet)

Chris: let’s introduce a generic flag to stop after point x and at a later point in time we could still introduce other stages to stop?

Remix: will use features to have fast feedback and display more details quickly

Brownie: would it be possible to have an AST fully linked without static analysis?

Chris: yes, before type checker

Brownie: could I build a dependency tree with that? having an AST for just one file is useful; if I can build all the associations; build dependent chain without static analysis would allow me to fine tune what contracts I am compiling

Chris: I think the main problem is that the compiler is f***ing slow, we should also focus on that to increase compilation time

Truffle: but exposing a parser would be easier than making the compiler faster and it’s useful.

[...]

We also discussed other tooling issues in this call, which are not relevant for this issue (hence I'm not putting the other notes here).

@chriseth
Copy link
Contributor Author

summary: Implement a switch e.g. stop-after: parsing that only handles a single file and does not even call the import callback. I think the absolute path can still be put in the annotations. For now, we only have 'parsing' - but we could extend it at some point in the future by defining other "checkpoints". The main problem is to check that such an incomplete AST can be exported without crashing.

@fvictorio
Copy link
Contributor

I created protofire/solhint#232 to keep track of this in solhint. I'm referencing it here because I assume other people building on top of this future solc parser might have similar issues.

@chriseth chriseth mentioned this issue Jul 8, 2020
2 tasks
@alcuadrado
Copy link
Member

Is it safe to assume these things?

  1. A parser for v0.4.26 can parse any v0.4.x file?
  2. A parser for v0.5.17 can parse any v0.5.x file?

If so, can we consider backporting this change to each of those majors? This way we'd be able to parse any file with solc.

@chriseth
Copy link
Contributor Author

We usually only backport critical bugfixes, but we can see how much work the task is overall.

@fzeoli
Copy link

fzeoli commented Oct 28, 2020

Backporting this to 0.4.26 and 0.5.17 would be a radical step in making this hugely valuable. The main objective behind this was to for the first time have a proper parser that works reliably, covers the entirety of the language and is always up to date, but if it's only available for the newer versions then parsing code for the older versions that still are relevant today remains a problem. This means Solidity parsing as a whole will remain an unsolved problem, making it hard to step up the tooling ecosystem.

This is super important for developer experience.

@fzeoli
Copy link

fzeoli commented Feb 24, 2021

Any comments on this? Hardhat isn't using this yet because without backporting it'd introduce additional complexity without really solving reliable parsing.

@chriseth
Copy link
Contributor Author

Sorry, for not answering earlier: This was a really big chunk of work, and I would prefer working on moving the language forward, so that people switch over to new versions faster instead of maintaining old versions.

@fzeoli
Copy link

fzeoli commented Feb 25, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.