-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add code for analyzing dependencies between syntaxes #1762
Conversation
Some syntaxes (I'm specifically thinking of Git Common) may be marked as hidden and thus perhaps shouldn't need an independent syntax set. I noticed that some syntaxes also list themselves as a dependency. Also I'm not entirely sure I understand what the end result will be - literally a syntax set for every "root" syntax as shown in the list? In some cases it may make sense to group them, especially if they only have one dependency difference etc. - again I'm thinking of all the Git related syntaxes here (and Markdown + MultiMarkdown) - they could be bundled into one syntax set, unless its not worth it and the time saved loading one or two syntaxes will outweigh the deduplication? I'm not seeing any obvious dependencies missing, so that"s good 👍 |
9d4e6a9
to
a8f1409
Compare
Thanks a lot for taking a look. That was very valuable input, especially considering it comes from such a prominent syntax expert as yourself :) I have now updated the commit with the following changes, based on your input:
Yes. But in the latest code I skip the hidden syntaxes. It is a good point that we might not need both the But, as outlined in my plan, I am aiming for an MVP where we only improve startup time when highlighting source files without dependencies. And there are quite many (expand list below to see). So it is fine to merge the code as is in that regard. Then we can investigate and experiment with the right way forward for syntaxes with dependencies at a later point. Anyway, I consider this code to be ready for a "real" code review. We are still blocked on a new release of Here is what the new full list of independent syntax sets looks like: Independent syntaxes
|
src/assets.rs
Outdated
if false { | ||
// To trigger this code, run: | ||
// cargo run -- cache --build --source assets --blank --target /tmp | ||
crate::syntax_analysis::print_syntax_dependencies(&syntax_set_builder); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to merge this as-is? (with the if false
?)
Other ideas:
- hide behind a cargo-feature (so we can trigger it with something like
cargo run --features print_syntax_dependencies -- cache --build …
) - enable it with a environment variable
if std::env::var("BAT_PRINT_SYNTAX_DEPENDENCIES").is_ok() { …
. Contrary to the other two options, this would always include the necessary code in the binary. The env-var check wouldn't hurt at this place though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed code review!
I considered adding a cargo-feature for it, but since those are practically part of the formal public API, and since the if false
is a temporary thing, I opted out of it.
I do however really like the idea of triggering the code with an env var. Will fix!
src/lib.rs
Outdated
@@ -40,6 +40,7 @@ mod preprocessor; | |||
mod pretty_printer; | |||
pub(crate) mod printer; | |||
pub mod style; | |||
mod syntax_analysis; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bikeshedding, but maybe use another name here? It sounds a bit like we are actually performing syntax analysis (like on an AST after parsing) here. Maybe syntax_set_analysis
? Or syntax_dependencies
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's go with syntax_dependencies
👍
src/syntax_analysis.rs
Outdated
/// Used to look up (by name) what dependencies a given [SyntaxDefinition] has | ||
type SyntaxToDependencies = HashMap<String, Vec<Dependency>>; | ||
|
||
/// Used to look up what [SyntaxDefinition] that corresponds to a given [Dependency] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Used to look up what [SyntaxDefinition] that corresponds to a given [Dependency] | |
/// Used to look up which [SyntaxDefinition] corresponds to a given [Dependency] |
(if I'm interpreting the sentence correctly)
src/syntax_analysis.rs
Outdated
/// Used to look up (by name) what dependencies a given [SyntaxDefinition] has | ||
type SyntaxToDependencies = HashMap<String, Vec<Dependency>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with the doc comment. An alternative route that I also like, is to use typedefs to express the intent:
type SyntaxName = String;
type SyntaxToDependencies = HashMap<SyntaxName, Vec<Dependency>>;
which might not need a doc comment at all.
src/syntax_analysis.rs
Outdated
/// Change to true to make syntax dependency analysis print more details of what it is seeing | ||
const VERBOSE: bool = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would prefer a runtime switch here (a verbose: bool
member in SyntaxSetDependencyBuilder
).
Or to get rid of it completely by either removing the verbose=true path or by always being verbose. But only if the verbose-mode output is still readable (haven't tested it yet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that the "unlinked context" code from Keith prints mostly the same thing as my verbose outputt. But in a nicer format, because my output is repeated a lot. So I'll just remove it from my code. And from what I understand, when we differ, it is because it is a non-fatal dependency that is missing.
For the record, my code warned about these extra missing dependencies that Keith code did not:
WARNING: No syntax found ByName("OpenMP")
WARNING: No syntax found ByScope(<source.js.css>)
WARNING: No syntax found ByScope(<source.lean.markdown>)
WARNING: No syntax found ByScope(<text.dart-doccomments>)
WARNING: No syntax found ByScope(<text.html.php>)
I'll unconditionally keep
eprintln!("WARNING: Unknown dependencies for {}", name);
though, because that shall never happen. I'll change it into an ERROR: ...
.
src/syntax_analysis.rs
Outdated
.syntaxes() | ||
.iter() | ||
.map(|syntax| &syntax.name) | ||
.collect::<Vec<&String>>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For .collect
, it's often enough to specify the container and let the inner type be inferred:
.collect::<Vec<_>>();
src/syntax_analysis.rs
Outdated
.map(|context| &context.patterns) | ||
.flatten() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use flat_map
here and below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. That's a very nice simplification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a pleasure to review this, thank you very much! A module with non-trivial functionality structured in a way that makes it easy to understand and read 👍
082408c
to
a3b178c
Compare
Again, big thanks for the detailed code review 🙏 All comments should be addressed now, so feel free to take a second look when you get time. |
a3b178c
to
50f65b9
Compare
(New commit(s) is/was just a rebase on top of origin/master, mainly for the |
And also to generate independent SyntaxSets. This will later be used to improve bat startup time.
50f65b9
to
e264ff4
Compare
This will eventually allow us to improve the startup speed of bat. See #951.
It adds code to analyze dependencies between
SyntaxDefinition
s. It currently comes up with the following list of independent syntaxes.One thing I'm curious about is if you think this list, and the code I use to produce it, looks reasonable? Or can you spot any obvious dependencies that it fails to find?
I'm sure we're going to have to tweak the code that does the analysis, but it would be nice to get the basics of it in place so we can iterate on it.
List of independent syntaxes