
Use file-level where possible for faster computation #2449

Draft · wants to merge 12 commits into main

Conversation

AshesITR
Collaborator

There are only two linters incompatible with file-level lints (as evidenced by the hacky PR failures here):

  • cyclocomp_linter
  • spaces_left_parentheses_linter

All other linters could compute on the single file-level source expression, for potentially huge gains from avoiding repeated function calls, loops, appends, and so on.
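
To make that concrete, here is a purely illustrative comparison (not lintr internals; SOME_NODE and the function names are placeholders, while xml_parsed_content / full_xml_parsed_content are the usual source-expression fields) of an XPath-based check run once per expression versus once on the file-level source expression:

library(xml2)

xpath <- "//SOME_NODE"  # placeholder for whatever nodes a given linter searches for

# today: one xml_find_all() call per expression-level source expression
find_per_expression <- function(expressions) {
  lapply(expressions, function(expr) xml_find_all(expr$xml_parsed_content, xpath))
}

# file-level: a single call on the full file's XML
find_per_file <- function(file_expression) {
  xml_find_all(file_expression$full_xml_parsed_content, xpath)
}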

What needs to be solved is how to cache and retrieve lints in this run mode.
Once that's done, we can add a new attribute (max_level?) to Linter() that signals to lint() that the linter can handle linting all expression-level source expressions in parallel.

WDYT about the idea?
Do you have any ideas on the cache part?
I'm especially interested in the scenario where a cache entry is available for most, but not all individual expressions.
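
For illustration only, a file-level linter might then be declared roughly like this. The max_level argument is the name floated above, not an existing Linter() argument, and the XPath is a placeholder:

my_file_level_linter <- function() {
  Linter(
    function(source_expression) {
      # run once over the whole file instead of once per top-level expression
      bad <- xml2::xml_find_all(
        source_expression$full_xml_parsed_content,
        "//SOME_NODE"  # placeholder XPath
      )
      xml_nodes_to_lints(bad, source_expression, lint_message = "Found SOME_NODE.")
    },
    max_level = "file"  # hypothetical: signals that file-level linting is supported
  )
}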

@AshesITR
Collaborator Author

Some thoughts: get_lints() will have to be broken up into an execution planning phase:

  1. peek cache for all (linter, expr) combinations
  2. schedule available entries for cache retrieval
  3. schedule cache misses for (parallel) linting

and a run phase (both phases are sketched after the list):

  1. retrieve the scheduled entries from the cache
  2. run parallel linting for linters that support it and cache their results (expr-level)
  3. run sequential linting for all other linters and cache their results
  4. combine lints from steps 1-3
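
Roughly, in pseudocode (the run-phase helper names are invented for illustration; has_lint() is assumed to keep its current internal cache/expr/linter signature):

# planning phase, inside a reworked get_lints(): peek the cache for every
# (linter, expr) combination
plan <- expand.grid(
  linter = names(linters),
  expr_idx = seq_along(source_expressions$expressions),
  stringsAsFactors = FALSE
)
plan$cached <- vapply(seq_len(nrow(plan)), function(i) {
  expr <- source_expressions$expressions[[plan$expr_idx[i]]]
  has_lint(cache, expr, plan$linter[i])
}, logical(1L))

# run phase (helper names are placeholders, not real functions)
cached_lints     <- retrieve_scheduled(cache, plan[plan$cached, ])     # step 1
parallel_lints   <- lint_in_parallel(plan[!plan$cached, ])             # step 2: file-capable linters
sequential_lints <- lint_sequentially(plan[!plan$cached, ])            # step 3: remaining linters
all_lints        <- c(cached_lints, parallel_lints, sequential_lints)  # step 4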

@AshesITR
Collaborator Author

Not bad so far, but probably still some room for improvement:

devtools::load_all()
system.time(lint_package())
# here:
#   user  system elapsed 
# 89.799   0.460  90.362 
# main:
#    user  system elapsed 
# 104.109   0.473 104.654 

@codecov-commenter

codecov-commenter commented Dec 15, 2023

Codecov Report

Attention: 24 lines in your changes are missing coverage. Please review.

Comparison is base (4b59aac) 98.53% compared to head (9c7db92) 98.13%.

❗ Current head 9c7db92 differs from the pull request's most recent head f5e633d. Consider uploading reports for the commit f5e633d to get more accurate results.

Files               Patch %   Lines
R/lint.R            83.21%    23 Missing ⚠️
R/source_utils.R    75.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2449      +/-   ##
==========================================
- Coverage   98.53%   98.13%   -0.41%     
==========================================
  Files         126      126              
  Lines        5676     5799     +123     
==========================================
+ Hits         5593     5691      +98     
- Misses         83      108      +25     


@AshesITR
Collaborator Author

   user  system elapsed 
 53.661   0.446  54.201 

Now just have to fix all the unexpected problems and add some tests for the newly introduced branches :)

@AshesITR
Collaborator Author

Caching seems broken, but it's hard to reproduce locally for some reason.
Trying to figure it out.

Seeing the time for linting roughly halved without caching at least shows there is merit in this approach 🥲

@MichaelChirico
Collaborator

> Caching seems broken, but it's hard to reproduce locally for some reason. Trying to figure it out.
>
> Seeing the time for linting roughly halved without caching at least shows there is merit in this approach 🥲

Indeed, looks amazing! I also wonder if we should add an option to get_source_expressions() to skip building the expression-level objects; this comes after Rdatatable/data.table#5830 (comment), where a massive file spends the vast majority of its compute time on this step.
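
A hypothetical usage sketch of that idea; the include_expression_level argument is invented and does not exist in get_source_expressions() today:

src <- get_source_expressions("R/lint.R")
# today this always builds one object per top-level expression plus a final
# file-level object as the last element of src$expressions
file_expr <- src$expressions[[length(src$expressions)]]

# proposed: skip building the per-expression objects when only file-level work is needed
# src <- get_source_expressions("R/lint.R", include_expression_level = FALSE)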

Anyway, I think it's prudent to save this PR for after the release; messing around with the caching seems like a minefield for hard-to-catch bugs. It would be good to let this hang around in dev for longer to see what bubbles up.

@AshesITR
Collaborator Author

AshesITR commented Dec 16, 2023

I had thought about building them lazily, but that isn't useful as long as not all linters support "batches", because the objects are needed at least once anyway.

Maybe there is a more performant way to create the objects under the assumption that the tree is read-only.
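
One possible direction, as an untested sketch: reuse nodes of the existing file-level tree instead of building a separate XML document per expression (this assumes the <exprlist> root that xmlparsedata produces):

full_xml <- source_expression$full_xml_parsed_content

# node set pointing into the same read-only tree, one node per top-level element,
# avoiding a per-expression parse/serialize round trip
top_level_nodes <- xml2::xml_find_all(full_xml, "/exprlist/*")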
