
Use file-level where possible for faster computation #2449

Draft · wants to merge 12 commits into main

Conversation

AshesITR
Collaborator

There are only two linters incompatible with file-level lints (as evidenced by the hacky PR failures here):

  • cyclocomp_linter
  • spaces_left_parentheses_linter

All other linters could compute on the single file-level source expression, for potentially huge gains from avoiding repeated function calls, loops, appends, and so on.
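
To make that concrete, here is a purely illustrative comparison (not lintr internals; SOME_NODE and the function names are placeholders, while xml_parsed_content / full_xml_parsed_content are the usual source-expression fields) of an XPath-based check run once per expression versus once on the file-level source expression:

library(xml2)

xpath <- "//SOME_NODE"  # placeholder for whatever nodes a given linter searches for

# today: one xml_find_all() call per expression-level source expression
find_per_expression <- function(expressions) {
  lapply(expressions, function(expr) xml_find_all(expr$xml_parsed_content, xpath))
}

# file-level: a single call on the full file's XML
find_per_file <- function(file_expression) {
  xml_find_all(file_expression$full_xml_parsed_content, xpath)
}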

What needs to be solved is how to cache and retrieve lints in this run mode.
Once that's done, we can add a new attribute (max_level?) to Linter() that signals to lint() that the linter can handle linting all expression-level source expressions in parallel.

WDYT about the idea?
Do you have any ideas on the cache part?
I'm especially interested in the scenario where a cache entry is available for most, but not all individual expressions.
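
For illustration only, a file-level linter might then be declared roughly like this. The max_level argument is the name floated above, not an existing Linter() argument, and the XPath is a placeholder:

my_file_level_linter <- function() {
  Linter(
    function(source_expression) {
      # run once over the whole file instead of once per top-level expression
      bad <- xml2::xml_find_all(
        source_expression$full_xml_parsed_content,
        "//SOME_NODE"  # placeholder XPath
      )
      xml_nodes_to_lints(bad, source_expression, lint_message = "Found SOME_NODE.")
    },
    max_level = "file"  # hypothetical: signals that file-level linting is supported
  )
}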

@AshesITR
Collaborator Author

Some thoughts: get_lints() will have to be broken up into an execution planning phase:

  1. peek cache for all (linter, expr) combinations
  2. schedule available entries for cache retrieval
  3. schedule cache misses for (parallel) linting

and a run phase (both phases are sketched after the list):

  1. retrieve the scheduled entries from the cache
  2. run parallel linting for linters that support it and cache their results (expr-level)
  3. run sequential linting for all other linters and cache their results
  4. combine lints from steps 1-3
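
Roughly, in pseudocode (the run-phase helper names are invented for illustration; has_lint() is assumed to keep its current internal cache/expr/linter signature):

# planning phase, inside a reworked get_lints(): peek the cache for every
# (linter, expr) combination
plan <- expand.grid(
  linter = names(linters),
  expr_idx = seq_along(source_expressions$expressions),
  stringsAsFactors = FALSE
)
plan$cached <- vapply(seq_len(nrow(plan)), function(i) {
  expr <- source_expressions$expressions[[plan$expr_idx[i]]]
  has_lint(cache, expr, plan$linter[i])
}, logical(1L))

# run phase (helper names are placeholders, not real functions)
cached_lints     <- retrieve_scheduled(cache, plan[plan$cached, ])     # step 1
parallel_lints   <- lint_in_parallel(plan[!plan$cached, ])             # step 2: file-capable linters
sequential_lints <- lint_sequentially(plan[!plan$cached, ])            # step 3: remaining linters
all_lints        <- c(cached_lints, parallel_lints, sequential_lints)  # step 4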

@AshesITR
Collaborator Author

Not bad so far, but probably still some room for improvement:

devtools::load_all()
system.time(lint_package())
# here:
#   user  system elapsed 
# 89.799   0.460  90.362 
# main:
#    user  system elapsed 
# 104.109   0.473 104.654 

@codecov-commenter

codecov-commenter commented Dec 15, 2023

Codecov Report

Attention: 24 lines in your changes are missing coverage. Please review.

Comparison is base (4b59aac) 98.53% compared to head (9c7db92) 98.13%.

❗ Current head 9c7db92 differs from the pull request's most recent head f5e633d. Consider uploading reports for the commit f5e633d to get more accurate results.

Files               Patch %   Lines
R/lint.R            83.21%    23 Missing ⚠️
R/source_utils.R    75.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2449      +/-   ##
==========================================
- Coverage   98.53%   98.13%   -0.41%     
==========================================
  Files         126      126              
  Lines        5676     5799     +123     
==========================================
+ Hits         5593     5691      +98     
- Misses         83      108      +25     


@AshesITR
Collaborator Author

   user  system elapsed 
 53.661   0.446  54.201 

Now just have to fix all the unexpected problems and add some tests for the newly introduced branches :)

@AshesITR
Collaborator Author

Caching seems broken, but it's hard to reproduce locally for some reason.
Trying to figure it out.

Seeing the time for linting roughly halved without caching at least shows there is merit in this approach 🥲

@MichaelChirico
Collaborator

> Caching seems broken, but it's hard to reproduce locally for some reason. Trying to figure it out.
>
> Seeing the time for linting roughly halved without caching at least shows there is merit in this approach 🥲

Indeed, looks amazing! I also wonder if we should add an option to get_source_expressions() to skip building the expression-level objects; this comes after Rdatatable/data.table#5830 (comment), where a massive file spends the vast majority of its compute time on this step.
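
A hypothetical usage sketch of that idea; the include_expression_level argument is invented and does not exist in get_source_expressions() today:

src <- get_source_expressions("R/lint.R")
# today this always builds one object per top-level expression plus a final
# file-level object as the last element of src$expressions
file_expr <- src$expressions[[length(src$expressions)]]

# proposed: skip building the per-expression objects when only file-level work is needed
# src <- get_source_expressions("R/lint.R", include_expression_level = FALSE)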

Anyway, I think it's prudent to save this PR for after the release; messing around with the caching seems like a minefield for hard-to-catch bugs. It would be good to let this hang around in dev for longer to see what bubbles up.

@AshesITR
Collaborator Author

AshesITR commented Dec 16, 2023

I had thought about building them lazily, but that isn't useful as long as not all linters support "batches", because the objects are needed at least once anyway.

Maybe there is a more performant way to create the objects under the assumption that the tree is read-only.
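
One possible direction, as an untested sketch: reuse nodes of the existing file-level tree instead of building a separate XML document per expression (this assumes the <exprlist> root that xmlparsedata produces):

full_xml <- source_expression$full_xml_parsed_content

# node set pointing into the same read-only tree, one node per top-level element,
# avoiding a per-expression parse/serialize round trip
top_level_nodes <- xml2::xml_find_all(full_xml, "/exprlist/*")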
