Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster vars used tracking in simplify let visitor #8205

Merged
merged 18 commits into from
Apr 28, 2024

Commits on Apr 17, 2024

  1. Rewrite IREquality to use a more compact stack instead of deep recursion

    Deletes a bunch of code and speeds up lowering time of local laplacian
    with 20 pyramid levels by ~2.5%
    abadams committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    538577a View commit details
    Browse the repository at this point in the history
  2. clang-tidy

    abadams committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    7a60519 View commit details
    Browse the repository at this point in the history
  3. Configuration menu
    Copy the full SHA
    150f5e9 View commit details
    Browse the repository at this point in the history
  4. Fix computational complexity of substitute_facts

    It was O(n) for n facts. This makes it O(log(n))
    
    This was particularly bad for pipelines with lots of inputs or outputs,
    because those pipelines have lots of asserts, which make for lots of
    facts to substitute in.
    
    Speeds up lowering of local laplacian with 20 pyramid levels (which has
    only one input and one output) by 1.09x
    
    Speeds up lowering of the adams 2019 cost model training pipeline (lots
    of weight inputs and lots outputs due to derivatives) by 1.5x
    
    Speeds up resnet50 (tons of weight inputs) lowering by 7.3x!
    abadams committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    00b8126 View commit details
    Browse the repository at this point in the history
  5. Add missing switch breaks

    abadams committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    d3efa14 View commit details
    Browse the repository at this point in the history
  6. Merge remote-tracking branch 'origin/abadams/rewrite_ir_equality' int…

    …o abadams/faster_substitute_facts
    abadams committed Apr 17, 2024
    Configuration menu
    Copy the full SHA
    4dfbd72 View commit details
    Browse the repository at this point in the history

Commits on Apr 18, 2024

  1. Add missing comments

    abadams committed Apr 18, 2024
    3 Configuration menu
    Copy the full SHA
    22a04bd View commit details
    Browse the repository at this point in the history
  2. Merge remote-tracking branch 'origin/abadams/rewrite_ir_equality' int…

    …o abadams/faster_substitute_facts
    abadams committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    26b9cc2 View commit details
    Browse the repository at this point in the history
  3. Configuration menu
    Copy the full SHA
    ef4b2de View commit details
    Browse the repository at this point in the history
  4. Merge remote-tracking branch 'origin/abadams/rewrite_ir_equality' int…

    …o abadams/faster_substitute_facts
    abadams committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    6aebeb3 View commit details
    Browse the repository at this point in the history
  5. Make is_single_point compare min and max by deep equality

    Interval::is_single_point() used to only compare expressions by shallow
    equality to see if they are the same Expr object.
    
    However, bounds_of_expr_in_scope is really improved if it uses deep
    equality instead, so it has a prepass that goes over the provided scope,
    calls equal(min, max) on everything, and fixes up anything where deep
    equality is true but shallow equality.
    
    This prepass costs O(n) for n things in scope, regardless of how complex
    the expression being analyzed is. So if you ask for the bounds of '4'
    say in a context where there are lots of things in the scope, it's
    absurdly slow. We were doing this! BoxTouched calls
    bounds_of_expr_in_scope lots of times on small index Exprs within the
    same very large scope.
    
    It's better to just make Interval::is_single_point() check deep
    equality. This speeds up local laplacian lowering by 1.1x, and resnet50
    lowering by 1.5x.
    
    There were also places where intervals that were a single point were
    diverging due to carelessly written code. E.g. the interval [40*8,
    40*8], where both of those 40*8s are the same Mul node, was being
    simplified like this:
    
    interval.min = simplify(interval.min);
    interval.max = simplify(interval.max);
    
    Not only does this do double the simplification work it should, but it
    also caused something that was a single point to diverge into not being
    a single point, because the repeated constant-folding creates a new
    Expr. With the new is_single_point this matters a lot less, but even so,
    I centralized simplification of intervals into a single helper that
    doesn't do the pointless double-simplification for single points.
    
    Some of these shallowly-unequal but deeply-equal Intervals were being
    created in bounds inference itself after the prepass, which may have
    been generating suboptimal bounds. This change should fix that in
    addition to the compile-time benefits.
    
    Also added a simplify call in SkipStages because I noticed when it
    processed specializations it was creating things like (condition) ||
    (!condition).
    abadams committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    802ca67 View commit details
    Browse the repository at this point in the history
  6. clang-tidy

    abadams committed Apr 18, 2024
    Configuration menu
    Copy the full SHA
    b15a648 View commit details
    Browse the repository at this point in the history

Commits on Apr 19, 2024

  1. Configuration menu
    Copy the full SHA
    d00c397 View commit details
    Browse the repository at this point in the history
  2. Speed up the vars_used visitor in the simplifier let visitor

    This visitor shows up as the main cost of lowering in very large
    pipelines.
    
    This visitor is for tracking which lets are actually used for real
    inside the body of a let block (as opposed to the tracking we do when
    mutating, which is approximate, because we could construct and Expr that
    uses a Var and then discard it in a later mutation).
    
    The old implementation made a map of all variables referenced, and then
    checked each let name against that map one by one. If there are a small
    number of lets outside a huge Stmt, this is bad, because the data
    structure has to hold a number of names proportional to the stmt size
    instead of proportional to the number of lets.
    
    This new implementation instead makes a hash set of the let names, and
    than traverses the Stmt, removing names from the set as they are
    encountered. This is a big speed-up.
    
    We then make the speed-up larger by about the same factor again doing
    the following:
    
    1) Only add names to the map that might be used based on the recursive
    mutate call. These are very very likely to be used, because we saw them
    at least once, and mutations that remove *all* uses of a Var are rare.
    
    2) The visitor should early out when the map becomes empty. The let
    variables are often all used immediately, so this is frequent.
    
    Speeds up lowering of local laplacian by 1.44x, 2.6x, and 4.8x
    respectively for 20, 50, and 100 pyramid levels.
    
    Speeds up lowering of resnet50 by 1.04x. Speeds up lowering of lens blur
    by 1.06x
    abadams committed Apr 19, 2024
    Configuration menu
    Copy the full SHA
    4619886 View commit details
    Browse the repository at this point in the history

Commits on Apr 23, 2024

  1. Configuration menu
    Copy the full SHA
    a4cf0d0 View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    404622e View commit details
    Browse the repository at this point in the history

Commits on Apr 24, 2024

  1. Configuration menu
    Copy the full SHA
    b5db219 View commit details
    Browse the repository at this point in the history
  2. Reduce hash map size

    abadams committed Apr 24, 2024
    Configuration menu
    Copy the full SHA
    5130c4c View commit details
    Browse the repository at this point in the history