Faster vars used tracking in simplify let visitor #8205
Commits on Apr 17, 2024
- 538577a: Rewrite IREquality to use a more compact stack instead of deep recursion

  Deletes a bunch of code and speeds up lowering time of local laplacian with 20 pyramid levels by ~2.5%.
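The general shape of this change can be sketched as follows. This is not Halide's actual IREquality code; the node type and helpers here are invented stand-ins, showing only the technique of replacing deep recursion with an explicit worklist of node pairs:

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy expression node standing in for a Halide IR node (hypothetical).
struct Node {
    int op;                      // operator or leaf tag
    std::shared_ptr<Node> a, b;  // children; null for leaves
};

// Convenience constructor for the sketch.
std::shared_ptr<Node> mk(int op,
                         std::shared_ptr<Node> a = nullptr,
                         std::shared_ptr<Node> b = nullptr) {
    return std::make_shared<Node>(Node{op, a, b});
}

// Structural equality without deep recursion: an explicit stack of
// pending pairs replaces the call stack, so arbitrarily deep
// expressions cannot overflow the C++ stack.
bool equal(const Node *x, const Node *y) {
    std::vector<std::pair<const Node *, const Node *>> stack{{x, y}};
    while (!stack.empty()) {
        auto [p, q] = stack.back();
        stack.pop_back();
        if (p == q) continue;  // same object (or both null): trivially equal
        if (!p || !q || p->op != q->op) return false;
        stack.push_back({p->a.get(), q->a.get()});
        stack.push_back({p->b.get(), q->b.get()});
    }
    return true;
}
```

Because the pending pairs live in a compact heap-allocated vector rather than stack frames, the traversal also touches less memory per node than the recursive version.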
- 7a60519
- 150f5e9
- 00b8126: Fix computational complexity of substitute_facts

  It was O(n) for n facts; this makes it O(log(n)). This was particularly bad for pipelines with lots of inputs or outputs, because those pipelines have lots of asserts, which make for lots of facts to substitute in. Speeds up lowering of local laplacian with 20 pyramid levels (which has only one input and one output) by 1.09x. Speeds up lowering of the adams 2019 cost model training pipeline (lots of weight inputs and lots of outputs due to derivatives) by 1.5x. Speeds up resnet50 (tons of weight inputs) lowering by 7.3x!
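The complexity fix has a familiar shape, sketched below with invented types (the real substitute_facts works on Halide Exprs, not strings; this only illustrates replacing a per-query linear scan over the facts with an indexed O(log n) lookup):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical "fact": a variable known to equal a constant.
using Fact = std::pair<std::string, int>;

// Old shape: O(n) per substitution, scanning every known fact.
int substitute_linear(const std::vector<Fact> &facts,
                      const std::string &var, int fallback) {
    for (const auto &f : facts) {
        if (f.first == var) return f.second;
    }
    return fallback;
}

// New shape: index the facts once in an ordered map, then each
// substitution is an O(log n) lookup instead of a full scan.
int substitute_logn(const std::map<std::string, int> &facts,
                    const std::string &var, int fallback) {
    auto it = facts.find(var);
    return it != facts.end() ? it->second : fallback;
}
```

With thousands of assert-derived facts, the difference between scanning all of them per query and a logarithmic lookup is exactly the kind of gap that produces the 7.3x resnet50 number quoted above.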
- d3efa14
- 4dfbd72: Merge remote-tracking branch 'origin/abadams/rewrite_ir_equality' into abadams/faster_substitute_facts
Commits on Apr 18, 2024
- 22a04bd
- 26b9cc2: Merge remote-tracking branch 'origin/abadams/rewrite_ir_equality' into abadams/faster_substitute_facts
- ef4b2de
- 6aebeb3: Merge remote-tracking branch 'origin/abadams/rewrite_ir_equality' into abadams/faster_substitute_facts
- 802ca67: Make is_single_point compare min and max by deep equality

  Interval::is_single_point() used to compare expressions only by shallow equality, i.e. checking whether min and max are the same Expr object. However, bounds_of_expr_in_scope really wants deep equality, so it had a prepass that went over the provided scope, called equal(min, max) on everything, and fixed up anything where deep equality held but shallow equality did not. This prepass costs O(n) for n things in scope, regardless of how complex the expression being analyzed is. So if you ask for the bounds of, say, '4' in a context where there are lots of things in scope, it's absurdly slow. We were doing this! BoxesTouched calls bounds_of_expr_in_scope lots of times on small index Exprs within the same very large scope. It's better to just make Interval::is_single_point() check deep equality. This speeds up local laplacian lowering by 1.1x, and resnet50 lowering by 1.5x.

  There were also places where intervals that were a single point were diverging due to carelessly written code. E.g. the interval [40*8, 40*8], where both of those 40*8s are the same Mul node, was being simplified like this: interval.min = simplify(interval.min); interval.max = simplify(interval.max); Not only does this do double the simplification work it should, it also caused something that was a single point to diverge into not being a single point, because the repeated constant-folding creates a new Expr. With the new is_single_point this matters a lot less, but even so, I centralized simplification of intervals into a single helper that doesn't do the pointless double-simplification for single points.

  Some of these shallowly-unequal but deeply-equal Intervals were being created in bounds inference itself after the prepass, which may have been generating suboptimal bounds. This change should fix that in addition to the compile-time benefits.

  Also added a simplify call in SkipStages, because I noticed that when it processed specializations it was creating things like (condition) || (!condition).
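The shallow-vs-deep distinction can be sketched with toy stand-ins for Expr and Interval (invented types, not Halide's real classes). Two structurally identical 40*8 trees built separately fail the shallow check but pass the deep one:

```cpp
#include <cassert>
#include <memory>

// Toy stand-in for a Halide Expr node (hypothetical).
struct Expr {
    int op;  // operator or leaf value
    std::shared_ptr<Expr> a, b;
};
using ExprP = std::shared_ptr<Expr>;

// Structural comparison: same shape and tags, not same object.
bool deep_equal(const Expr *x, const Expr *y) {
    if (x == y) return true;  // shallow hit, including both-null
    if (!x || !y || x->op != y->op) return false;
    return deep_equal(x->a.get(), y->a.get()) &&
           deep_equal(x->b.get(), y->b.get());
}

struct Interval {
    ExprP min, max;
    // Old behavior: single point only if min and max are the same object.
    bool is_single_point_shallow() const { return min == max; }
    // New behavior: structurally equal min and max also count.
    bool is_single_point_deep() const {
        return deep_equal(min.get(), max.get());
    }
};

// Build a fresh 40*8 tree each call ('*' tags the Mul node).
ExprP mul_40_8() {
    auto forty = std::make_shared<Expr>(Expr{40, nullptr, nullptr});
    auto eight = std::make_shared<Expr>(Expr{8, nullptr, nullptr});
    return std::make_shared<Expr>(Expr{'*', forty, eight});
}
```

This is why the divergence described above mattered: once simplify produced a new Expr for one endpoint, the shallow check reported "not a single point" even though the two endpoints were still structurally identical.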
- b15a648
Commits on Apr 19, 2024
- d00c397
- 4619886: Speed up the vars_used visitor in the simplifier let visitor

  This visitor shows up as the main cost of lowering in very large pipelines. It tracks which lets are actually used for real inside the body of a let block (as opposed to the tracking we do when mutating, which is approximate, because we could construct an Expr that uses a Var and then discard it in a later mutation).

  The old implementation made a map of all variables referenced, and then checked each let name against that map one by one. If there are a small number of lets outside a huge Stmt, this is bad, because the data structure has to hold a number of names proportional to the Stmt size instead of proportional to the number of lets.

  The new implementation instead makes a hash set of the let names, and then traverses the Stmt, removing names from the set as they are encountered. This is a big speed-up. We then make the speed-up larger by about the same factor again by doing the following:

  1) Only add names to the set that might be used based on the recursive mutate call. These are very likely to be used, because we saw them at least once, and mutations that remove *all* uses of a Var are rare.

  2) Have the visitor early out when the set becomes empty. The let variables are often all used immediately, so this is frequent.

  Speeds up lowering of local laplacian by 1.44x, 2.6x, and 4.8x respectively for 20, 50, and 100 pyramid levels. Speeds up lowering of resnet50 by 1.04x. Speeds up lowering of lens blur by 1.06x.
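A minimal sketch of the new approach, with the Stmt body modeled as a flat list of variable references rather than a real IR tree (the function name and types are invented for illustration). The set holds only the let names, shrinks as references are found, and the traversal stops early once it is empty:

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Given the names bound by the enclosing lets and the sequence of
// variable references encountered while traversing the body, return
// the let names that are actually used.
std::unordered_set<std::string>
vars_used(const std::unordered_set<std::string> &let_names,
          const std::vector<std::string> &body_refs) {
    // Size is proportional to the number of lets, not the body size.
    std::unordered_set<std::string> unused = let_names;
    for (const auto &ref : body_refs) {
        unused.erase(ref);          // cross names off as we see them
        if (unused.empty()) break;  // early out: every let is used
    }
    std::unordered_set<std::string> used;
    for (const auto &name : let_names) {
        if (!unused.count(name)) used.insert(name);
    }
    return used;
}
```

The early out is what makes the common case cheap: when all the let variables appear near the top of the body, the traversal touches only a small prefix of a potentially enormous Stmt.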
Commits on Apr 23, 2024
- a4cf0d0: Merge remote-tracking branch 'origin/main' into abadams/faster_vars_used_in_simplify_let
- 404622e
Commits on Apr 24, 2024
- b5db219
- 5130c4c