Speed up (~1.5x) and fix planner, including judder due to a planner-stepper race #27035
1.5x - impressive.
As a general comment, having three PRs operating on the same code at the same time is a bit scary. And there is a lot of change to the complex trapezoidal calculation code in this PR which is also a bit scary. What would help me a lot in reviewing this is... I think I'd like to see the I'm quite keen to see this happen because from what I can see so far the idea of massively reducing the number of sqrts is excellent. I wish I'd spotted it myself!
I agree it's getting complicated. That said, this change ensures that If you want to see this PR broken down into smaller commits.. it kinda is already. There are 3 commits:
I appreciate it.
Another quickie optimization… For values that don't exceed +- 2^30 (i.e., long time periods? counting clock cycles?) it's safe to square before coercing to
For I'd suggest making a check against overflow before integer squaring but that would probably defeat the optimization for some architectures since multiplication is quite fast on most microcontrollers. Regardless, seems like this square optimization is best left for a separate PR.
It's actually not that uncommon to see rates well over that nowadays. I'm more than happy to test changes against a 1.2m machine with nema23's and 5160s capable of moving over 1m/s. Gives us room to actually see it hit speed even if we widen up the accel curve. Machine is a little off base with an LPC4078 MCU and compiled with O3 instead of default Os typical... @p3p helped a lot getting it up and running, but will eventually get to where that platform gets merged up.
I agree. With an AVR, Marlin simply cannot do more than about 8k steps/s, but many 32-bit mainboards could easily do over 50k steps/s, at which point the square will overflow an int32.
Also agree. This PR is complex enough already.
I edited the description to make it easier to know this depends on the other PR, and labeled this as "Don't Merge" for now.
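To make the overflow concern concrete, here is a minimal C++ sketch. The helper names `square_fits_int32` and `square_as_float` are hypothetical and are not Marlin functions. It only illustrates that a rate of 8k steps/s squares safely in an `int32_t` while 50k steps/s does not, and that converting to `float` before squaring sidesteps the overflow.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not part of Marlin.

// Largest magnitude whose square still fits in int32_t: 46340^2 = 2147395600.
constexpr int32_t kMaxSafeInt32Root = 46340;

// True if v * v can be computed in int32_t without overflow.
constexpr bool square_fits_int32(int32_t v) {
  return v >= -kMaxSafeInt32Root && v <= kMaxSafeInt32Root;
}

// Square after converting to float, which cannot overflow for int32 inputs.
constexpr float square_as_float(int32_t v) {
  return static_cast<float>(v) * static_cast<float>(v);
}

static_assert(square_fits_int32(8000), "8k steps/s squares safely");
static_assert(!square_fits_int32(50000), "50k steps/s would overflow int32");
```

A runtime range check like `square_fits_int32()` costs a compare and branch per call, which is the trade-off raised above against simply squaring in floating point.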
I went ahead and did a merge, but if you want to squash and rebase the simplest way is to fetch the latest
git checkout recalculate
git reset bugfix-2.1.x
git add .
git commit -m "Speed up planner, fix judder, etc."
git push -f
@thinkyhead I think you've missed the overflow concerns about
Thanks for the heads up, @mh-dm. I sometimes forget the quirks of
Remove forward_pass(). Call forward_pass_kernel() from recalculate() instead. Fix potential for large speed changes if planner falls behind.
Rebased/resynced and force pushed. As both #27013 and #27089 were merged - btw, thank you all for comments, reviews, contributions! - this PR now only includes the last commit which does this part:
I've slightly changed the way Note that I've left out removing the
I'm taking care of lingering PRs today. Are you still feeling confident about this set of changes?
Nice. Still confident as I've been testing/printing with the two main commits in this PR since my last update on May 17th. I haven't tested nor am I a fan (readability wise) of the "apply max like 26881" commit.
I like to get I will go ahead and change it back, however, because in this case it makes it easier to combine the related PRs.
Description
Significant speed up of planner by reducing SQRT calls from Planner::recalculate() by ~73%.
Fix potential for significant judder due to a planner-stepper race / planner issue that's likely when there are many fast short segments.
(Edit: this part was split off into #27089) Ensure the existing minimum_planner_speed_sqr limit is correctly enforced and propagated to calculate_trapezoid_for_block(), which:
MINIMAL_STEP_RATE magic number / undocumented limit.
(Edit: this part to be fully completed as part of #26881) Removing MINIMAL_STEP_RATE allows for the correct handling of moves with low acceleration.
Removes block_flags_t nominal_length and Planner::block_buffer_planned.
Removes Planner::forward_pass() and instead the forward pass is done as part of Planner::recalculate_trapezoids().
Requirements
Applies to all motion.
Benefits
The main benefits are most obvious when looking at the steps with a logic analyzer, similar to how it was done in #27013.
Before
This is a capture with #27013 applied, which already fixed many other acceleration spikes.
The first acceleration spike/speed discontinuity is due to a planner-stepper race / planner issue within recalculate(). The reverse pass grossly maximizes a block->entry_speed then the forward pass is supposed to come in and fix it so that block->entry_speed can be reached within the previous block assuming full acceleration. However, if the previous block meanwhile becomes busy (the stepper takes it on) then block->entry_speed is not fixed. block is then recomputed and gets a really fast start.
The condition for this to happen is that all the segments in the buffer are not enough to get to full speed, i.e. they're still actively planned. Which also means that this issue can happen even if the buffer is more than half full so it can happen even with SLOWDOWN enabled.
It was tricky to reliably reproduce this issue and capture it with a logic analyzer so see Configurations.
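A much-simplified sketch of the race described above, and of the kind of guard that avoids it. The `Block` struct, `forward_pass_step()` and `fallback_entry_speed_sqr` are hypothetical names used only for illustration and do not mirror Marlin's actual planner code. Speeds are kept squared, and the reachable entry speed follows v² = v0² + 2·a·d.

```cpp
#include <algorithm>

// Hypothetical, simplified block; not Marlin's block_t.
struct Block {
  float entry_speed_sqr;           // (mm/s)^2, possibly over-maximized by the reverse pass
  float fallback_entry_speed_sqr;  // (mm/s)^2, a known-safe entry speed to fall back to
  float acceleration;              // mm/s^2
  float millimeters;               // length of the move in mm
  bool busy;                       // true once the stepper has taken the block
};

// Forward-pass step for `current`, given the block that precedes it.
void forward_pass_step(const Block& previous, Block& current) {
  if (previous.busy) {
    // The previous block can no longer be replanned, so an optimistic entry
    // speed cannot be trusted; fall back to the safe value.
    current.entry_speed_sqr =
        std::min(current.entry_speed_sqr, current.fallback_entry_speed_sqr);
    return;
  }
  // Highest entry speed reachable by accelerating across the previous block.
  const float reachable_sqr =
      previous.entry_speed_sqr + 2.0f * previous.acceleration * previous.millimeters;
  current.entry_speed_sqr = std::min(current.entry_speed_sqr, reachable_sqr);
}
```

Without the `busy` branch, a block whose predecessor was picked up by the stepper mid-plan would keep the over-maximized entry speed from the reverse pass, which is the "really fast start" seen in the capture.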
The marked low speed move is supposed to have a very low acceleration (10 steps/s^2) but instead directly starts off at a rate of MINIMAL_STEP_RATE then does a weird accel and decel. It's supposed to be a simple full acceleration.
The second acceleration spike/speed discontinuity is due to the minimum_planner_speed_sqr limit being lost on the way to calculate_trapezoid_for_block(). Specifically the forward pass limits block->entry_speed based on the previous block's acceleration and if that's low it will fall under minimum_planner_speed_sqr. And if it does we will end up with the second step of a segment having an acceleration way past limits. The first step of a segment is always at initial_rate. If it's too slow, it will result in too much accumulated acceleration_time (see stepper.cpp) and that means a very high calculated second step rate, and the difference is the acceleration spike.
After
Planner-stepper race judder is fixed in recalculate_trapezoids(). If the previous block is busy we revert the forward pass entry_speed changes to (the newly added) block->min_entry_speed_sqr. And yes I've run it 10+ times to make sure.
MINIMAL_STEP_RATE is gone and now low acceleration moves are handled correctly.
minimum_planner_speed_sqr is correctly passed through and enforced through (the newly added) block->min_entry_speed_sqr.
Speeeeed
recalculate_trapezoids() started at block_buffer_tail and for every single move it calculated SQRT(next->entry_speed_sqr).
So don't do that and instead only do SQRT for the entry speeds of blocks that actually need to be recalculated (there's some complexity to actually achieve this). Some before / after numbers for my test gcode and a more realistic gcode (modified section of a vase mode/spiralize print, 174 moves at 100mm/s):
I manually tracked calls instead of just tracking the cycles or timer ticks because the stepper ISR jumps in often and resets the timers. With some math and handwaving, the 73% reduction in SQRT calls (the most expensive single operation) means about a 1.5x faster planner, at least for the timings I get on my test platform LPC1769.
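As a rough illustration of the idea (not the actual change in this PR), the sketch below counts square roots for an eager pass that takes one per block versus a lazy pass that only takes them for blocks flagged for recalculation or immediately following a flagged block. `Block`, `needs_recalc`, `eager_pass()` and `lazy_pass()` are hypothetical names, and the real planner has more bookkeeping than this.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical, simplified block; not Marlin's block_t.
struct Block {
  float entry_speed_sqr;  // (mm/s)^2
  float entry_speed;      // mm/s, filled in only when needed
  bool needs_recalc;      // set when the planner changed this block
};

// Eager variant: one sqrt per block, whether or not anything changed.
std::size_t eager_pass(std::vector<Block>& blocks) {
  std::size_t sqrt_calls = 0;
  for (Block& b : blocks) {
    b.entry_speed = std::sqrt(b.entry_speed_sqr);
    ++sqrt_calls;
  }
  return sqrt_calls;
}

// Lazy variant: a block's trapezoid depends on its own entry speed and on the
// next block's entry speed (its exit speed), so take a sqrt only for flagged
// blocks and for the block right after a flagged one.
std::size_t lazy_pass(std::vector<Block>& blocks) {
  std::size_t sqrt_calls = 0;
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    const bool prev_flagged = i > 0 && blocks[i - 1].needs_recalc;
    if (blocks[i].needs_recalc || prev_flagged) {
      blocks[i].entry_speed = std::sqrt(blocks[i].entry_speed_sqr);
      ++sqrt_calls;
    }
  }
  // Clear flags only after the loop so prev_flagged sees the original values.
  for (Block& b : blocks) b.needs_recalc = false;
  return sqrt_calls;
}
```

With a mostly settled buffer where only the newest blocks are flagged, the lazy pass touches a small fraction of the queue, which is where a reduction in SQRT calls would come from under these assumptions.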
Configurations
How I got the planner-stepper race issue to semi-reliably reproduce (about 40% of runs):
and the gcode used in the capture
Related Issues
Depends on #27013
(edit) Needs to be reconciled with #26881
It seems like this change should fix #26274