Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comprehensive BLTouch implementation #13841

Closed
wants to merge 31 commits into from
Closed

Comprehensive BLTouch implementation #13841

wants to merge 31 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Apr 28, 2019

My first pull request here to Marlin, so if I goofed somewhere call me names and clobber me, won't you?

BLTOUCH V3 introduced some new conecpts (5V mode vs. OD mode), made us all look at old commands (like SW mode) that we never really used and generally mixed us up when the early V3's were flaky on certain machines.

This prompted my to start systematically charting their behaviour, a process that was ongoing and also due to time constraints, fraught with errors - always two steps forward and one step backwards. Thanks to @InsanityAutomation who brought in a quick perfectly functioning fix and actually, apart from some small concerns or further ideas (like SW-mode for M119), nothing really needs to be done.

Still I had a hankering for going a bit further and changing a few things:

The STOW and DEPLOY actions are now checked for success AFTER they are executed. Why? Checking for an ALARM condition BEFORE an ACTION removes the ALARM temporally from its cause, making debugging of problems more difficult. There is really no need to check BEFORE the action: Because the BLTouch will not execute the action anyway if an ALARM is present, and the ALARM will still be present after the action, which will then be seen as failed.

Initialization at printer start is the time to set 5V or not, a single time. It stays active until the next power-cycle or when manually setting OD mode in the LCD BLTouch configuration.

SW mode if really desired is issued after a deploy - the user must explicitly request that to be active and it won't survive the next action cycle.

When errors are detected, different things need to be done for a failed DEPLOY and a failed STOW. The main point is that the attempt to "get rid" of the error can fail - and that must be detected just like the original failure was.

If the debugging messages are enabled, all servo command angles are reported, as well as the progress of DEPLOYs and STOWs, their initial failures and the failure of the recovery attempt (if such were to happen).

Looking at the newest BLTouch documentation shows some appalling details (like needing a 2000ms delay after setting 5V mode, WTH?, needing more than the 375ms Marlin default for a RESET (+100ms ???) and sundry other little details).

It remains to be reported to ANTClabs that the behaviour of the probe is sometimes "strange", like when you try to clear a trigger in SW mode by using RESET, but that only becomes visible after the next action, or when you expect RESET to STOW the probe (as specified in their manual) but this doesn't happen in all cases. As a matter of fact, one can expect the BLTouch probe to go into a non-recoverable state (RESET doesn't help) if it gets severely waylaid by noise. The code will then detect that failure, but you still need a power cycle to get over it. We can only hope that ANTClabs will address some of these points but the code will make it easier to document what is happening.

The BLTouch section of the configuration has been adapted to reflect the different kinds of probes out there and gives the user full control of the probes configuration. The example configs should be "laboriously" adapted to fit to each boards characteristics. Hopefully users will contribute.

Lastly, my experience has shown me that the latest V3 can be run in a nicely insane high-speed mode given a good cabling and noise situation. So why not allow the user to try that?

I tried to document heavily in the code and hope that this way, a lively discussion will take place on what I might have overlooked or done wrong.

The code is running here on one 32bit and one 8bit machine, and runs on a clone probe, a V2.0, an early V3 and a current V3.

I changed bltouch.cpp and bltouch.h, obviously, not only the logic as stated above, but also some naming and general tidying up after such a hectic time over the lase few months.

I changed the BLTouch sections in the configuration.h files and tried to make them self-documenting.

I modified the lcd menu configuration section for BLTouch - setting SW mode should be possible also on non-V3 probes, I tested it and it even works on the clone.

I modified G28.cpp and G34_M422.cpp to just STOW (for safety) before a HOME - instead of the lengthier call to init() or reset(). Especially as that would contain a 2s wait if the 5V setting is in there (WTH ANTClabs?)

I changed one of the menu texts in the english language file. All other available languages are in different states of repair, some of them actually better then the english one. Not a problem for now.

There is a small video of M48 in the fast mode on youtube under M48

BLTouch record 20190428
Stricter adherence to the probes characteristics, better error correction and debugging.
A change needed just like in G28.cpp - no more reset, just stow
Change the entire timing logic to ensure ANTClabs timing restrictions and to react as fast as possible to ALARM states of the probe
@ghost
Copy link
Author

ghost commented Apr 29, 2019

I have cleaned up the code some more by changing the logic for DELAY times after BLTouch command. BLTOUCH_DELAY, if larger than the delay times empirically determined by testing, will take precedence, thus the effective delay time values will NOT go below these times.
Also, when delaying after a command - the delay will now terminate IMMEDIATELY if an ALARM ( = triggered ) condition is detected. This speeds up detection of a physical probe failure. I have tested this by blocking the probe and the code reacts slighty faster.
I've got some queries open at ANTClabs for a few more details. I would like to know for how long minimum (according to them) a PWM angular signal needs to be present before it is recognized and then I would like to know minimum execution times for ALL commands. We'll see what we get...

Here is a screenshot and a video ( G28-M48-G28_blocked ) of a G28 (home) followed by a M48 where I blocked the needle after the first probe, then a G28:

grafik

deploy_wrapper and stow_wrapper instead of deploy_stow_wrapper. No common code anymore anyway.
50ms increment instead of 100ms in the delay routine, SET5V and SETOD delay updated until there is concrete information from ANTClabs on this topic
@Roxy-3D
Copy link
Member

Roxy-3D commented Apr 30, 2019

I will have a chance to load this today. It would be good if we can get somebody else to check it out also.

@InsanityAutomation
Copy link
Contributor

I probably wont get to it today, but in the next couple days I should have time to play with it.

ms = BLTOUCH_DELAY;
}
// Don't wait longer than necessary if an ALARM pops up. Check trigger
// in increments of 50ms to catch the ALARM right away.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pulse width of a trigger is 10ms. There is a good chance this will be picked up by this at some point. Should have a flag for triggered seen and only fault out of true twice in a row. Also, this wont truly wait safe delay, It may be off by up to 49ms. Could do while(ms > 0) then safe_delay MIN(ms, 50));

Copy link
Author

@ghost ghost Apr 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this wont truly wait safe delay, It may be off by up to 49ms. Could do while(ms > 0) then safe_delay MIN(ms, 50));

safe_delay():

void safe_delay(millis_t ms) {
  while (ms > 50) {
    ms -= 50;
    delay(50);
    thermalManager.manage_heater();
  }
  delay(ms);
  thermalManager.manage_heater(); // This keeps us safe if too many small safe_delay() calls are made
}

Why might this be off by 49ms? It does a possible rest value outside the loop at the end?
Apart from the timing overhead of thermalManger.manage_heater() it looks reasonably precise.

Copy link
Author

@ghost ghost Apr 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a good chance this will be picked up by this at some point.

I though that unless something is very wrong, there should never be a trigger while in this loop. If this is a wait after a DEPLOY, movement only begins after the wait is over. Similar logic applies to the other commands.

Should have a flag for triggered seen and only fault out of true twice in a row.

But that is a really good idea to not get out of the loop when there is noise.

And you are right: In no case do we want to get into a situation where we might issue the next command before the wait for the previous command is over.

I think it would be best to scrap this exiting of the loop altogether. I kind of though it would speed up things - but we are talking about reducing (for example) a 750ms STOW wait to 350ms, who cares actually. How often do we probe, anyway.

So I will remove the two lines and let's let the wait do its thing without any pre-exit.

What do think of putting thermalManager.manage_heater(); in there and not calling safe_delay but just delay? It would mean an include for temperature.h at the top though. In the end I was against it, but didn't evaluate code-bytes against the two versions.

Copy link
Author

@ghost ghost Apr 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do while(ms > 0) then safe_delay MIN(ms, 50));

Yes very nice idea for the original safe_delay by the way, but how does the code bytes overhead of MIN compare to just doing the rest after the end of the loop?

Original:

void safe_delay(millis_t ms) {
  while (ms > 50) {
    ms -= 50;
    delay(50);
    thermalManager.manage_heater();
  }
  delay(ms);
  thermalManager.manage_heater(); // This keeps us safe if too many small safe_delay() calls are made
}

New idea:

void safe_delay(millis_t ms) {
  while (ms > 50) {
    ms -= 50;
    delay MIN(ms, 50))
    thermalManager.manage_heater();
  }
}

Looks shorter but how expensive is MIN (its a a couple of branches and an immediate compare, I think) vs. two call instructions. Could be smaller, if you want I will compile and compare the OBJ code (8 bit and 32 bit, of course).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'new idea' will break things... If safe_delay() is called with a small amount of time... say 25ms, no delay will be performed.

Copy link
Member

@Roxy-3D Roxy-3D May 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! That code is super easy to KNOW the right thing is going to happen. I much prefer that form. And incidentally... That is more 'optimum' too. That will generate less machine language code than what is there right now.

Copy link
Author

@ghost ghost May 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have checked:

The MIN is expensive.

The "new idea" looks better in the c code. As I feared: It is longer on AVR machine code. It is slightly longer on 32 bit code. The original code knew what he was doing.

Do we care about 10-20 bytes of machine code in this specific case? In general, we do.

Copy link
Member

@Roxy-3D Roxy-3D May 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 or 20 bytes isn't a big deal to make sure the problem is fixed... With regard to MIN() being expensive... In this case we know the data types going into the macro. If we side step the MIN() macro and just use a ternary operator, I would think this ends up as small as possible. The big problem is the type millis is a 4 byte long. And those comparisons and arithmetic are not 'natural' on the 8-bit AVR's.

void safe_delay(millis_t ms) {
  while (ms > 0) {
    delay( ms<50? ms : 50);
    ms -= 50;
    thermalManager.manage_heater();
  }
}

Copy link
Author

@ghost ghost May 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. The original coder knew what he was doing. ms<50? ms : 50 in machine code is loads for the ms var, multiple compares and 2 branches. The original one is shorter in machine code, believe me. But less shorter on 32 bit machines. AVR - it's hell.

If I were a CPU I would go berserk anyway, first the guy is asking while (ms > 50) god knows how many times, and then he wants to know ms < 50 that many times. Bonkers, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to make sure the problem is fixed

But there is actually no problem


if (DEBUGGING(LEVELING)) DEBUG_ECHOLNPAIR("bltouch.deploy_wrapper()");

return false; // report success to caller
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be reversed. True for command successful, and false for command failed. Thats the typical convention.

Copy link
Author

@ghost ghost Apr 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be my preference as well. It would be easy to inverse the logic at all the very few occurences where these wrappers are called. Note that I have seen a few other places where "false" is "success" in Marlin, so I believe that the original coder of bltouch probably thought that he should adhere to it. I just stuck to that logic. Look at the original bltouch.cpp, that's where I took it from.

The callers typically look like this:

if (triggered() & wrapper(...)) -> error exit
... success code...

So it needs a true to error out.

Saves quite some machine instructions (8-bit code size) was probably the reasoning.

Oherwise it would have been:

if (triggered() & !wrapper(...)) -> error exit
... success code...

One instruction more at least.

Other constructs using if then else are also possible but use even more instructions in otherwise straight line code.

Sooner or later one should identify these places and gradually turn the logic around.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True for command successful, and false for command failed. Thats the typical convention.

In fact, a very common convention is zero for no error, or non-zero for an error code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't write the original, @thinkyhead
Seen so many places in Marlin where this is "the wrong way around" - correcting them all would give AVR maybe 100 bytes more program code. 32 bit who cares, but AVR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, one out of many (probe.cpp):
grafik
It's all over the place. So lecture the original author. But like I said, he liked coding it that way...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's another one in motion.cpp:
grafik
I just wanted to conform to the prevalent coding style so as not to break anything.

It's safer to really wait the needed delay than to risk premature ext due to noise or other logic errors.
It's safer to really wait the needed delay than to risk premature ext due to noise or other logic errors.
@CRCinAU
Copy link
Contributor

CRCinAU commented May 1, 2019

For what its worth, I have tested this with #define BLTOUCH_DEBUG_MSGS set. I was getting quite a few random alarms on stock marlin-2.0.x - however I could not reproduce an alarm state.

I'm now testing again with #define BLTOUCH_DEBUG_MSGS disabled to confirm that it isn't a coincidence with timing caused by the debug messages.

Current indications are looking good though.

@CRCinAU
Copy link
Contributor

CRCinAU commented May 1, 2019

From talking with @FanDjango, I've tested this quite a bit. I've now had no failures in several hundred probes.

In consultation, we also decided to wind the slug ~30 degrees clockwise. That plus the new code has been perfect in my observations on a BLTouch v3...

@ghost
Copy link
Author

ghost commented May 2, 2019

Stage 3 of the recode:

The current integration of the BLTouch in probe.cpp is less than ideal. It had been patched into a fairly low level and thus did not take advantage of some of the higher level logic already in place, for example: "DONT DEPLOY IF PROBE IS ALREADY DEPLOYED". Same for STOW.

Once you spend some time with the BLTouch, you realise that doing the following sequence when probing,

DEPLOY, probe, move up, STOW, move XY, DEPLOY, probe etc.

is sub-optimal.

Once deployed, the BLTouch will happily keep on probing and triggering until you STOW.

So you can do better:

DEPLOY, probe, move up, move XY, probe, move up, move XY, etc. and then do a final STOW.

One of the reasons that the BLTouch was not fully integrated into the already there features of probe.cpp (see also ALLEN KEY in there), is that a vital feature to make things a bit easier was seen to be missing: A way to query the probe as to its deployed/stowed status. If you can do that somehow, the error handling in probe.cpp also will take care of the BLTouch.

But, there is a way to do this. The following small routine illustrates the method:

bool BLTouch::status_wrapper() {
  // return a TRUE for "YES, it is DEPLOYED"
  bool tr;
  #if ENABLED(BLTOUCH_DEBUG_MSGS)
    SERIAL_ECHOLN("BLTouch STATUS requested");
  #endif
  _set_SW_mode();
  tr = triggered();                 // If triggered in SW mode, the pin is up, it is STOWED
  #if ENABLED(BLTOUCH_DEBUG_MSGS)
    if (tr) SERIAL_ECHOLN("BLTouch is STOWED"); else SERIAL_ECHOLN("BLTouch is DEPLOYED");
  #endif
  _reset();                         // turn off the SW Mode
  if (tr) _stow(); else _deploy();  // and reset any triggered signal, restore state
  return !tr;
}

It uses SW mode to get the current status and then restores the state. Works fine on all probes that support the SW mode, which even most all clones do, just some original old BLTouches do not.

For those, instead of querying the status, we will simply command the status we desire to see when querying.

As can be expected this method makes initial setup of a probe sequence (M48, G28, G29) take a moment longer. But then the express train gets moving. Oh, and setting MULTIPLE_PROBING to 2 or three or more is now fun, too.

Here are three videos. G28, G29, M48

I'll do some more testing and expanding of this and put it up now for others to have a look at. This will probably be the final stage of a BLTouch recode, as I can see no further possible optimisations.

If the feedback is positive, I will update all the example configuration.h files.

See the description in the pull request.
Final optimisation of BLTouch probing sequence.
Fixed for now, needs to be investigated
@Roxy-3D
Copy link
Member

Roxy-3D commented May 2, 2019

My first pull request here to Marlin...

Your doing well on all fronts. The code. The collaboration. The whole picture.

@ghost
Copy link
Author

ghost commented May 3, 2019

Stage 3 Testing:

Thanks to @CRCinAU , who stress-tested this on his ENDER 3, there are some indepedant experiences of the new probing-sequence.

He has posted a video of a G28 followed by a G29. Ender 3 G28-G29

In a nutshell, you can see that on printers with a vertical Z-feedrate of less than about 10mm/sec, the upward motion would not be fast enough to avoid the dropping pin (ready for the next probe) to cause an error.

It only works if printers have a Z-feedrate capable of >10mm/sec.

That is NOT the case for many printers.

I then thought of making this configurable, like BLTOUCH_ENABLE_FAST_SEQUENCE or something like that. But thinking of @InsanityAutomation and the others here kind of tells me, the support effort for something that gives so little gain and would become so complicated, is not worth it.

Too much pain for too little gain.

Thus I am reverting the commits for stage 3. The previous was a good level that had been reached and one can live with it very well.

It was interesting though, to take the BLTouch to its limits and learn all about its functions and quirks.

( Leaving the status query routine in there though, because someone might want to use it in the future and while it is not used, the compiler will optimize it away),

This reverts commit e377dcf.
@CRCinAU
Copy link
Contributor

CRCinAU commented May 3, 2019

I'm in two minds. 99.9% of the time, I get through the probe fine on the Ender 3. In fact, in all my testing, I've only had it error out once.

I can see if any printers have a slower Z movement than the Ender 3, it might be a problem - however it may well be that this can have a single define AFTER the BLTouch has triggered the ZStop that tells it to STOW - then maybe an 'if stowed, deploy' bit of logic before starting the Z movement down for the next probe?

What would that mean in the way of probing speed?

I feel it wouldn't be any slower than the current - and only really need a couple of extra code lines to enable both methods...

In case ever needed, here is a query routine for the BLTouch pin
And some small cosmetic changes
@ghost
Copy link
Author

ghost commented May 3, 2019

Hi @CRCinAU Yeah, your idea is good and I realised: If you DO do a STOW to solve that problem, then you also need a DEPLOY to get going again, and then we are back to the original sequence. So much for that idea.

@CRCinAU
Copy link
Contributor

CRCinAU commented May 3, 2019

Hi @CRCinAU Yeah, your idea is good and I realised: If you DO do a STOW to solve that problem, then you also need a DEPLOY to get going again, and then we are back to the original sequence. So much for that idea.

But it also allows a much faster method for those who can 'beat the probe' - which i think is still a step forwards.

@ghost
Copy link
Author

ghost commented May 3, 2019

I also found out that unless you are probing 5x5 or 7x7, you are not gaining so much speed due to the initial setup delay
And how often in your life do you do a M48?
Even Homing, (which I had not implemented yet) with one slow and one fast probe isn't worth the hassle.

@ghost
Copy link
Author

ghost commented May 3, 2019

Yes, I am one of the "friends of G29 in CURA START-GCODE". But honestly, my Bed heats before G29 and that takes longer. So my gain is by percentage even less

@CRCinAU
Copy link
Contributor

CRCinAU commented May 3, 2019

I also found out that unless you are probing 5x5 or 7x7, you are not gaining so much speed due to the initial setup delay

True - the speed increase for me was that I can do a G28 + G29 in the same speed as a just a G29 before. In my mind, that's still a good decrease. I do probe a 5x5 grid.

And how often in your life do you do a M48?

Lately, too often! But I get your point.

Even Homing, (which I had not implemented yet) with one slow and one fast probe isn't worth the hassle.

yeah - I have to do slow == fast with respect to speed settings. The M48 seems to show that there isn't any difference measured in accuracy between having a second, slower speed - or probing twice at the same time. That being the case, I'm not sure of the benefit of two probes for homing - unless its to cross check?

EDIT: That being said, a printer with faster Z movement than the Ender 3 might notice differences...

@ghost
Copy link
Author

ghost commented May 3, 2019

Well HOMING is coded really smart: If you are high up, it will come down fast, do the first approach to more or less get a feel for where the bed is, raise bit, and do a slow approach for a precise measurement. If you didn't do this, you would have to come in real slow from what might be a great height, not knowing when it would touch, and that can take ages.

@CRCinAU
Copy link
Contributor

CRCinAU commented May 3, 2019

I guess the Ender 3 isn't fast enough on the Z axis to really notice ;)

@ghost
Copy link
Author

ghost commented May 3, 2019

Yes, but that also means it's very precise on the Z-Axis, it seems. I will have the same problem when I migrate to a M3 DIN screw rod instead of the one I have now. You see it in the steps/mm config.

@ghost
Copy link
Author

ghost commented May 3, 2019

What I would have really liked was for the BLTouch V3, with all its NEW commands, would have added ONE MORE NEW command.

Example: BLTOUCH_SLOW_MODE at 40°
When triggered, pull up the pin, keep it up for 1sec ( = 10mm travel on the ENDER3 ) and then drop it again (instead of dropping it immediately after pulling it up). ANTCLABS!!!! <-- please read this.

Trouble is, then you would have to configure: Do I have a new one that supports this?

Oh, you just brought on an idea: With the status routine method, one could find out this information automatically by timing the drop, I'll keep that in mind for the future.

@ghost ghost closed this May 5, 2019
@CRCinAU
Copy link
Contributor

CRCinAU commented May 5, 2019

To be honest, the majority of this code should be accepted.

The stock Marlin bugfix-2.0.x usually fails on a 5x5 G29 for me - at least 80% of the time.

This code works perfectly.

Further to this, the branch 'bugfix-2.0.x-BLTOUCH-NERD-RELEASE' is even more reliable. I believe this should be top priority with the number of problems on the BLTouch v3 out in the wild.

@ghost
Copy link
Author

ghost commented May 5, 2019

@InsanityAutomation is working on this now, as he told me. I have completed my investigations and have communicated the results.

@thinkyhead thinkyhead changed the title Bugfix 2.0 x special bltouch release for review Comprehensive BLTouch implementation May 6, 2019
@ghost
Copy link
Author

ghost commented May 10, 2019

This PR ( closed and retitled by @thinkyhead ) has been superseded by #13814, which has been merged.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants