Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide feedback to hosts when busy #3109

Merged
merged 2 commits into from
Mar 19, 2016

Conversation

thinkyhead
Copy link
Member

This PR offers one approach to keep hosts alive, as discussed in #3068

Some things about the Marlin command processor:

  • The command queue gets commands from three sources: Serial (USB), SD Card, and injected commands.
  • The command queue processor handles only a single command at a time, so any command can indefinitely block processing of the next command.
  • A common way to block the command queue is while (!lcd_clicked()) idle();, for example. In fact, anything that blocks for any period of time is already calling idle().

Bad news for hosts:

  • While a command like G29 or M303 is blocking, host software will simply find Marlin unresponsive.
  • Host software has no idea why Marlin is blocking and may just decide it has disconnected.

In this solution, whenever Marlin is inside of a command processor, the idle() function will keep an additional timer. Every time the timer reaches 2 seconds, a message will be sent to inform the host that Marlin is still alive (but busy, waiting for user input, etc.).

Any command handler that calls idle() directly or indirectly will invoke the keepalive message if it blocks for more than 2 seconds. So that includes: G4 dwell, G29 auto probing, M0 wait for user, M109 and M190 waiting, M226 wait for pin, anything that calls st_synchronize to wait for moves to finish (M400 or M48 for example), and anything that calls plan_buffer_line with a full buffer.

@thinkyhead thinkyhead added PR: Improvement Needs: More Data We need more data in order to proceed PR: New Feature C: Hosts & Protocols T: Design Concept Technical ideas about ways and methods. S: Don't Merge Work in progress or under discussion. labels Mar 8, 2016
@@ -129,6 +129,9 @@
#define MSG_COUNT_A " Count A: "
#define MSG_ERR_KILLED "Printer halted. kill() called!"
#define MSG_ERR_STOPPED "Printer stopped due to errors. Fix the error and use M999 to restart. (Temperature is reset. Set it after restarting)"
#define MSG_BUSY_SIGNAL "Busy"
#define MSG_WAITING_FOR_USER "Paused for user"
#define MSG_WAITING_FOR_INPUT "Paused for input"
#define MSG_RESEND "Resend: "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a "Busy: " prefix could simplify parsing for the host - it had to check for only one expression.
Anything containing a "...wait..." is not a good idea - already used.

@thinkyhead thinkyhead force-pushed the rc_host_keepalive branch 4 times, most recently from fa36983 to 0bb6fd8 Compare March 8, 2016 08:23
@@ -129,6 +129,9 @@
#define MSG_COUNT_A " Count A: "
#define MSG_ERR_KILLED "Printer halted. kill() called!"
#define MSG_ERR_STOPPED "Printer stopped due to errors. Fix the error and use M999 to restart. (Temperature is reset. Set it after restarting)"
#define MSG_BUSY_SIGNAL "Busy: Processing"
#define MSG_WAITING_FOR_USER "Busy: Paused for user"
#define MSG_WAITING_FOR_INPUT "Busy: Paused for input"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Blue-Marlin something more like this?

@@ -646,6 +646,8 @@ const bool Z_MIN_PROBE_ENDSTOP_INVERTING = false; // set to true to invert the l
#define EEPROM_CHITCHAT // Please keep turned on if you can.
#endif

//#define HOST_KEEPALIVE_FEATURE // Send messages to the host when busy with long operations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this changes behavior but I believe it shall be active by default, this is one of those "black magic" flags no one will understand the what it exactly does and the Octoprint guys will continue to see issues coming their way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until there is at least one host supporting this, it makes no sense to activate it by default.
And then we have to find out, if this confuses any other hosts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specified behavior for hosts is that if they can't interpret a message they should print it to console and ignore it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OctoPrint automatically supports this since OctoPrint interprets any output from the printer as "oh, look, it's still alive!" (and has been doing so for quite some time now). So another 👍 for having it enabled by default, it will instantly solve a lot of issues OctoPrint users are currently experiencing with long running commands.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well.
In that case let's go for it.
But we need at least a week for tests after merging to RCBugFix, before we can tag RC4.
The nozzle-tune disaster still hurts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Wackerbarth would be around, he'd complain about: "Don't enable anything by default."
How about: //#define DISABLE_HOST_KEEPALIVE_FEATURE ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems sane.
But I would suggest a bit shorter: DISABLE_HOST_KEEPALIVE

@jbrazio jbrazio mentioned this pull request Mar 9, 2016
@thinkyhead
Copy link
Member Author

Sorry I haven't commented on this more – I thought it would be a while before we had consensus to go forward. I will update this with your suggestions and merge it for the community to digest. I realized this feature will also be important to deal with any (rare, but technically possible) time-consuming immediate commands that could be issued through enqueuecommands_P. Currently this is not an issue, but there are some configurable options that take GCode, and nothing prevents using M0 if it suits the purpose.

@thinkyhead thinkyhead force-pushed the rc_host_keepalive branch 2 times, most recently from 7a80bf9 to abe95dc Compare March 18, 2016 23:22
@Blue-Marlin
Copy link
Contributor

Would not be calling idle() with a (default) parameter, passed to manage_inactivity() the better solution than a idle2()?

@thinkyhead
Copy link
Member Author

Would not be calling idle() with a (default) parameter, passed to manage_inactivity() the better solution than a idle2()?

It may be. I moved the code that is now in idle2 out of the location that calls idle2 as a callout to that original more-or-less duplication. So, as you point out: there are nicer ways to do it and this is not a place where saving two cpu cycles is a major point. So, I will apply that fixup forthwith.

@thinkyhead
Copy link
Member Author

Should I define idle the old way when not using FILAMENTCHANGEENABLE and save some cycles? Hmm…

@thinkyhead thinkyhead force-pushed the rc_host_keepalive branch 2 times, most recently from 8d89093 to 8670fa5 Compare March 19, 2016 05:35
If Marlin is blocking the serial input or command queue for any length
of time (for example more than 2 seconds), it needs to send a message
to serial out to inform the host that it is busy. Marlin should only
send these messages out when busy, and preferably not when trying to
print formatted output.
@thinkyhead
Copy link
Member Author

I kind of want to use the name PAUSED_FOR_ANYKEY but that's getting too geeky.

@repetier
Copy link

repetier commented Apr 6, 2016

For the written log it will most probably not work as it is written when we receive it. For the viewing side on runtime it should be quite simple to switch to repetition count instead. That is a good idea.

Regarding repetition time I vote for 2s as default and and an option to make it worse if user does not believe it is good, also he is wrong. There should also be a method to get the timeout so hosts can ask for correct value.

@thinkyhead
Copy link
Member Author

Setting the host timeout to 2s tends to produce a lot of "busy" output in everyday circumstances.

Anyway, I'm curious. Currently it is quite easy to block the serial input queue for a very long time. For example, just issue a few G2 or G3 commands with very large circles. So does the following GCode now cause hosts to time out and assume the printer has gone away?:

G28
G1 Z10 F9000
G1 X10 Y10 F5000
G2 I90 J0 F1000
G3 I90 J0 F1000
G2 I90 J0 F1000
G3 I90 J0 F1000
G2 I90 J0 F1000
G3 I90 J0 F1000

@thinkyhead
Copy link
Member Author

Thanks for all the good feedback. I've been focused on other sections but will deal with this today. As I patch it up, I will set 2s as the default timeout, but make it configurable through GCode, and I suppose it will be saved to the EEPROM also.

@thinkyhead
Copy link
Member Author

#3414 applies the necessary changes to give a configurable interval with a 2s default. The code M113 Sn is used to set the interval, while M113 with no argument will reply with the current interval. Currently it just outputs echo:M113 S2, for example.

@repetier
Copy link

repetier commented Apr 7, 2016

The circles are a good example for what I explained earlier. This must also send busy signal when it is blocking input too long, otherwise it makes no sense.

Our hosts will not assume printer is gone, but would maybe timeout and then send next command. When printer is ready again it will then encounter checksum error (buffer overflow so new data is partially ignored) and request a resend. So that is a problem but normally gets solved by other error checking routines.

@jbrazio
Copy link
Contributor

jbrazio commented Apr 7, 2016

This must also send busy signal when it is blocking input too long

So the host knows it just sent to the robot a blocking instruction to draw a circle or do a long slow move or whatever.. but he needs confirmation from the fw that indeed the robot is "busy" executing what has been told.. every 2s.

Is it just me that finds this inelegant "brute force" ?

@repetier
Copy link

repetier commented Apr 7, 2016

G0/G1/G2/G3 are not registered as long running and the host does not know when they cause a block. They might be buffered and return "ok" directly and 10 commands later a simple short G0 takes 40 seconds because the move buffer is full and the slow g-move is now executing. It is not up to the host to know why and when firmware is blocking. Firmware has just to say that it is in blocking mode.

@foosel
Copy link
Contributor

foosel commented Apr 7, 2016

Blocking? G0 through G3 are buffered commands and only block if the - from the point of view of the host arbitrarily sized - planning buffer of the firmware is full. Which we can't know beforehand since stuff happens asynchronously on your side.

@thinkyhead
Copy link
Member Author

Basically, we can think of any lengthy command as potentially blocking the serial input. After all, the default serial command buffer only requires 4 "unfinished" commands to become filled up and blocked.

The movement commands are not immediately blocking in the way that something like M303 is. You can continue to send G0-G3 without blocking so long as the planner buffer doesn't get filled up. Almost any G2/G3 arc will saturate the planner buffer until it's done. And on Delta/SCARA machines, every move is interpolated, so the planner buffer is kept full by most G1 commands too.

Under typical usage conditions and typical GCode files, with a 2 second interval we see "busy" messages constantly. The reason I leapt to 10 seconds was due to the sudden influx of complaints. But at that time there was no option to turn it on and off nor change the interval. Now that there's an M113 command, maybe it makes sense to leave it "off" by default (same as setting DEFAULT_KEEPALIVE_INTERVAL to zero) and leave it up to hosts to use M113 to turn it on and off themselves, choosing their own interval.

@jbrazio
Copy link
Contributor

jbrazio commented Apr 8, 2016

M113 seems a good middle ground commitment for a starting point that is able to accommodate both parties.

@repetier
Copy link

repetier commented Apr 8, 2016

I haven't looked into your implementation but I would not expect busy on normal gcode. From time to time yes, but most moves are shorter than 2s so waiting time is less then 2s and every ok should reset busy timer. So if you check every 0.1s if a busy needs to be send it is ok to send it when blocking exceeds 2s. Hosts will add a safety margin to your value anyway I think (maybe 1s or 0.5s) to compensate round trip times and inaccuracies.

@thinkyhead
Copy link
Member Author

@jbrazio We shouldn't set too conservative a value, apparently. When I did my 10 second "suggestion" merge I received flak, ridicule, and insult for my efforts. So, as long as we're going for "middle ground" changes I think the best option is to set the default to about 5 seconds. I agree it still makes sense to leave it on by default, because it will still keep hosts thinking Marlin is alive, even those that don't yet have any support for M113.

@thinkyhead
Copy link
Member Author

I haven't looked into your implementation

Well, here it is in all its glory. Very simple! The busy_state is set to IN_HANDLER just before any command is processed, and PAUSED_FOR_USER in the appropriate circumstances. The busy_state is reset to NOT_BUSY after a command is handled (also in M303 to suppress busy messages). This function is then called from idle() (which is invoked even when a command is blocking):

void host_keepalive() {
  millis_t ms = millis();
  if (busy_state != NOT_BUSY) {
    if (ms < next_busy_signal_ms) return;
    switch (busy_state) {
      case IN_HANDLER:
      case IN_PROCESS:
        SERIAL_ECHO_START;
        SERIAL_ECHOLNPGM(MSG_BUSY_PROCESSING);
        break;
      case PAUSED_FOR_USER:
        SERIAL_ECHO_START;
        SERIAL_ECHOLNPGM(MSG_BUSY_PAUSED_FOR_USER);
        break;
      case PAUSED_FOR_INPUT:
        SERIAL_ECHO_START;
        SERIAL_ECHOLNPGM(MSG_BUSY_PAUSED_FOR_INPUT);
        break;
      default:
        break;
    }
  }
  next_busy_signal_ms = host_keepalive_interval ? ms + 1000UL * host_keepalive_interval : -1;
}

(The -1 can be interpreted as 0xFFFFFFFF with the unsigned next_busy_signal.)

@repetier
Copy link

repetier commented Apr 9, 2016

@thinkyhead Good concept with one little flaw. It will break after 49 days when the timer overflows. To prevent this make next_busy_signal_ms last_busy_signal_ms = ms; at the end and for the timer detection write ms - last_busy_signal_ms > 1000UL * host_keepalive_interval
That would solve the overflow problem. next_busy_signal_ms = 0 should cause a return at the beginning then. last_busy_signal_ms can be initialized with 0 as timer starts with that. I think I will copy the concept with adopted style.

@thinkyhead
Copy link
Member Author

@repetier Based on your suggestion, it sounds like we need to change every instance where Marlin uses a next_event_time variable instead of a previous_event_time variable. But perhaps it really isn't that important for Marlin to support a machine being powered on continuously for 1,200 hours…?

Anyway, the only reason we use next_event_time variables is to save a little CPU – one fewer addition (or addition and multiplication) per test.

So… #3443 should do the trick!

@repetier
Copy link

You can come away with less code change if you use next time assuming next time < 20days with
if(nextTime-currentTime < 2000000000)
Here you use the overflow in a unsigned integer to detect it and this also works at the overflow position. But as you see, there is no way around a additional minus to catch the overflow case.

I have a busy implementation in repetier-firmware now and am testing it with the next repetier-server release. Last print had only some busy at the beginning from heating/homing. The rest of the print had no busy message so that is good.

If you have log open with a host without busy support you see of course the messages. For server I have already added busy messages as ACK so they get filtered. I think it is only a question of time until all hosts will filter it as ACK to not flood the user. It is just the problem of a new feature to get a well known standard.

@repetier
Copy link

Ok, after some struggling I got your code also to work with next server release. Main problem is that you forgot

  case 113:
    gcode_M113();
    break;

in the M code selection switch or it was removed meanwhile.

Now it is printing with 3s timeout auto configured from M113 and no timeout during heating or slow moves. I did not test the other commands but the principle is proven to work. Repetier-Server 0.75.0 will handle it and Repetier-Host 1.6.2 should also include the correct handling. And with Marlin and Repetier-Firmware using it I think other hosts will soon follow as well.

@thinkyhead
Copy link
Member Author

@repetier Aha, did I forget something there? Well I'm glad to hear it's all holding together. Are there any other fine points to go over here? If not, I suppose we can "close" this very productive thread!

@repetier
Copy link

I haven't and can't test any slow command but the cases tested and common worked fine. So I guess we can close it.

@lrpirlet
Copy link
Contributor

OK, @thinkyhead, nice feature...
Could you or someone tell me the relation with the wait message sent every second or so when NO_TIMEOUTS is defined??

If I understand correctly, wait is when Marlin wants more data so it says "I wait for some to execute" , while when working a command such as M48 that could take too long a time, Marlin says "I am busy, hang-on you will get an OK pretty soon..."

Does that mean that we can enable both???
Does that mean that we should enable both???

@thinkyhead
Copy link
Member Author

@lrpirlet The NO_TIMEOUTS option only sends a wait message whenever the command buffer is empty for a period of time. It's meant to "wake up" a host that might have forgotten the printer was waiting. It's totally compatible (and complimentary) with the HOST_KEEPALIVE_FEATURE.

  • NO_TIMEOUTS: Marlin is expecting more data from the host now.
  • HOST_KEEPALIVE_FEATURE: Marlin can't accept data from the host now.

@repetier
Copy link

You should enable both.
wait - firmware is idle and willing to accept more data. This indicates empty buffer to hosts who can send more data if they thought there should come a ok that they missed.
busy - firmware is not idle and can not accept new commands. This is to prevent host from timing out and assuming falsly that a ok was missed.

@jbrazio
Copy link
Contributor

jbrazio commented Apr 12, 2016

I wish we were solving this problem without using a shovel and pickaxe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: Hosts & Protocols Needs: More Data We need more data in order to proceed PR: Improvement PR: New Feature S: Don't Merge Work in progress or under discussion. T: Design Concept Technical ideas about ways and methods.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants