-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Completely Rewritten Insteon Message Parser #388
Conversation
Changes the PLM parser to a First-In-First-Out style. This should resolve the issue found in hollie#374 where shorter messages were being excerpted out of the middle of larger messages. I believe the routine is significantly easier to read. Removed the use of the split routine which created really odd results, and was not the most logical code to follow. This also likely solves other latent issues that may not have been apparent.
Conflicts: lib/Insteon_PLM.pm
@krkeegan, will do. Looks good a first blush, but will give it some time. |
I was doing some experiments that required restarting MH multiple times and managed to induce a failure when my user code fired at the top of the minute at 13:07:00 while the Insteon devices were still synching with MH:
|
I believe the problem happens because of:
This has been an ongoing nagging issue. While "skipping" the message sounds benign, it has dramatic consequences because neither the success nor failure_callbacks are called for this message. As a result, the message queue for this device gets stuck in a state where it is waiting for something to happen that never does. I have been looking into patches to solve this such as calling the success or failure callbacks when skipping a message. But each of these patches would potentially cause ripple effects that are hard to anticipate. Currently, I am thinking that the best solution is to no longer skip duplicate messages. They rarely happen, and when they do they are most likely intentional. I will keep the warning in the log, but allow the duplicate message to be processed. This should solve this issue and hopefully we won't get stuck in some infinite loop. |
OK, hopefully that most recent commit should do it. |
Hi @krkeegan, that seems to have done it.
There is one apparent artifact remaining at 15:37:51, but it appears to be benign. |
Hi Kevin, On 03/27/2014 02:34 PM, Kevin Robert Keegan wrote: [...]
Is this related to duplicated messages received from a device reporting I suspect you are talking about something deeper in the stack, but I Cheers, Eloy Paris.- |
On Thu, Mar 27, 2014 at 3:53 PM, Eloy Paris notifications@github.comwrote:
Nope, unrelated. These are duplicate messages queued by MH to be sent out.
No worries, those will still be ignored. |
That is somewhat benign, but I think it means that I am failing to clear a message somewhere. Let me see if I can find the bug. |
Hmm, so I can't reproduce the |
I'll let it run overnight and see if it appears again. |
On 03/27/2014 07:03 PM, Kevin Robert Keegan wrote:
Ah okay, I thought so, but thanks for confirming. Cheers, Eloy Paris.- |
I looked through the logs this morning (the overnight test ran from 00:54:57-11:43:00) and found ten occurrences of this warning:
The event at 00:55:03 was induced by starting the test right before the end of the minute which caused my user code to fire before MH was done initializing all the Insteon devices as was seen before. Here is an excerpt of the log around the event at 01:21:01:
|
BIIIIIG Bug this was causing the PLM device id to be identified wrong, leading to bad links and yadda yadda.
I just patched a large bug in the parser, please update if you are using this branch. Otherwise any links you create will be wrong. |
- Add routine to request and process PLM Config Settings - Add routine to set/unset monitor mode - Fix Typo in Command Length - Fix message parsing of broadcast messages. The message is not that long. Not sure what is_master is supposed to be, seems like it can only be cmd1, dev_attributes makes sense.
After running the new code for a few days I had the iMeter go deaf on me after the read at 11:04:
Restarting MH seems to clear it up and the iMeter starts responding again. |
In that particular instance, your PLM simply stopped responding. Based on the logs, I really can't say why, nor do I know why a restart would fix it. |
Thanks @krkeegan, if it happens again I will try sending some commands to my other Insteon device before restarting to see if it behaves any differently. |
The same symptoms re-appeared earlier this morning and after noticing this time I tried sending commands to another Insteon device which behaved in the same way as the iMeter in that I see the commands being sent, but they simply time out waiting for an ack. |
Hmm, the fact that you can rectify this by restarting MH suggests that the serial connection is being broken somehow. I don't know what type of OS you are on, but are there any logs that you can review to see what might be happening? Interestingly, this is somewhat similar to an issue raised on the mailling list by Jeff Siddall with the subject "Automatically reconnect after USB Insteon PLM disconnected?" I don't know very much about the perl interface to serial ports or how MH implements it. But your problem and Jeff's would likely be solved if we could somehow figure out a way to check the status of a serial port and if it is "down" reinitialize it. |
Hi guys, A restart might fix the problem because that might cause some of the For instance, I have an Arduino-like device that connects to my server Cheers, Eloy Paris.- On 04/04/2014 02:50 PM, Kevin Robert Keegan wrote:
|
My MH test box is running CentOS 6.3 (kernel 2.6.32-279.19.1.el6.x86_64) at the moment. After digging through the logs, I do indeed see the USB PLM disconnecting and reconnecting.The dates seem to match, but the time of the disconnect/reconnect seems to be a few hours later than when MH stops getting acks from the Insteon devices. |
I looked a bit more into the time discrepancy between events, and it would appear to be related to my server's log time and system time being offset by a few hours. Removing the time offset seems to to correlate with the disconnect/reconnect events and MH loosing connectivity. |
This appears to be good, with the exception of the power cycling problems, which are not really an issue with this commit. I have created #397 for discussion of that particular issue. |
Completely Rewritten Insteon Message Parser
This code changes the PLM parser to a First-In-First-Out type of parser. This should resolve the issue found in #374 where shorter messages were being excerpted out of the middle of larger messages.
This path fixes a number of notable problems with the prior code:
@rudybrian, since you seem to be most able to reproduce a parser error, can you test this out and see if this solved that issue?