Data Corruption Issue with latest release #854
Comments
https://github.com/nRF24/RF24/blob/master/RF24.cpp#L361 Needs to be some line breaks here, think I found the problem |
Did adding those blank lines fix it? I think that change was part of #846 (see description). To be safe, we should probably use curly brackets for all loops and conditions. The referenced commit doesn't conform to clang-format settings (Linux CI failed). |
Yup, otherwise it's like the following without the line breaks: while (condition){ while (condition) {} } |
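A minimal sketch of the brace convention suggested above, assuming two consecutive single-statement loops like the ones under discussion. `fill_frame_sketch` and its parameters are hypothetical and are not the actual RF24.cpp code:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from RF24.cpp): copy `data_len` payload bytes into
// `frame`, then pad the rest of the frame with `blank_len` zero bytes.
std::size_t fill_frame_sketch(std::uint8_t* frame, const std::uint8_t* data,
                              std::uint8_t data_len, std::uint8_t blank_len)
{
    std::size_t written = 0;

    // First loop: braces make its body explicit.
    while (data_len--) {
        frame[written++] = *data++;
    }

    // Second loop: runs after the first one finishes, never nested inside it.
    while (blank_len--) {
        frame[written++] = 0;
    }

    return written;
}
```

Blank lines by themselves do not change how C++ parses consecutive loops; the braces only make the intended scope obvious to readers and to formatters.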
OK, it looks like you committed trailing whitespace on the blank lines, which is causing clang-format to fail. There is another place where I think a blank line would be needed: Lines 344 to 346 in d7ba9c2
which affects Linux; your commit affects Arduino only. |
fixed trailing whitespace and added a blank line in 3f786bf |
Damn. I was kinda glad it was that easy. I should've known better. |
I think the Arduino lib manager hasn't published v1.4.4 yet, so we might still have 24 hours. |
Ok I can do new release later in the day unless you get a chance first
… On Jul 18, 2022, at 10:02 AM, Brendan ***@***.***> wrote:
I think the Arduino lib manager hasn't published v1.4.4 yet, so we might still have 24 hours.
I can go through and make sure all single statement loops and conditions use curly brackets (just as a first pass). I'm not at all familiar with setting up RF24Ethernet, so I may have trouble reproducing this on my end... I'll be focusing on this diff (specifically the RF24.cpp file): |
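OK, I submitted a patch on the finding-data-corruption branch which adds curly brackets to all single-line loops. I didn't find any single-line condition statements. master...finding-data-corruption I also went through the release diff v1.4.2...v1.4.3 for RF24_config.h and RF24.h (after going through RF24.cpp), and all I could see were whitespace differences (mostly related to indentation of C syntax like #if ... #else ... #endif) |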
Nice work! Btw you should be able to replicate the issue by sending fragmented payloads I think just with RF24Network or RF24Mesh
… On Jul 18, 2022, at 10:46 AM, Brendan ***@***.***> wrote:
ok I submitted a patch on the finding-data-corruption branch which adds curly brackets to all single line loops. I didn't find any single line condition statements.
master...finding-data-corruption
I also went through the release diff v1.4.2...v1.4.3 for the RF24_config.h and RF24.h (after going through RF24.cpp), and all I could see is whitespace differences (mostly related to indentation of C syntax like #if ... #else ... #endif)
I'm going to take a look at the diff for nRF24/RF24Mesh@v1.1.6...v1.1.8 because I think there were some single-line loops/conditions in that lib also. You may have actually solved the corruption on this level, so I'm keeping an open mind about the other layers also. |
Coincidentally, my local clone of RF24Mesh was somehow corrupted; git was saying "claims to have 125 objects while index indicates 188 objects" - whatever that means... I deleted then re-cloned RF24Mesh and git seems happy again - I doubt the git corruption made it into the remote because I wasn't able to commit anything. After re-cloning, I ran
I'm guessing |
Nonetheless, I submitted a patch to nRF24/RF24Mesh@master...finding-data-corruption It was mostly single line condition statements that didn't have curly brackets. |
nRF24/RF24Network@v1.0.16...v1.0.17 just has whitespace changes; there's no single line loops or conditions that don't already use curly brackets. I'm going to start testing... I'm assuming you experienced this on a MCU/Arduino, but I'll run some fragged msg tests on both Linux and Arduino. |
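For fragmented-message tests like the ones described above, one way to make corruption easy to spot is to send a predictable byte pattern and check it on the receiving side. The following is only a sketch: `fragment_count`, `fill_test_pattern`, and `first_corrupt_byte` are hypothetical helpers, and the 24-byte default is an illustrative assumption (a 32-byte nRF24L01 payload minus an assumed 8-byte network header), not a value taken from this thread.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: each radio frame carries up to `frame_payload` bytes of user data.
// Returns how many frames a message of `message_len` bytes would need.
std::size_t fragment_count(std::size_t message_len, std::size_t frame_payload = 24)
{
    return (message_len + frame_payload - 1) / frame_payload;
}

// Fill `buf` with a rolling pattern: seed, seed + 1, ... (wrapping at 256).
void fill_test_pattern(std::uint8_t* buf, std::size_t len, std::uint8_t seed)
{
    for (std::size_t i = 0; i < len; ++i) {
        buf[i] = static_cast<std::uint8_t>(seed + i);
    }
}

// Return the index of the first byte that breaks the pattern, or -1 if intact.
long first_corrupt_byte(const std::uint8_t* buf, std::size_t len, std::uint8_t seed)
{
    for (std::size_t i = 0; i < len; ++i) {
        if (buf[i] != static_cast<std::uint8_t>(seed + i)) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

The sender would call `fill_test_pattern` before transmitting and the receiver would call `first_corrupt_byte` on the reassembled message; a non-negative result points at the offset (and therefore roughly the fragment) where the data went wrong.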
Just noticed this in the build log for the RF24Mesh examples:
I can address it on the RF24Mesh finding-data-corruption branch. |
I suppose we can use clang-tidy to notify us of these types of warnings... |
Sure, may as well clean it up while we're in the process of fixing things |
Turns out |
Well, I haven't tested mesh yet, but the network layer seems to work well with fragged msgs. For the record, I'm using pyrf24 on one end because it is still pinned to the libs prior to the pigpio changes (the benefit of using submodules). On the other end I'm using my trusty QtPy M0 (ATSAMD21) with the latest from the RF24 finding-data-corruption branch and the latest release of RF24Network. PS: I really like the utility from pyrf24/examples/general_network_test.py. It's really well suited for this. I wouldn't mind having a C++ variant of it in the nRF24/.github repo (because it isn't specific to only the net or only the mesh layer). |
I'm confident that's all it was, just the line breaks needed between the loops with no brackets. I think we should try to get a release out this evening. |
I'm onboard with that, at least for RF24. I still need to test the changes in RF24Mesh before we issue a new release for it. |
I'll also change clang-format to not allow short loops and conditions. It will automatically change them to use curly brackets and multiple lines... We at www.github.com/cpp-linter org are developing a way to use a pre-commit action that will automatically commit changes made by clang-format (and clang-tidy if we ever started to use that). |
Hmm, testing some more shows there are still issues. I could have sworn it was working this morning :( |
Maybe I should increase the frequency of the fragged TX. I had it delayed 3 seconds to allow the payload to be printed to Serial. When I ran it with 2 sec delay between TX, some fragged payloads weren't getting received as a single msg... not sure if I worded that right. |
I'm kind of stumped. Going through the diff between 1.4.2 and 1.4.3 I don't see what would be causing it, unless there is a memory issue or similar higher up on the stack that changes are exposing? |
The only notable changes I see to the mesh layer is |
Well I just found something interesting... I tried downgrading from 1.4.4 to 1.4.2 manually, file by file. Replacing RF24.cpp and RF24.h made no change, but replacing the RF24_config.h did.
Then it works fine... Does this make any sense? |
It might. I almost forgot about that addition. Now I have to ask, what board are you using? |
AVRs, nano, pro mini and duemilanove |
Geez, I haven't wired up one of those in a while now. I'm looking at the AVR core to see if it should be tagging @dstroy0 |
But why is it causing data corruption? That doesn't really seem to make sense to me unless it's exposing an issue somewhere else. |
Did the sketch actually have code that calls the function? |
Ahh, the sketch does call sprintf_P |
The config code in question was specifically added for Linux because it doesn't use a pgmspace.h to define the sprintf_P alias. We could probably do better to make sure it is only defined for Linux. |
ok, I moved the define to just below where utility/includes.h is pulled in. Turns out it was also used for the PicoSDK... |
I also moved the include for stdio out of RF24.cpp and into utility/RP2/arch_config.h; all Linux drivers were already pulling in stdio (probably for printf). This should almost conclude revisions about #821 . I've been thinking that we might also want to |
We got it working! Let's maybe try to keep the changes minimal. |
I forgot to bump the version in the properties files before release. I was able to fix it and re-release, but wow, what a pain. |
Something went wrong in the last release of RF24. I started seeing data corruption issues on my RF24Ethernet nodes, and ended up downgrading all the libraries until I got to RF24. Downgrading to the previous release is what fixed the issues.
Messed up:
After Downgrading
So I assume this issue will affect all higher layer libraries, kind of a crappy issue. I'm not sure where to start looking, and I don't have too much time today to look for the problem, so we'll see how long it takes to figure out I guess.
Wondering if the clang formatting messed up some logic or something?
I would suggest downgrading to 1.4.2 and waiting until we have a fix for any users that encounter the same or similar issues.