[BUG][UNFIMP] Incorrect MAC imported and causing app restarts #848
Comments
Hi @nathang21 , I see the app restarting when new devices are created.
22:29:34 [Update Devices] - (if not empty) cur_SSID -> (if empty) dev_SSID
22:29:34 [Update Devices] - (if not empty) cur_Type -> (if empty) dev_DeviceType
22:29:34 [Update Devices] - (if not empty) cur_Name -> (if empty) dev_NAME
22:29:37 [MAIN] Setting up ...
22:29:37 [conf.tz] Setting up ...
22:29:37
22:29:37 The backend restarted (started). If this is unexpected check https://bit.ly/NetAlertX_debug for troubleshooting tips.
22:29:37
22:29:37 Permissions check (All should be True)
22:29:37 ------------------------------------------------
22:29:37 /config/app.conf | READ | True
22:29:37 /config/app.conf | WRITE | True
22:29:37 /db/app.db | READ | True
22:29:37 /db/app.db | WRITE | True
22:29:37 ------------------------------------------------
Can you try to surface the exception, which can't be captured in the logs, by following this guide: https://github.com/jokob-sk/NetAlertX/blob/main/docs/DEBUG_TIPS.md#2-surfacing-errors-when-container-restarts-
Start the container via the terminal with a command similar to this one:
docker run --rm --network=host \
-v local/path/netalertx/config:/app/config \
-v local/path/netalertx/db:/app/db \
-e TZ=Europe/Berlin \
-e PORT=20211 \
jokobsk/netalertx:latest
Alternatively, check the Docker or Portainer logs. You should be able to see an exception before the container restarts. Thanks in advance, |
Hey @jokob-sk thanks for the quick response. I've tried all of those debugging steps previously and, as I mentioned above, the weird thing is that the container remains healthy as far as I can tell: only the backend restarts within the container, while the container itself continues to run for days without stopping. Just in case, I will start the container without -d again to see if it ever crashes, but in my experience that isn't what happens. |
Hi @nathang21 , Just FYI the backend restarting and the container restarting are two different things. The container doesn't restart (become unhealthy) when the app backend restarts, because a backend restart is also used when initializing new settings. The Portainer and Docker logs will still contain the exception. If the restart isn't occurring right now, you can probably replicate it by deleting all devices and waiting for the app to try to re-add them. Please back up everything first and download the devices.csv file (and verify it) as a backup. Then please have a look in, for example, the Portainer logs and search for:
...and scroll up a few lines where most likely you'll be able to find a logged exception. I think the restart might be caused by a device name or other field that contains some un-escaped character. Thanks in advance, EDIT: Backup guide: https://github.com/jokob-sk/NetAlertX/blob/main/docs/BACKUPS.md |
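The "search for the restart, then scroll up" step can also be scripted. The following is a minimal sketch, not part of NetAlertX: it assumes the restart marker is the "The backend restarted" line seen in the log excerpt earlier in this thread, and the function names, the `before` window and the `app.log` path are hypothetical.

```python
def context_before_restarts(lines, marker="The backend restarted", before=20):
    """Return, for each line containing `marker`, the `before` lines preceding it.

    `marker` is an assumption based on the log excerpt in this thread.
    """
    lines = list(lines)
    contexts = []
    for i, line in enumerate(lines):
        if marker in line:
            # Slice the window that ends just before the marker line.
            contexts.append(lines[max(0, i - before):i])
    return contexts


def has_traceback(context):
    """True if any line in the window looks like the start of a Python traceback."""
    return any("Traceback" in line for line in context)


def scan_log(path="app.log", before=20):
    """Print the lines preceding every restart found in the log file at `path`."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        for ctx in context_before_restarts(fh, before=before):
            print("".join(ctx))
            print("-" * 40)
```

Calling `scan_log("app.log")` on a saved copy of the Docker or Portainer log prints the lines leading up to each restart, which is where an exception would be expected to appear.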
Thanks, I understand the difference, which is why I specified it explicitly. To be very clear, the container is NOT crashing, and there is no exception. The container has been running for many days (until I restarted it earlier today), but the backend restarts a few times per hour. The container still hasn't crashed after 5 hours (the backend has restarted numerous times), but I will let it run overnight and see, just in case. If not, I will move forward with the backup and try to force an occurrence as you suggested. In the meantime, here is a snippet of the latest logs from my terminal for reference, in case it's helpful.
|
Hi @nathang21 , Thanks for the info. Well, it's good it's working at least. Looking at the logs I can't see anything wrong. Let's see if the issue reappears or if you can reproduce it later on. Keep me posted, |
Good morning, here is an updated log output from my terminal. There is a Traceback right before the backend restarted in its most recent occurrence. Is this helpful by chance? Other details/symptoms that may be relevant:
Do let me know if you need more details, full logs, or additional troubleshooting steps. Thanks again. |
Thanks @nathang21 , this helps a lot. To determine the root cause of the issue, can you please send me the logs for the DEVICES and CurrentScan tables from just before this issue occurs? It seems like a plugin is passing an invalid MAC address to the core app.
Thanks in advance, |
I also added a bit of additional logging so if you can switch to the
Still, I will need the above printed output of the database tables to properly fix the problem. Thanks in advance, |
Thanks for the instructions and patience, I just sent you over an email with the logs. The backend restarted within a few minutes after a fresh creation + boot of the container so I just sent over the full app.log, but if that's too much I can trim it. Best, |
Hi @nathang21 , Thanks a lot for the help! This is exactly what I needed. UNFIMP seems to import invalid MAC addresses for some devices. I implemented a check and only valid MAC addresses are stored and passed to the app now. This should be fixed in the next release. It would be great if you could test this. Can you please switch to the Thanks in advance, |
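For illustration, a check of this kind could look like the sketch below. It is not the code that was added to NetAlertX: the function names, the accepted formats (six two-digit hex groups separated consistently by ":" or "-") and the `mac` dictionary key are all assumptions.

```python
import re

# Six two-digit hex groups, e.g. aa:bb:cc:dd:ee:ff or AA-BB-CC-DD-EE-FF.
# The backreference \1 forces the same separator to be used throughout.
_MAC_RE = re.compile(r"[0-9A-Fa-f]{2}([:-])(?:[0-9A-Fa-f]{2}\1){4}[0-9A-Fa-f]{2}")


def is_valid_mac(value):
    """Return True if `value` is a string that looks like a 48-bit MAC address."""
    return isinstance(value, str) and _MAC_RE.fullmatch(value.strip()) is not None


def filter_valid_devices(devices):
    """Keep only device dicts whose 'mac' field passes the check (hypothetical shape)."""
    return [d for d in devices if is_valid_mac(d.get("mac"))]
```

Filtering at the plugin boundary, before anything is stored, means the core app never sees a malformed value, which matches the intent described in the comment above.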
Glad to hear it. Unfortunately, I think there may be multiple sources of crashes, as I only recently got the UNFIMP plugin set up but I've been having this issue for longer. However, the latest version does appear more stable: it lasted for about 30 minutes before crashing this time. I've emailed over an updated app.log and app.conf from right after the crash, but I'm not seeing an obvious stack trace this time. Best regards, |
Hi @nathang21 , thanks for the logs again! I checked them and the issue seems the same. Are you sure the new image is used? I checked the log and a new debug output that should be logged
NetAlertX/front/plugins/plugin_helper.py Line 88 in 05e4de0
... isn't found in the log file. Could you please double check the latest
Thanks in advance! |
Shoot sorry about that, I added an explicit Got another occurrence and shared logs with you directly. Here is the stack trace:
|
Hi @nathang21 , Are you sure you pulled the newest
I also deployed this #856 code to the dev image - can you verify you see this when you select to display the Last IP in the device list? If you don't please try to pull the dev image again until you see the Last IP as a link - this will then verify you are on the latest dev image. |
Also make sure you are pulling the |
Hi @nathang21 , Thanks a lot. The fix should prevent the
What I think is happening now is that the system has already ingested an invalid MAC address, so we need to remove the invalid devices before the fix takes effect. Can you delete the devices or set up a new instance to test this? You can try any of the following, depending on whether you want to preserve existing data:
What I expect to see after a clean up of the DB in the logs:
The value Thanks for the help and patience, |
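Before cleaning up, it can help to see which stored entries would count as invalid. The sketch below is a read-only check and not an official NetAlertX tool: the table name `Devices`, the column name `dev_MAC` and the accepted MAC format are assumptions, and the names are interpolated into the query, so they must come from a trusted source.

```python
import re
import sqlite3

# Assumed format: six two-digit hex groups separated by ":".
_MAC_RE = re.compile(r"[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5}")


def _is_valid_mac(value):
    # SQLite user functions should return plain ints, so convert the bool.
    return int(isinstance(value, str) and _MAC_RE.fullmatch(value) is not None)


def find_invalid_macs(conn, table="Devices", column="dev_MAC"):
    """Return the stored values in `column` that do not look like MAC addresses.

    `table` and `column` are hypothetical names; adjust them to the real schema.
    """
    conn.create_function("is_valid_mac", 1, _is_valid_mac)
    query = f"SELECT {column} FROM {table} WHERE NOT is_valid_mac({column})"
    return [row[0] for row in conn.execute(query)]
```

Run it against a backup copy of the database, for example `find_invalid_macs(sqlite3.connect("app.db"))`, and review the result before deleting anything, since some non-MAC entries may be intentional placeholders.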
Thanks for the explanation, that makes sense. I ended up going with #3 and good news, I think it's been stable for at least 24 hours now. 🤞 Following up on 2 potentially unrelated issues (happy to open separate ones for those if preferred).
Version: Built on: 2024-10-24 | Version: 08:01:49 - Dev |
Hi @nathang21 , Glad to hear that! 🎉 It really is easier to open separate issues so I can track them for the release announcement and have the relevant logs.
1. For the first issue, can you please clear both caches: the in-browser one (shift + refresh on the tab) and the app one (the blue 🔄 button in the app header). If the issue persists, please open a new issue with the browser console log (F12). I also might need the output of relevant plugins (Search for
2. I think that's currently by design, but I would have to look into that. Let's fix the above issue first and see if the behavior persists (it might be related). If it does, you can open a new issue and I can look into this as well. This might be a more substantial piece of work, as there are back-end dependencies I have to look at. Again, it's easier to track in a separate issue.
Also, just FYI, the latest logs you sent by email seem to be empty (file size 0). Thanks for the patience, |
Thanks, and sorry about the empty logs; I forgot to fix the file permissions before downloading from my NAS. All in all, it's been stable for a while now, so I think we can close this issue. I'll open separate issues for the other 2 when I get a chance. Thanks again. |
Thanks for the update @nathang21 |
Is there an existing issue for this?
Current Behavior
The app is frequently unresponsive, but the container remains healthy. From the log it appears that the backend restarts frequently; in my experience it happens at least a couple of times per hour.
One thing to add: this happened earlier on when I was setting up netalertx, and I scrapped the entire config/db because I couldn't figure out the issue. It was then stable for some period of time (I didn't monitor closely). Now that it has recurred, I'm opening an issue to report it, since I've failed to debug it on my own.
Expected Behavior
The backend to remain stable or at least the logs to clearly indicate why it is unstable.
Steps To Reproduce
This happens continuously, even after restarting/recreating the docker container. Perhaps there is an invalid config or a corrupted DB, but I'm not sure.
app.conf
docker-compose.yml
What branch are you running?
Production
app.log
app.log was too big to upload here, hosted here instead.
Debug enabled