Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Small Improvements to Changelog Verification #301

Merged
merged 3 commits into from
Sep 9, 2024

Conversation

ExtremeFiretop
Copy link
Owner

@ExtremeFiretop ExtremeFiretop commented Sep 7, 2024

Description of the Bug/Issue:

When doing an offline firmware update this morning on my ASUS GT-BE98 Pro to the newest alpha build version 3006.102.2_alpha1 using MerlinAU, I noticed that it failed to find the correct changelog file even though the changelog file is present and valid in the directories as found below:

image

The source of the issue is this line in the changelog verification function:

release_version="$(Get_Custom_Setting "FW_New_Update_Notification_Vers")"

This line is what MerlinAU uses to determine which changelog to look for.
As seen by the above screenshot it's looking for the NG file instead of the 3006 file on my 3006 router.

Possible Resolutions:

1. Different logic for offline changelog selection than production.
(Based on installed firmware version instead of update firmware version)

  • My thoughts:
  • I dislike this idea., because some routers may some day move from the NG changelog to the 3006 changelog and the current production logic allows for this to work.

2. I could simply update the entry in the file, or disable the changelog verification.

  • My Thoughts:
  • I dislike this idea, I shouldn't ever need to update that settings file manually outside of testing, if I'm a new user, that entry will be TBD as seen above, now I highly doubt a new user would ever use the offline function as their first move, but we should still correctly account for this situation, as well as I shouldn't need to disable changelog verification to allow the flash to happen. Especially if a changelog is present and valid.

3. Update the offline logic to update the production file entries if it's used to do a firmware update. This would allow the script to find the correct changelog file without impacting production logic.

  • My thoughts:
  • I like this idea the best, so I've opted to update the file entries if a user triggers an offline update.

With the noticed solution implemented, re-running gave the expected results as seen below:
image

@ExtremeFiretop ExtremeFiretop marked this pull request as ready for review September 7, 2024 17:06
@ExtremeFiretop ExtremeFiretop added the bug Something isn't working label Sep 7, 2024
@ExtremeFiretop
Copy link
Owner Author

ExtremeFiretop commented Sep 7, 2024

I decided to re-implement the post-flash status email for the offline feature.
We removed it as apart of the fix for PR: #283

At the time, I considered exporting the variable, but it opted to just remove the post-flash email since it was easier:

I'll have to adjust the logic, in testing I realize that the email won't have access to that variable sadly.

I'll have to export it? Or just not send the email. What do you think? @Martinski4GitHub

But considering now I'm exporting it anyways for the changelog above, I may as well re-implement the post-flash email.

@Martinski4GitHub
Copy link
Collaborator

...

The source of the issue is this line in the changelog verification function:

release_version="$(Get_Custom_Setting "FW_New_Update_Notification_Vers")"

This line is what MerlinAU uses to determine which changelog to look for. As seen by the above screenshot it's looking for the NG file instead of the 3006 file on my 3006 router.

...

3. Update the offline logic to update the production file entries if it's used to do a firmware update. This would allow the script to find the correct changelog file without impacting production logic.

* **My thoughts:**

* I like this idea the best, so I've opted to update the file entries if a user triggers an offline update.

Yeah, when doing an offline update with Alpha/Beta versions, there is no notification from the router itself (i.e. executing the webs_update.sh script) so the corresponding NVRAM vars are not yet set and, therefore, the config entries may be unset as well. Setting those config entries during the offline update looks like a good idea to work around the problem without affecting F/W updates for production releases (as you said).

@Martinski4GitHub
Copy link
Collaborator

I decided to re-implement the post-flash status email for the offline feature. We removed it as apart of the fix for PR: #283

At the time, I considered exporting the variable, but it opted to just remove the post-email flash since it was easier:

I'll have to adjust the logic, in testing I realize that the email won't have access to that variable sadly.
I'll have to export it? Or just not send the email. What do you think? @Martinski4GitHub

But considering now I'm exporting it anyways for the changelog above, I may as well re-implement the post-flash email.

OK, understood. The post-reboot email notification serves as a confirmation when the F/W update has been completed successfully.

Copy link
Collaborator

@Martinski4GitHub Martinski4GitHub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to go..

@Martinski4GitHub Martinski4GitHub merged commit f192ceb into dev Sep 9, 2024
1 check passed
@Martinski4GitHub
Copy link
Collaborator

@ExtremeFiretop,

BTW, do you know what triggers the following messages that sometimes are shown right before the F/W update is about to start?

killall: afpd: no process killed
killall: cnid_metad: no process killed

I believe they happen during the call to the /sbin/ejusb command, and I wonder why they occur sometimes but not always. Just wondering...

@ExtremeFiretop
Copy link
Owner Author

ExtremeFiretop commented Sep 9, 2024

@ExtremeFiretop,

BTW, do you know what triggers the following messages that sometimes are shown right before the F/W update is about to start?

killall: afpd: no process killed
killall: cnid_metad: no process killed

I believe they happen during the call to the /sbin/ejusb command, and I wonder why they occur sometimes but not always. Just wondering...

No idea, but I've noticed a number of odd things with the ejectusb command in recent firmwares. It's been much more "vocal" than previous versions.

Before it never showed anything, then it showed a message if there was no USB, now it's showing kill commands, etc...

Not exactly sure if it's a bad thing, but I've noticed it as well.

It still makes me think it may be time to revisit an inhouse solution. I know i must sound like a hypocrite because I fought to use this initially, but it's behavior has changed over time...

@ExtremeFiretop ExtremeFiretop deleted the ExtremeFiretop-Small1.3.1Patch branch September 9, 2024 03:54
@Martinski4GitHub
Copy link
Collaborator

@ExtremeFiretop,
BTW, do you know what triggers the following messages that sometimes are shown right before the F/W update is about to start?

killall: afpd: no process killed
killall: cnid_metad: no process killed

I believe they happen during the call to the /sbin/ejusb command, and I wonder why they occur sometimes but not always. Just wondering...

No idea, but I've noticed a number of odd things with the ejectusb command in recent firmwares. It's been much more "vocal" than previous versions.

Before it never showed anything, then it showed a message if there was no USB, now it's showing kill commands, etc...

Not exactly sure if it's a bad thing, but I've noticed it as well.

It still makes me think it may be time to revisit an inhouse solution. I know i must sound like a hypocrite because I fought to use this initially, but it's behavior has changed over time...

I'd suggest to try a simpler solution first:

/sbin/ejusb -1 0 -u 1 >/dev/null

The above call should make it "silent" unless there's an actual error.
Or, we could try to make it completely silent:

/sbin/ejusb -1 0 -u 1 >/dev/null 2>&1

As you said, those messages don't seem to be a "bad" thing. Other than that, the command seems to be working, AFAICT, and it's not affecting the actual F/W update process.

@ExtremeFiretop
Copy link
Owner Author

@ExtremeFiretop,
BTW, do you know what triggers the following messages that sometimes are shown right before the F/W update is about to start?

killall: afpd: no process killed
killall: cnid_metad: no process killed

I believe they happen during the call to the /sbin/ejusb command, and I wonder why they occur sometimes but not always. Just wondering...

No idea, but I've noticed a number of odd things with the ejectusb command in recent firmwares. It's been much more "vocal" than previous versions.
Before it never showed anything, then it showed a message if there was no USB, now it's showing kill commands, etc...
Not exactly sure if it's a bad thing, but I've noticed it as well.
It still makes me think it may be time to revisit an inhouse solution. I know i must sound like a hypocrite because I fought to use this initially, but it's behavior has changed over time...

I'd suggest to try a simpler solution first:

/sbin/ejusb -1 0 -u 1 >/dev/null

The above call should make it "silent" unless there's an actual error. Or, we could try to make it completely silent:

/sbin/ejusb -1 0 -u 1 >/dev/null 2>&1

As you said, those messages don't seem to be a "bad" thing. Other than that, the command seems to be working, AFAICT, and it's not affecting the actual F/W update process.

As of now, I haven't noticed any changes in its actual behavior in the intended goal of ejecting the actual USB from the router.

It's just much more chatty about what it's doing now. (Or not doing..)

I'm fine with that suggestion of suppression!

@Martinski4GitHub
Copy link
Collaborator

I believe they happen during the call to the /sbin/ejusb command, and I wonder why they occur sometimes but not always. Just wondering...

No idea, but I've noticed a number of odd things with the ejectusb command in recent firmwares. It's been much more "vocal" than previous versions.
Before it never showed anything, then it showed a message if there was no USB, now it's showing kill commands, etc...
Not exactly sure if it's a bad thing, but I've noticed it as well.
It still makes me think it may be time to revisit an inhouse solution. I know i must sound like a hypocrite because I fought to use this initially, but it's behavior has changed over time...

I'd suggest to try a simpler solution first:

/sbin/ejusb -1 0 -u 1 >/dev/null

The above call should make it "silent" unless there's an actual error. Or, we could try to make it completely silent:

/sbin/ejusb -1 0 -u 1 >/dev/null 2>&1

As you said, those messages don't seem to be a "bad" thing. Other than that, the command seems to be working, AFAICT, and it's not affecting the actual F/W update process.

As of now, I haven't noticed any changes in its actual behavior in the intended goal of ejecting the actual USB from the router.

It's just much more chatty about what it's doing now. (Or not doing..)

I'm fine with that suggestion of suppression!

I'm wondering if those messages are triggered because there is still some running process that's accessing the USB drive right at the time when our script is trying to eject it, so it cannot unmount the filesystem. I'll sleep on this tonight :>)...

@ExtremeFiretop
Copy link
Owner Author

I believe they happen during the call to the /sbin/ejusb command, and I wonder why they occur sometimes but not always. Just wondering...

No idea, but I've noticed a number of odd things with the ejectusb command in recent firmwares. It's been much more "vocal" than previous versions.
Before it never showed anything, then it showed a message if there was no USB, now it's showing kill commands, etc...
Not exactly sure if it's a bad thing, but I've noticed it as well.
It still makes me think it may be time to revisit an inhouse solution. I know i must sound like a hypocrite because I fought to use this initially, but it's behavior has changed over time...

I'd suggest to try a simpler solution first:

/sbin/ejusb -1 0 -u 1 >/dev/null

The above call should make it "silent" unless there's an actual error. Or, we could try to make it completely silent:

/sbin/ejusb -1 0 -u 1 >/dev/null 2>&1

As you said, those messages don't seem to be a "bad" thing. Other than that, the command seems to be working, AFAICT, and it's not affecting the actual F/W update process.

As of now, I haven't noticed any changes in its actual behavior in the intended goal of ejecting the actual USB from the router.
It's just much more chatty about what it's doing now. (Or not doing..)
I'm fine with that suggestion of suppression!

I'm wondering if those messages are triggered because there is still some running process that's accessing the USB drive right at the time when our script is trying to eject it, so it cannot unmount the filesystem. I'll sleep on this tonight :>)...

This is the most likely explanation.
It makes me think of the weird behavior we saw with a Gnuton test router once...

It's been so long now that I don't remember the specifics, I'd need to go back through the Gnuton PR to see if I can spot the error again, but it felt like something was holding onto the USB still when we were trying to eject it.

It was then around the first time I suggested maybe going back to an inhouse solution. I'll see if I can dig up more info again

@Martinski4GitHub
Copy link
Collaborator

I believe they happen during the call to the /sbin/ejusb command, and I wonder why they occur sometimes but not always. Just wondering...

No idea, but I've noticed a number of odd things with the ejectusb command in recent firmwares. It's been much more "vocal" than previous versions.
Before it never showed anything, then it showed a message if there was no USB, now it's showing kill commands, etc...
Not exactly sure if it's a bad thing, but I've noticed it as well.
It still makes me think it may be time to revisit an inhouse solution. I know i must sound like a hypocrite because I fought to use this initially, but it's behavior has changed over time...

I'd suggest to try a simpler solution first:

/sbin/ejusb -1 0 -u 1 >/dev/null

The above call should make it "silent" unless there's an actual error. Or, we could try to make it completely silent:

/sbin/ejusb -1 0 -u 1 >/dev/null 2>&1

As you said, those messages don't seem to be a "bad" thing. Other than that, the command seems to be working, AFAICT, and it's not affecting the actual F/W update process.

As of now, I haven't noticed any changes in its actual behavior in the intended goal of ejecting the actual USB from the router.
It's just much more chatty about what it's doing now. (Or not doing..)
I'm fine with that suggestion of suppression!

I'm wondering if those messages are triggered because there is still some running process that's accessing the USB drive right at the time when our script is trying to eject it, so it cannot unmount the filesystem. I'll sleep on this tonight :>)...

This is the most likely explanation. It makes me think of the weird behavior we saw with a Gnuton test router once...

It's been so long now that I don't remember the specifics, I'd need to go back through the Gnuton PR to see if I can spot the error again, but it felt like something was holding onto the USB still when we were trying to eject it.

It was then around the first time I suggested maybe going back to an inhouse solution. I'll see if I can dig up more info again

Yeah, it would be good to have more information as to when the msgs are triggered. It's not really an issue at all (so far) since the command has been working for our purposes, but I'm trying to understand what's happening to avoid it (if possible).

@Martinski4GitHub
Copy link
Collaborator

As you said, those messages don't seem to be a "bad" thing. Other than that, the command seems to be working, AFAICT, and it's not affecting the actual F/W update process.

As of now, I haven't noticed any changes in its actual behavior in the intended goal of ejecting the actual USB from the router.
It's just much more chatty about what it's doing now. (Or not doing..)
I'm fine with that suggestion of suppression!

I'm wondering if those messages are triggered because there is still some running process that's accessing the USB drive right at the time when our script is trying to eject it, so it cannot unmount the filesystem. I'll sleep on this tonight :>)...

This is the most likely explanation. It makes me think of the weird behavior we saw with a Gnuton test router once...

It's been so long now that I don't remember the specifics, I'd need to go back through the Gnuton PR to see if I can spot the error again, but it felt like something was holding onto the USB still when we were trying to eject it.

Hey, bud. Apologies for the "radio silence." During my first week after coming back from vacation, I've been extremely busy with work, family & house stuff. In any case, I was thinking about the messages coming from the "/sbin/ejusb" command that were possibly triggered by a process accessing the USB-attached drive filesystem right at the moment when our script attempts to eject the drive, and one solution that could minimize the chances of such occurrence would be to remove all existing CRON jobs from 3rd-party add-ons immediately after stopping all Entware services, but before ejecting the USB drive. I've already written a function to do that and have been testing it this evening to make sure it works as intended. The idea is to stop processes from starting to execute around the time we're trying to eject the drive and initiate the actual F/W update process.

What do you think of such an approach?

@ExtremeFiretop
Copy link
Owner Author

Hey, bud. Apologies for the "radio silence." During my first week after coming back from vacation, I've been extremely busy with work, family & house stuff.

No worries, I've been busy as well :)

In any case, I was thinking about the messages coming from the "/sbin/ejusb" command that were possibly triggered by a process accessing the USB-attached drive filesystem right at the moment when our script attempts to eject the drive, and one solution that could minimize the chances of such occurrence would be to remove all existing CRON jobs from 3rd-party add-ons immediately after stopping all Entware services, but before ejecting the USB drive. I've already written a function to do that and have been testing it this evening to make sure it works as intended. The idea is to stop processes from starting to execute around the time we're trying to eject the drive and initiate the actual F/W update process.

What do you think of such an approach?

Is your idea to export the cron jobs, remove them, and reimport them afterwards? Or just confirm no cron jobs are executing around the time? I'm not sure how I feel about that approach, I've need to see what you mean to better judge the solution.

BTW: I know you had some contact with Jack in the past, any ideas when the dev versions will become main versions?
Becoming tired of seeing this every weekend hahaha:

image

@Martinski4GitHub
Copy link
Collaborator

Martinski4GitHub commented Sep 15, 2024

Hey, bud. Apologies for the "radio silence." During my first week after coming back from vacation, I've been extremely busy with work, family & house stuff.

No worries, I've been busy as well :)

In any case, I was thinking about the messages coming from the "/sbin/ejusb" command that were possibly triggered by a process accessing the USB-attached drive filesystem right at the moment when our script attempts to eject the drive, and one solution that could minimize the chances of such occurrence would be to remove all existing CRON jobs from 3rd-party add-ons immediately after stopping all Entware services, but before ejecting the USB drive. I've already written a function to do that and have been testing it this evening to make sure it works as intended. The idea is to stop processes from starting to execute around the time we're trying to eject the drive and initiate the actual F/W update process.
What do you think of such an approach?

Is your idea to export the cron jobs, remove them, and reimport them afterwards? Or just confirm no cron jobs are executing around the time? I'm not sure how I feel about that approach, I've need to see what you mean to better judge the solution.

The idea is to remove all cron jobs created by 3rd-party addons from crontab a few seconds before the USB drive is ejected & the actual F/W update process begins. At that point, we want to eliminate the possibility of a cron job starting to execute and possibly accessing th USB drive filesystem when were about to eject the drive and begin the F/W update. The cron jobs will get restore after the router reboots so there's no need to do anything else (no "re-import").

I will submit a PR later in the evening so you can see what I mean and can test how it works. Right now, my wife & I are about to go out to do some shopping (groceries for the week and other stuff) and then will stop by my parents' home. I'll be offline until later today in the evening.

BTW: I know you had some contact with Jack in the past, any ideas when the dev versions will become main versions? Becoming tired of seeing this every weekend hahaha:

Jack Yaz has gone radio silent for almost a month now. Maybe he's on vacation (it's common in Europe to get a month vacation around this time). I'll ping him in a few days to check about the PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants