-
Notifications
You must be signed in to change notification settings - Fork 121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: notification for disappearing remote harvester #83
Conversation
Loving that you managed to extend chiadog to your particular use-case! I'll have a more detailed look on the weekend. For merging, we'll need to consider how generalisable this would be to more chiadog users compared to the overhead in the maintenance required. Seems it has the potential to increase support requests significantly due to the sheer quantity of possible combination of setups this makes possible 😄 |
No worries. I added this primarily because we has several unnoticed outages last week due to windows force-rebooting 😄 But I would be glad to share this if found useful to the project. |
this is awesome, thank you so much ! the chia offical app does absolutely nothing to help diagnose or even alert us of these problems. will pop it on my farmer as well and give it a go ! 🥇 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a useful feature but I think it'll need a bit more work if we want to merge it. Perhaps also configuration options to enable/disable. But I think this will work transparently also with local harvesters so maybe not necessary.
src/chia_log/handlers/condition_checkers/remote_harvester_activity.py
Outdated
Show resolved
Hide resolved
Storing remote harvesters in dict Store ip_addr again if changed Delete deactivated code Change test ip addresses to more dummy looking ones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100.100.100.100 is part of the 100.64.0.0/10 block which is specifically for Carrier Grade NAT to ensure no clobbering with private IP space. As per the RFC:
This address block should not be used on private networks or on the public Internet
To prevent problems for users, it's probably best to include only examples from the designated private IP ranges:
24-bit block | 10.0.0.0 – 10.255.255.255 | single class A network |
---|---|---|
20-bit block | 172.16.0.0 – 172.31.255.255 | 16 contiguous class B networks |
16-bit block | 192.168.0.0 – 192.168.255.255 | 256 contiguous class C networks |
More info:
https://en.wikipedia.org/wiki/IPv4_shared_address_space
https://en.wikipedia.org/wiki/Private_network
@fiveangle Thanks for taking a look! The IP adresses in the test logs are just used in the unit tests for representing a dummy remote harvesters and have no negative impact on the user. |
@cbacon93 Thanks! This will replace my custom notifications. I'm going to try to run this as well. Do you have any tips for how to determine which harvester "won" to help determine splits for private pools? |
@amirdt22 thanks, let me know if you find any issues! |
First of all thanks for your contibution. The feature seems promising. |
Working great so far here, Also getting alerts after 300+ seconds of no activity. |
@cbacon93 - would you mind terribly rebasing onto 0.5.0 ? I've ran into problems on my rebase and it isn't clear where the problem is, but i can't seem to get it to work now :) (maybe squash all your updates into a single commit would resolve ?) Loving the feature btw ! ❤️ |
Storing remote harvesters in dict Store ip_addr again if changed Delete deactivated code Change test ip addresses to more dummy looking ones
@fiveangle Thanks, I just rebased it to 0.5.0. It worked but it was strange because I was not able to push afterwards. I should have used merge 😄 |
Here are the differences: main...cbacon93:feature-65 |
@cbacon93 - i usually just squash everything before a rebase on a PR only to keep things tidy and easy to review for the origin maintainer, and also because so often git gets confused with so many intermediary commits that change things in the same areas 😄 So i fired this up from b5b2800 and now I'm getting the false positive "harvester offline" notices that don't stop until restarting chiadog, then resume not long after. Have you noticed any issues on your side ? Granted this is on Windows, which has been problematic, but was working great at 3893bca which i was running for days w/o issue immediately before this. UPDATE: I reverted back to vanilla 0.5.0 and the false-positive harvester offline issue does not occur. Something between 0.5.0 and b5b2800 is triggering the problem. It seems similar to the log rotation issue that was pretty much resolved with 3893bca. Maybe look at that to see if how it was handled might be affecting your feature at all ? Please let me know how I can help test further or provide any additional info ! |
@fiveangle thats weird, thanks for the info! What Harvester/Farmer setup do you have and do you get any python errors in the chiadog log? |
I seem to have the same issue as @fiveangle and I do not see any Python errors in chiadog's log. Mac OS X Big Sur 11.2.3, built off current main/master. |
Unit tests seem to pass. Yes, as I mentioned in my post above, commit 3893bca appeared to have fully resolved #72 which is why I mentioned the fix for you to review to see if your feature falls outside of the methods in that commit to fix the log rotation. However, reviewing just now, it seems #73 may not have been fully fixed by 3893bca ? I can say that it appears to have fixed for me, but even if it rears its ugly head, the fact remains that with your PR, the very first log rotation fails every time with the false-positive harvester offline msg. With 3893bca (in 0.5.0) it at least works far more often than it doesn't (for me it hasn't skipped a beat but I guess for others it has). You can see in the screenshot below, within 20min of starting with your PR, the log rotates, and the false-positive starts. I then stopped chiadog, loaded 0.5.0 at it ran all night w/o a false-positive harvester error: |
@cbacon93 do you already have insights as to how much additional overhead on disk IO the DEBUG log level causes? Particularly interesting for devices such as the Raspberry. |
@@ -21,14 +21,14 @@ class FarmerServerMessage: | |||
class FarmerServerParser: | |||
"""This class can parse info log messages from the chia farmer_server | |||
|
|||
You need to have enabled "log_level: INFO" in your chia config.yaml | |||
You need to have enabled "log_level: DEBUG" in your chia config.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why ? is this actually required ? this will generate tons more IO on the boot drive which is already too much at INFO level. for plotters booting from USB (like all mine) this will likely cripple them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, this is required because of a change in log levels introduced in chia 1.1.6
https://github.com/Chia-Network/chia-blockchain/blob/1c808b6c2910ed32fdbfdfc576ba1bc5a5adeac9/chia/server/server.py#L492
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i've been running with the chia source reverted so these farmer logs from peers still go to INFO level because while setting at DEBUG, logs were pushing over 9GB/day on my system (~2300 plots, 5x remote harvesters). At INFO 2.4GB. I don't have a good answer, but I did make a comment in the source. I'm sure no one at ChiaCo will care, but what else can we do ? 🤷 Based on these changes, and the fact that ChiaCo does nothing but complain that we should use remote harvesters, then actively makes it harder/impossible to confirm remote harvesters are working and providing zero tools to do so, I can't say it gives me much confidence in the decisions being made higher up. Back to chiadog, can we not merge it anyway ? I think the right solution is to add more detail on the instructions to say "be forewarned changing logging level to DEBUG will cause excessive logging, which may impact farmers with weak CPU, logging to SD or USB flash devices, etc", but also perhaps give a link to advanced detail on how the user can revert the chia-blockchain source to demote these messages back to INFO like I have ? (I suggest a link only to not muddy the instructions with complicated instructions that the average users won't need). As it is now, not having this feature makes chiadog not really useful for anyone using remote harvesters unless they deal with managing chiadog on every remote harvester too (quite a pain by comparison). Plus having it wouldn't affect anyone who isn't using remote harvesters. Thanks again for trying to make this work @mahaupt ! |
You can now get remote harvester infos by rpc calls. Maybe this will make our problems obsolete 😅 |
Yes, I was also going to suggest the RPC route, log parsing is not the best long-term strategy, particularly for this use-case. Perhaps there's already chiadog alternative using that approach? I know @pieterhelsen was also entertaining the idea of pluggable architecture which could allow writing plugins (e.g. RPC monitoring) for chiadog in a more decentralised way. |
chiadog alternative ❓ |
I added the feature #65
Let me know what you think.
I'll upload this to my farmer for long term testing
It parses the log for farmer_server entries that indicate incoming data from harvesters.
It saves the last timestamp, the peer hash and the ip address in a list. If one harvester doesnt send any data for more than 300s, it sends a notification and removes the harvester from the list.