
Feature: notification for disappearing remote harvester #83

Closed - wants to merge 15 commits into main from cbacon93:feature-65
Conversation

@mahaupt commented May 5, 2021

I added the feature from #65.
Let me know what you think.
I'll run this on my farmer for long-term testing.

It parses the log for farmer_server entries that indicate incoming data from harvesters.
It saves the last timestamp, the peer hash, and the IP address of each harvester in a list. If a harvester doesn't send any data for more than 300 s, it sends a notification and removes the harvester from the list.
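For readers skimming the thread, here is a minimal sketch of that keep-alive idea. The class and method names are hypothetical, and it uses a dict keyed by peer hash (as the later commits in this PR do); the real change lives in the PR itself (e.g. the FarmerServerParser shown in the diff further down), so treat this purely as an illustration:

```python
import time

# Keep-alive threshold from the description above: 300 seconds of silence.
KEEPALIVE_THRESHOLD_SECONDS = 300


class HarvesterActivityTracker:
    """Tracks the last time each remote harvester sent data to the farmer."""

    def __init__(self):
        # peer_hash -> {"ip_addr": str, "last_seen": float}
        self._harvesters = {}

    def on_farmer_server_entry(self, peer_hash: str, ip_addr: str) -> None:
        """Call this for every parsed farmer_server log line from a harvester."""
        self._harvesters[peer_hash] = {"ip_addr": ip_addr, "last_seen": time.time()}

    def pop_disappeared(self):
        """Return harvesters that have been silent for too long and stop tracking them."""
        now = time.time()
        disappeared = [
            (peer_hash, info["ip_addr"])
            for peer_hash, info in self._harvesters.items()
            if now - info["last_seen"] > KEEPALIVE_THRESHOLD_SECONDS
        ]
        for peer_hash, _ in disappeared:
            del self._harvesters[peer_hash]  # notify once, then forget the harvester
        return disappeared
```

A notifier would then send one alert per entry returned by pop_disappeared().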

@martomi (Owner) commented May 5, 2021

Loving that you managed to extend chiadog to your particular use-case!

I'll have a more detailed look over the weekend. For merging, we'll need to consider how generalisable this would be to other chiadog users compared to the maintenance overhead required. It seems to have the potential to increase support requests significantly due to the sheer number of possible setup combinations this makes possible 😄

@mahaupt (Author) commented May 5, 2021

No worries. I added this primarily because we had several unnoticed outages last week due to Windows force-rebooting 😄 But I would be glad to share this if it's found useful to the project.

@fiveangle (Contributor) commented:

This is awesome, thank you so much! The official Chia app does absolutely nothing to help diagnose or even alert us to these problems. Will pop it on my farmer as well and give it a go! 🥇

@martomi (Owner) left a comment

Looks like a useful feature, but I think it'll need a bit more work if we want to merge it. Perhaps also add configuration options to enable/disable it. But I think this will also work transparently with local harvesters, so that may not be necessary.

@fiveangle (Contributor) left a comment

100.100.100.100 is part of the 100.64.0.0/10 block, which is specifically for Carrier Grade NAT to ensure no clobbering with private IP space. As per the RFC:

"This address block should not be used on private networks or on the public Internet"

To prevent problems for users, it's probably best to include only examples from the designated private IP ranges:

- 24-bit block: 10.0.0.0 – 10.255.255.255 (single class A network)
- 20-bit block: 172.16.0.0 – 172.31.255.255 (16 contiguous class B networks)
- 16-bit block: 192.168.0.0 – 192.168.255.255 (256 contiguous class C networks)

More info:
https://en.wikipedia.org/wiki/IPv4_shared_address_space
https://en.wikipedia.org/wiki/Private_network
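If it helps, a quick standard-library check (just a sketch, separate from this PR's code) can confirm that any example address used in the test logs falls inside the RFC 1918 ranges above rather than in the CGNAT block:

```python
from ipaddress import ip_address, ip_network

# The three RFC 1918 private blocks listed above.
RFC1918_BLOCKS = (
    ip_network("10.0.0.0/8"),
    ip_network("172.16.0.0/12"),
    ip_network("192.168.0.0/16"),
)


def is_rfc1918(addr: str) -> bool:
    """True if addr falls in one of the designated private IPv4 ranges."""
    ip = ip_address(addr)
    return any(ip in block for block in RFC1918_BLOCKS)


print(is_rfc1918("100.100.100.100"))  # False - CGNAT space (100.64.0.0/10)
print(is_rfc1918("192.168.1.123"))    # True  - safe to use as a dummy address
```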

@mahaupt (Author) commented May 9, 2021

@fiveangle Thanks for taking a look! The IP addresses in the test logs are only used in the unit tests to represent dummy remote harvesters and have no negative impact on the user.
Having the parser accept only IP addresses from the private ranges would result in not detecting harvesters connected over the internet, e.g. a setup where several harvesters in different networks are connected to a single farmer via the internet.

@amirdt22 commented May 9, 2021

@cbacon93 Thanks! This will replace my custom notifications. I'm going to try running this as well. Do you have any tips on how to determine which harvester "won", to help work out splits for private pools?

@mahaupt (Author) commented May 11, 2021

@amirdt22 thanks, let me know if you find any issues!

@St3ffn commented May 13, 2021

First of all, thanks for your contribution. The feature seems promising.
I tested your feature branch in a setup with one remote farmer and two different harvesters using the same IP (of course they differ in the peer hash). During the night my provider decided to perform some maintenance and I got a notification that the harvesters disappeared. So it looks great and works for my setup 🚀

@McCloudS commented:

Working great so far here. Also getting alerts after 300+ seconds of no activity.

@fiveangle (Contributor) commented May 14, 2021

@cbacon93 - would you mind terribly rebasing onto 0.5.0? I've run into problems on my rebase and it isn't clear where the problem is, but I can't seem to get it to work now :) (maybe squashing all your updates into a single commit would resolve it?)

Loving the feature btw! ❤️

mahaupt added 6 commits May 14, 2021 08:54
- Storing remote harvesters in dict
- Store ip_addr again if changed
- Delete deactivated code
- Change test ip addresses to more dummy looking ones
- To detect simultaneous failure of all remote harvesters (in case of a network failure)
- PR #83 Feature #65
@mahaupt (Author) commented May 14, 2021

@fiveangle Thanks, I just rebased it onto 0.5.0. It worked, but it was strange because I was not able to push afterwards. I should have used merge 😄
But I checked the code twice; it should work now.

@mahaupt (Author) commented May 14, 2021

Here are the differences: main...cbacon93:feature-65

@fiveangle (Contributor) commented May 14, 2021

@cbacon93 - I usually just squash everything before a rebase on a PR, both to keep things tidy and easy to review for the origin maintainer, and because git often gets confused with so many intermediary commits that change things in the same areas 😄

So I fired this up from b5b2800 and now I'm getting false-positive "harvester offline" notices that don't stop until I restart chiadog, then resume not long after. Have you noticed any issues on your side? Granted, this is on Windows, which has been problematic, but it was working great at 3893bca, which I was running for days without issue immediately before this.

UPDATE: I reverted back to vanilla 0.5.0 and the false-positive harvester-offline issue does not occur. Something between 0.5.0 and b5b2800 is triggering the problem. It seems similar to the log rotation issue that was pretty much resolved with 3893bca. Maybe look at how that was handled to see if it might be affecting your feature at all? Please let me know how I can help test further or provide any additional info!

[screenshot]

@mahaupt (Author) commented May 15, 2021

@fiveangle that's weird, thanks for the info! What harvester/farmer setup do you have, and do you get any Python errors in the chiadog log?
Can you run the unit tests to get some more info?
Is this maybe related to #72? I'm running it on three Linux farmers without any issues; I'll try spinning up a Windows VM.

@24601 commented May 15, 2021

@fiveangle that's weird, thanks for the info! What harvester/farmer setup do you have, and do you get any Python errors in the chiadog log?
Can you run the unit tests to get some more info?
Is this maybe related to #72? I'm running it on three Linux farmers without any issues; I'll try spinning up a Windows VM.

I seem to have the same issue as @fiveangle and I do not see any Python errors in chiadog's log.

macOS Big Sur 11.2.3, built off current main/master.

@fiveangle (Contributor) commented May 15, 2021

Unit tests seem to pass. Yes, as I mentioned in my post above, commit 3893bca appeared to have fully resolved #72, which is why I pointed you at that fix: to see whether your feature falls outside of the methods that commit uses to handle log rotation. However, reviewing just now, it seems #73 may not have been fully fixed by 3893bca? I can say that it appears to have been fixed for me, but even if it rears its ugly head again, the fact remains that with your PR the very first log rotation fails every time with the false-positive harvester-offline message. With 3893bca (in 0.5.0) it at least works far more often than it doesn't (for me it hasn't skipped a beat, but I guess for others it has). You can see in the screenshot below that within 20 minutes of starting with your PR, the log rotates and the false positives start. I then stopped chiadog, loaded 0.5.0, and it ran all night without a false-positive harvester error:

[screenshot]
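One plausible contributor to this kind of false positive (an assumption, not a confirmed diagnosis of this PR) is a log consumer that keeps reading the old, renamed file after rotation: no new farmer_server lines arrive, so every tracked harvester eventually crosses the 300 s threshold. A rotation-aware tail along these lines is a common mitigation; the names are illustrative and this is not chiadog's actual implementation, which lives in the 3893bca changes referenced above:

```python
import os
import time


def tail_with_rotation(path: str):
    """Yield new lines from a log file, reopening it when the file is rotated."""
    f = open(path, "r")
    f.seek(0, os.SEEK_END)               # start at the end, like `tail -f`
    inode = os.fstat(f.fileno()).st_ino
    while True:
        line = f.readline()
        if line:
            yield line
            continue
        try:
            # Note: inode comparison is less dependable on Windows; checking for a
            # shrinking file size is a common fallback there.
            if os.stat(path).st_ino != inode:  # a new file replaced the old one
                f.close()
                f = open(path, "r")            # read the new file from the start
                inode = os.fstat(f.fileno()).st_ino
                continue
        except FileNotFoundError:
            pass                               # rotation in progress; retry shortly
        time.sleep(1)
```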

@martomi (Owner) commented May 22, 2021

@cbacon93 do you already have insights into how much additional disk I/O overhead the DEBUG log level causes? Particularly interesting for devices such as the Raspberry Pi.

@@ -21,14 +21,14 @@ class FarmerServerMessage:
 class FarmerServerParser:
     """This class can parse info log messages from the chia farmer_server

-    You need to have enabled "log_level: INFO" in your chia config.yaml
+    You need to have enabled "log_level: DEBUG" in your chia config.yaml
@fiveangle (Contributor) commented May 22, 2021

Why? Is this actually required? This will generate tons more I/O on the boot drive, which is already too much at INFO level. For plotters booting from USB (like all of mine) this will likely cripple them.

@mahaupt (Author) replied:

Unfortunately, this is required because of a change in log levels introduced in chia 1.1.6
https://github.com/Chia-Network/chia-blockchain/blob/1c808b6c2910ed32fdbfdfc576ba1bc5a5adeac9/chia/server/server.py#L492

@fiveangle (Contributor) replied:

OMG, you're right... they tell us "use remote harvesters" but then remove all ability to see if it's working? Sigh.

How does the GUI get this info?

[screenshot]

@fiveangle (Contributor) commented Jul 5, 2021

I've been running with the chia source reverted so these farmer logs from peers still go to the INFO level, because with the level set to DEBUG, logs were pushing over 9 GB/day on my system (~2300 plots, 5 remote harvesters). At INFO it's 2.4 GB. I don't have a good answer, but I did leave a comment in the source. I'm sure no one at ChiaCo will care, but what else can we do? 🤷

Based on these changes, and the fact that ChiaCo keeps insisting that we should use remote harvesters while actively making it harder (or impossible) to confirm remote harvesters are working, and providing zero tools to do so, I can't say it gives me much confidence in the decisions being made higher up.

Back to chiadog: can we not merge it anyway? I think the right solution is to add more detail to the instructions, saying "be forewarned that changing the logging level to DEBUG will cause excessive logging, which may impact farmers with a weak CPU, logging to SD or USB flash devices, etc.", and perhaps also give a link to advanced details on how the user can revert the chia-blockchain source to demote these messages back to INFO like I have? (I suggest a link only, to avoid muddying the instructions with complicated details that average users won't need.)

As it is now, not having this feature makes chiadog not very useful for anyone using remote harvesters, unless they also deal with managing chiadog on every remote harvester (quite a pain by comparison). Plus, having it wouldn't affect anyone who isn't using remote harvesters.

Thanks again for trying to make this work @mahaupt !

@mahaupt (Author) commented Jul 8, 2021

You can now get remote harvester info via RPC calls. Maybe this will make our problems obsolete 😅
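For anyone who wants to try that RPC route, below is a rough sketch of querying the farmer's get_harvesters endpoint (the one the GUI uses). The port 8559, the certificate paths, and the response field names are the usual mainnet defaults as I understand them, so treat them as assumptions to verify against your own setup:

```python
from pathlib import Path

import requests  # third-party: pip install requests

# Default mainnet farmer RPC port and SSL client certificates (assumptions;
# adjust for your own chia installation).
FARMER_RPC_URL = "https://localhost:8559/get_harvesters"
SSL_DIR = Path.home() / ".chia" / "mainnet" / "config" / "ssl" / "farmer"
CLIENT_CERT = (str(SSL_DIR / "private_farmer.crt"), str(SSL_DIR / "private_farmer.key"))


def get_harvesters() -> dict:
    """Ask the local farmer which harvesters are currently connected."""
    response = requests.post(
        FARMER_RPC_URL,
        json={},       # the endpoint takes an empty JSON body
        cert=CLIENT_CERT,
        verify=False,  # chia uses its own private CA; verification skipped for brevity
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    for harvester in get_harvesters().get("harvesters", []):
        connection = harvester.get("connection", {})
        print(connection.get("host"), connection.get("node_id"), len(harvester.get("plots", [])))
```

Polling this periodically would give the same "harvester disappeared" signal without parsing logs at all.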

@martomi (Owner) commented Jul 9, 2021

Yes, I was also going to suggest the RPC route; log parsing is not the best long-term strategy, particularly for this use case. Perhaps there's already a chiadog alternative using that approach?

I know @pieterhelsen was also entertaining the idea of a pluggable architecture, which could allow writing plugins (e.g. RPC monitoring) for chiadog in a more decentralised way.

@fiveangle (Contributor) commented:

Perhaps there's already a chiadog alternative using that approach?

chiadog alternative ❓

@mahaupt closed this Jan 23, 2022