
Feature: notification for disappearing remote harvester #83

Closed - wants to merge 15 commits into main from cbacon93:feature-65
Conversation

@mahaupt commented May 5, 2021

I added the feature from #65.
Let me know what you think.
I'll run this on my farmer for long-term testing.

It parses the log for farmer_server entries that indicate incoming data from harvesters.
It saves the last timestamp, the peer hash, and the IP address of each harvester in a list. If a harvester doesn't send any data for more than 300 s, it sends a notification and removes the harvester from the list.
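For readers skimming the thread, here is a minimal sketch of that keep-alive idea. The class and method names are hypothetical, and it uses a dict keyed by peer hash (as the later commits in this PR do); the real change lives in the PR itself (e.g. the FarmerServerParser shown in the diff further down), so treat this purely as an illustration:

```python
import time

# Keep-alive threshold from the description above: 300 seconds of silence.
KEEPALIVE_THRESHOLD_SECONDS = 300


class HarvesterActivityTracker:
    """Tracks the last time each remote harvester sent data to the farmer."""

    def __init__(self):
        # peer_hash -> {"ip_addr": str, "last_seen": float}
        self._harvesters = {}

    def on_farmer_server_entry(self, peer_hash: str, ip_addr: str) -> None:
        """Call this for every parsed farmer_server log line from a harvester."""
        self._harvesters[peer_hash] = {"ip_addr": ip_addr, "last_seen": time.time()}

    def pop_disappeared(self):
        """Return harvesters that have been silent for too long and stop tracking them."""
        now = time.time()
        disappeared = [
            (peer_hash, info["ip_addr"])
            for peer_hash, info in self._harvesters.items()
            if now - info["last_seen"] > KEEPALIVE_THRESHOLD_SECONDS
        ]
        for peer_hash, _ in disappeared:
            del self._harvesters[peer_hash]  # notify once, then forget the harvester
        return disappeared
```

A notifier would then send one alert per entry returned by pop_disappeared().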

@martomi (Owner) commented May 5, 2021

Loving that you managed to extend chiadog to your particular use-case!

I'll have a more detailed look over the weekend. For merging, we'll need to consider how generalisable this would be to other chiadog users compared to the maintenance overhead required. It seems to have the potential to increase support requests significantly due to the sheer number of possible setup combinations this makes possible 😄

@mahaupt (Author) commented May 5, 2021

No worries. I added this primarily because we had several unnoticed outages last week due to Windows force-rebooting 😄 But I would be glad to share this if it's found useful to the project.

@fiveangle (Contributor) commented:

This is awesome, thank you so much! The official Chia app does absolutely nothing to help diagnose or even alert us to these problems. Will pop it on my farmer as well and give it a go! 🥇

@martomi (Owner) left a comment

Looks like a useful feature, but I think it'll need a bit more work if we want to merge it. Perhaps also add configuration options to enable/disable it. But I think this will also work transparently with local harvesters, so that may not be necessary.

@fiveangle (Contributor) left a comment

100.100.100.100 is part of the 100.64.0.0/10 block, which is specifically for Carrier Grade NAT to ensure no clobbering with private IP space. As per the RFC:

"This address block should not be used on private networks or on the public Internet"

To prevent problems for users, it's probably best to include only examples from the designated private IP ranges:

- 24-bit block: 10.0.0.0 – 10.255.255.255 (single class A network)
- 20-bit block: 172.16.0.0 – 172.31.255.255 (16 contiguous class B networks)
- 16-bit block: 192.168.0.0 – 192.168.255.255 (256 contiguous class C networks)

More info:
https://en.wikipedia.org/wiki/IPv4_shared_address_space
https://en.wikipedia.org/wiki/Private_network
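If it helps, a quick standard-library check (just a sketch, separate from this PR's code) can confirm that any example address used in the test logs falls inside the RFC 1918 ranges above rather than in the CGNAT block:

```python
from ipaddress import ip_address, ip_network

# The three RFC 1918 private blocks listed above.
RFC1918_BLOCKS = (
    ip_network("10.0.0.0/8"),
    ip_network("172.16.0.0/12"),
    ip_network("192.168.0.0/16"),
)


def is_rfc1918(addr: str) -> bool:
    """True if addr falls in one of the designated private IPv4 ranges."""
    ip = ip_address(addr)
    return any(ip in block for block in RFC1918_BLOCKS)


print(is_rfc1918("100.100.100.100"))  # False - CGNAT space (100.64.0.0/10)
print(is_rfc1918("192.168.1.123"))    # True  - safe to use as a dummy address
```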

@mahaupt (Author) commented May 9, 2021

@fiveangle Thanks for taking a look! The IP addresses in the test logs are only used in the unit tests to represent dummy remote harvesters and have no negative impact on the user.
Having the parser accept only IP addresses from the private ranges would result in not detecting harvesters connected over the internet, e.g. a setup where several harvesters in different networks are connected to a single farmer via the internet.

@amirdt22 commented May 9, 2021

@cbacon93 Thanks! This will replace my custom notifications. I'm going to try running this as well. Do you have any tips on how to determine which harvester "won", to help work out splits for private pools?

@mahaupt (Author) commented May 11, 2021

@amirdt22 thanks, let me know if you find any issues!

@St3ffn commented May 13, 2021

First of all, thanks for your contribution. The feature seems promising.
I tested your feature branch in a setup with one remote farmer and two different harvesters using the same IP (of course they differ in the peer hash). During the night my provider decided to perform some maintenance and I got a notification that the harvesters disappeared. So it looks great and works for my setup 🚀

@McCloudS commented:

Working great so far here. Also getting alerts after 300+ seconds of no activity.

@fiveangle (Contributor) commented May 14, 2021

@cbacon93 - would you mind terribly rebasing onto 0.5.0? I've run into problems on my rebase and it isn't clear where the problem is, but I can't seem to get it to work now :) (maybe squashing all your updates into a single commit would resolve it?)

Loving the feature btw! ❤️

mahaupt added 6 commits May 14, 2021 08:54
- Storing remote harvesters in dict
- Store ip_addr again if changed
- Delete deactivated code
- Change test ip addresses to more dummy looking ones
- To detect simultaneous failure of all remote harvesters (in case of a network failure)
- PR #83 Feature #65
@mahaupt (Author) commented May 14, 2021

@fiveangle Thanks, I just rebased it onto 0.5.0. It worked, but it was strange because I was not able to push afterwards. I should have used merge 😄
But I checked the code twice; it should work now.

@mahaupt (Author) commented May 14, 2021

Here are the differences: main...cbacon93:feature-65

@fiveangle (Contributor) commented May 14, 2021

@cbacon93 - I usually just squash everything before a rebase on a PR, both to keep things tidy and easy to review for the origin maintainer, and because git often gets confused with so many intermediary commits that change things in the same areas 😄

So I fired this up from b5b2800 and now I'm getting false-positive "harvester offline" notices that don't stop until I restart chiadog, then resume not long after. Have you noticed any issues on your side? Granted, this is on Windows, which has been problematic, but it was working great at 3893bca, which I was running for days without issue immediately before this.

UPDATE: I reverted back to vanilla 0.5.0 and the false-positive harvester-offline issue does not occur. Something between 0.5.0 and b5b2800 is triggering the problem. It seems similar to the log rotation issue that was pretty much resolved with 3893bca. Maybe look at how that was handled to see if it might be affecting your feature at all? Please let me know how I can help test further or provide any additional info!

[screenshot]

@mahaupt (Author) commented May 15, 2021

@fiveangle that's weird, thanks for the info! What harvester/farmer setup do you have, and do you get any Python errors in the chiadog log?
Can you run the unit tests to get some more info?
Is this maybe related to #72? I'm running it on three Linux farmers without any issues; I'll try spinning up a Windows VM.

@24601 commented May 15, 2021

@fiveangle that's weird, thanks for the info! What harvester/farmer setup do you have, and do you get any Python errors in the chiadog log?
Can you run the unit tests to get some more info?
Is this maybe related to #72? I'm running it on three Linux farmers without any issues; I'll try spinning up a Windows VM.

I seem to have the same issue as @fiveangle and I do not see any Python errors in chiadog's log.

macOS Big Sur 11.2.3, built off current main/master.

@fiveangle (Contributor) commented May 15, 2021

Unit tests seem to pass. Yes, as I mentioned in my post above, commit 3893bca appeared to have fully resolved #72, which is why I pointed you at that fix: to see whether your feature falls outside of the methods that commit uses to handle log rotation. However, reviewing just now, it seems #73 may not have been fully fixed by 3893bca? I can say that it appears to have been fixed for me, but even if it rears its ugly head again, the fact remains that with your PR the very first log rotation fails every time with the false-positive harvester-offline message. With 3893bca (in 0.5.0) it at least works far more often than it doesn't (for me it hasn't skipped a beat, but I guess for others it has). You can see in the screenshot below that within 20 minutes of starting with your PR, the log rotates and the false positives start. I then stopped chiadog, loaded 0.5.0, and it ran all night without a false-positive harvester error:

[screenshot]
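One plausible contributor to this kind of false positive (an assumption, not a confirmed diagnosis of this PR) is a log consumer that keeps reading the old, renamed file after rotation: no new farmer_server lines arrive, so every tracked harvester eventually crosses the 300 s threshold. A rotation-aware tail along these lines is a common mitigation; the names are illustrative and this is not chiadog's actual implementation, which lives in the 3893bca changes referenced above:

```python
import os
import time


def tail_with_rotation(path: str):
    """Yield new lines from a log file, reopening it when the file is rotated."""
    f = open(path, "r")
    f.seek(0, os.SEEK_END)               # start at the end, like `tail -f`
    inode = os.fstat(f.fileno()).st_ino
    while True:
        line = f.readline()
        if line:
            yield line
            continue
        try:
            # Note: inode comparison is less dependable on Windows; checking for a
            # shrinking file size is a common fallback there.
            if os.stat(path).st_ino != inode:  # a new file replaced the old one
                f.close()
                f = open(path, "r")            # read the new file from the start
                inode = os.fstat(f.fileno()).st_ino
                continue
        except FileNotFoundError:
            pass                               # rotation in progress; retry shortly
        time.sleep(1)
```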

@martomi (Owner) commented May 22, 2021

@cbacon93 do you already have insights into how much additional disk I/O overhead the DEBUG log level causes? Particularly interesting for devices such as the Raspberry Pi.

@@ -21,14 +21,14 @@ class FarmerServerMessage:
 class FarmerServerParser:
     """This class can parse info log messages from the chia farmer_server

-    You need to have enabled "log_level: INFO" in your chia config.yaml
+    You need to have enabled "log_level: DEBUG" in your chia config.yaml
@fiveangle (Contributor) commented May 22, 2021

Why? Is this actually required? This will generate tons more I/O on the boot drive, which is already too much at INFO level. For plotters booting from USB (like all of mine) this will likely cripple them.

@mahaupt (Author) replied:

Unfortunately, this is required because of a change in log levels introduced in chia 1.1.6
https://github.com/Chia-Network/chia-blockchain/blob/1c808b6c2910ed32fdbfdfc576ba1bc5a5adeac9/chia/server/server.py#L492

@fiveangle (Contributor) replied:

OMG, you're right... they tell us "use remote harvesters" but then remove all ability to see if it's working? Sigh.

How does the GUI get this info?

[screenshot]

@fiveangle (Contributor) commented Jul 5, 2021

I've been running with the chia source reverted so these farmer logs from peers still go to the INFO level, because with the level set to DEBUG, logs were pushing over 9 GB/day on my system (~2300 plots, 5 remote harvesters). At INFO it's 2.4 GB. I don't have a good answer, but I did leave a comment in the source. I'm sure no one at ChiaCo will care, but what else can we do? 🤷

Based on these changes, and the fact that ChiaCo keeps insisting that we should use remote harvesters while actively making it harder (or impossible) to confirm remote harvesters are working, and providing zero tools to do so, I can't say it gives me much confidence in the decisions being made higher up.

Back to chiadog: can we not merge it anyway? I think the right solution is to add more detail to the instructions, saying "be forewarned that changing the logging level to DEBUG will cause excessive logging, which may impact farmers with a weak CPU, logging to SD or USB flash devices, etc.", and perhaps also give a link to advanced details on how the user can revert the chia-blockchain source to demote these messages back to INFO like I have? (I suggest a link only, to avoid muddying the instructions with complicated details that average users won't need.)

As it is now, not having this feature makes chiadog not very useful for anyone using remote harvesters, unless they also deal with managing chiadog on every remote harvester (quite a pain by comparison). Plus, having it wouldn't affect anyone who isn't using remote harvesters.

Thanks again for trying to make this work @mahaupt !

@mahaupt (Author) commented Jul 8, 2021

You can now get remote harvester info via RPC calls. Maybe this will make our problems obsolete 😅
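For anyone who wants to try that RPC route, below is a rough sketch of querying the farmer's get_harvesters endpoint (the one the GUI uses). The port 8559, the certificate paths, and the response field names are the usual mainnet defaults as I understand them, so treat them as assumptions to verify against your own setup:

```python
from pathlib import Path

import requests  # third-party: pip install requests

# Default mainnet farmer RPC port and SSL client certificates (assumptions;
# adjust for your own chia installation).
FARMER_RPC_URL = "https://localhost:8559/get_harvesters"
SSL_DIR = Path.home() / ".chia" / "mainnet" / "config" / "ssl" / "farmer"
CLIENT_CERT = (str(SSL_DIR / "private_farmer.crt"), str(SSL_DIR / "private_farmer.key"))


def get_harvesters() -> dict:
    """Ask the local farmer which harvesters are currently connected."""
    response = requests.post(
        FARMER_RPC_URL,
        json={},       # the endpoint takes an empty JSON body
        cert=CLIENT_CERT,
        verify=False,  # chia uses its own private CA; verification skipped for brevity
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    for harvester in get_harvesters().get("harvesters", []):
        connection = harvester.get("connection", {})
        print(connection.get("host"), connection.get("node_id"), len(harvester.get("plots", [])))
```

Polling this periodically would give the same "harvester disappeared" signal without parsing logs at all.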

@martomi (Owner) commented Jul 9, 2021

Yes, I was also going to suggest the RPC route; log parsing is not the best long-term strategy, particularly for this use case. Perhaps there's already a chiadog alternative using that approach?

I know @pieterhelsen was also entertaining the idea of a pluggable architecture, which could allow writing plugins (e.g. RPC monitoring) for chiadog in a more decentralised way.

@fiveangle (Contributor) commented:

Perhaps there's already a chiadog alternative using that approach?

chiadog alternative ❓

@mahaupt closed this Jan 23, 2022