Miner can't connect to peers. not_found errors in logs. Results in hotspot not witnessing anything. #417
Comments
Hi @thardie, I believe my original helium hotspot has the same libp2p issue. It has been days since I Sent Beacon and I see in All Activity from explorer that I have "Created Challenge" but no "Challenged Beaconer" activity (assuming this is where p2p network issue drops/loses challenge packet). How are you able to view your hotspot logs? Any information/resources that you can share to help me setup something similar/implement your workaround? |
So the code already does this for you:
|
how can i apply this fix? i have pisces p100 |
I believe I am having the same issue...
+---------+-------+
Miner log errors
***** Transaction related errors ***** :
2022-01-02 15:37:35.249 9 [error] <0.5546.0>@blockchain_txn_dialer:dial_:{142,21} libp2p_framed_stream dial failed. Reason: not_found, To: "/p2p/11VJFvyQrHWM3BdeYsPeH4HGQW6PUY8Ztjwiq78gKp5sWepffpK", TxnHash: <<242,180,4,149,240,138,217,109,124,197,86,236,137,117,134,228,222,11,54,63,95,69,6,14,220,136,177,177,115,52,80,78>>
***** POC related errors ***** :
2022-01-02 15:51:21.326 9 [error] <0.1641.0>@miner_poc_statem:send_onion:{1003,13} failed to dial 1st hotspot ("/p2p/11q1XjwvU2zNYiPoAiYvrai3VWwibBtUYXTLoPsNTjnqTunzcWA"): not_found
***** General errors *****
|
Could you please (1) supply the hotspot address or name for this one and (2) see if you can get more details from the logs for the "general errors" you pasted there? |
The miner name is Noisy Pine Seal.
|
well, I have attached the logs from var/data/log in a file below, its error.log error.log.1 error.log.0 , from console cat command |
You have the not_found bug. You should be able to use the python script linked above to work around your issue. It just needs to be able to read the console log and be able to find a docker container named "miner". |
It depends on your specific hotspot implementation. You need to be able to get shell access to it, commonly through SSH. If your miner doesn't have SSH access, you have to modify its image to allow SSH (this is what I did on my Helium OG hotspot, since it's just a Raspberry Pi under the hood). |
It's very hard for me to be sure it works. Before I did anything, I'd gone down to zero witnesses. I added this and got up higher than I had before. I've given the script to a bunch of other people who also report that it helps (it certainly doesn't solve all not_found cases). So, my answer is: Anecdotally, it helps. Not sure why, but I also can't grok Erlang. I do fine with C, C++, Python, Javascript, Golang, etc. Erlang is just incomprehensible to me for some reason. |
@thardie could it be a versioning issue with Erlang and the hotspot OS? I originally tried to compile gateway-rs for the light hotspot (I thought it would work with my ECC key plugged into it; I was told otherwise). I compiled it successfully on a Raspberry Pi 1 B+ (32-bit, legacy Raspberry Pi OS Buster) and also on a Raspberry Pi 4 (64-bit, Raspberry Pi OS Bullseye), but I ran into an issue on Bullseye. I would have to backtrack through my process, but the Erlang repo that various walkthroughs add to the APT sources is for 64-bit Buster, and it would fail on Bullseye. I found another walkthrough that was for Bullseye specifically, then did the whole Cargo thing and it compiled. I kind of left it there. Then I found out I couldn't use gateway-rs for PoC yet, so I just followed the Helium Docker integration doc and used the prebuilt container image for the regular miner.
|
This doesn't explain hotspots who are working fine that spontaneously stop working. Also, even with my workaround, refreshing a peer doesn't always work. I can prove the peer exists, yet doing a refresh still doesn't find the peer. I also got gateway-rs to work, but it doesn't support witnessing yet, so not much point in running it. |
@thardie the only reason I mentioned it is that all the different manufacturers have their OS set up differently. I'm sure they all run some sort of Docker container system, but I have been trying to get my aftermarket SX1302 concentrator working with my original Nebra indoor miner. I didn't want to delete the OS and so on, so I started by using my spare Raspberry Pis to get the container working. I don't know much about Docker or coding, so I just mentioned it in case it may be useful to someone who does. What I noticed is that some manufacturers have several containers doing different functions, like D-Bus, gateway, networking, packet forwarder, etc. I noticed that Nebra has a lot of their modules broken out into Python scripts, but others may not. Could changing just the one Helium miner container (12/29/2021) and leaving the rest as is cause some sort of conflict with Erlang, if the Erlang repo is being linked or sourced from another container? My miner isn't using any Nebra software, just the "guts", with a separate Raspberry Pi jumper-wired into the main board. Everything works apart from the "bug", so I'm not saying the Nebra software has this bug, but the official Helium miner pulled from Quay does. The only container I have running is that one official Helium one. The concentrator is running as per the documentation at Lora-net sx1302_hal, which is just a normal ./lora_pkt_fwd -c global_..... As far as the concentrator talking to the container goes, it's working... |
OK, so I still haven't added my ECC hardware to the AMD miner container on my laptop. I'm letting it catch up to the chain, but it's not throwing any errors. I reinstalled arm64-latest, keeping the same directory as before (the same process the Helium documentation describes for updating). I added the --mount bind option with bind-propagation=shared to see if that would change anything, but I'm still getting the same errors on that line 112 bad argument, and 0 errors on the AMD version. Whoever has the Erlang repo for AMD might want to try compiling the ARM64 version, or compare the two. I will do it eventually, but it's all Chinese to me (no offense) |
The whole idea with containers is that they are completely isolated from each other and isolated from the host OS (mostly, but specifically they don't use the host OS's or other containers libraries), so there isn't a way 1 container could mess up another container's erlang environment. The big thing you'll need if you run something like miner or gateway-rs inside a container is to map through /dev/i2c-1 from the hostOS (For Pisces miners, you need to map /dev/i2c-0 from the host OS to /dev/i2c-1 inside the container). If you're running something in a container that needs to talk to the LoRaWan module, you need to map through /dev/spidev0.0 and /dev/spidev0.1 and allow the container to do RAWIO. Check here for details on how each of these containers is set up: |
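As a rough illustration of the device mappings described above, here is a minimal Python sketch that only builds a `docker run` argument list. The function name `miner_run_args`, the image name `example/miner:latest`, and the choice of `--cap-add SYS_RAWIO` as the way to "allow RAWIO" are assumptions for illustration, not taken from any vendor's actual setup. Ports, volumes, and environment variables that a real miner needs are left out.

```python
from typing import List


def miner_run_args(image: str,
                   host_i2c: str = "/dev/i2c-1",
                   with_lora: bool = False,
                   name: str = "miner") -> List[str]:
    """Build a `docker run` argv with the device mappings from the comment above.

    host_i2c is the I2C device on the host; inside the container it always
    appears as /dev/i2c-1 (for a Pisces miner the host side is /dev/i2c-0).
    """
    args = ["docker", "run", "-d", "--name", name,
            "--device", f"{host_i2c}:/dev/i2c-1"]
    if with_lora:
        # Containers that talk to the LoRa module also need both SPI devices.
        args += ["--device", "/dev/spidev0.0",
                 "--device", "/dev/spidev0.1",
                 # Assumption: SYS_RAWIO is the capability meant by "RAWIO".
                 "--cap-add", "SYS_RAWIO"]
    args.append(image)
    return args


# Hypothetical image name, shown only to print the resulting command line.
print(" ".join(miner_run_args("example/miner:latest", host_i2c="/dev/i2c-0")))
```

Passing the returned list to `subprocess.run` would start the container; printing it first makes it easy to compare against whatever the hotspot's existing compose file or start script does.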
I'll check that out today. I understand the containers are "self-contained", but if you have one self-contained thing talking to another and expecting a specific response, that could cause a conflict. I noticed there was a new validator update sometime in the past 24 hours. As for the "peer bug", maybe the issue isn't on the hotspot end; it could be on the validators' end. My concentrator has 3 lights on it that turn on when a beacon happens, and they do in fact turn on, which corresponds with the message in the lora_pkt_fwd terminal. But I was witnessed only 1 time out of maybe 5 beacons within the past 36 hours. So the antenna is working and the concentrator is working. The issue is that if all my witnesses can't report their witnessing to the network because of the bug, it's going to keep the network bogged down. What I'm not sure of is whether the reassert happens over LoRa or the radios. I think it happens over the internet or the p2p network, but we still get charged DC for that. If it is in fact over the radio, then I have successfully sent the update 2x. *** The AMD version of the image finished syncing. The only error I am getting on that is a failed to dial a proxy server, not the snapshot issue. |
As far as the "peer bug" I have not been seeing any peer issues recently my latest info summary... is having snapshot issues on the ARM64.. not sure if this is related or not. ***** General errors ***** : 2022-01-04 18:17:11.668 7 [error] <0.17967.1>@blockchain_snapshot_handler_pb:encode_msg_blockchain_snapshot_resp_pb:122 gen_server <0.17967.1> terminated with reason: bad argument in call to erlang:iolist_size({file,<<"/var/data/saved-snaps/snap-72b1b162d99abc12d4e5244d39312383e33c9d732f221d3bf13f1898ddc6c5...">>}) in blockchain_snapshot_handler_pb:encode_msg_blockchain_snapshot_resp_pb/3 line 122 |
Hi @thardie. I am having exactly the same errors for my hotspot "https://explorer.helium.com/hotspots/112Y2WVurWxWZsegNMCKGe5Ty7TmD4U57Xb3oRwU3zxbTiyJDLRi". I think the script you provided may be very useful to me. Unfortunately, I am completely new to Python coding and so am unable to run the script.
2022-04-09 20:06:52.858 7 [error] <0.1832.0>@miner_poc_statem:send_onion:{1022,13} failed to dial 1st hotspot ("/p2p/112vK3Kfc3t8WriUg14vbodjNaozphF9gL65YXUGMYSofpD59h2h"): not_found
2022-04-09 20:07:02.862 7 [error] <0.1832.0>@miner_poc_statem:send_onion:{1022,13} failed to dial 1st hotspot ("/p2p/112vK3Kfc3t8WriUg14vbodjNaozphF9gL65YXUGMYSofpD59h2h"): not_found
2022-04-09 20:07:12.870 7 [error] <0.1832.0>@miner_poc_statem:send_onion:{1022,13} failed to dial 1st hotspot ("/p2p/112vK3Kfc3t8WriUg14vbodjNaozphF9gL65YXUGMYSofpD59h2h"): not_found
2022-04-09 20:07:22.874 7 [error] <0.1832.0>@miner_poc_statem:handle_challenging:{579,29} failed to dial 1st hotspot ("/p2p/112vK3Kfc3t8WriUg14vbodjNaozphF9gL65YXUGMYSofpD59h2h"): retries_exceeded
Any further guidance from you on this would be very helpful. |
Heh, wrong place for this, but go to the issues page of the repo where you downloaded the script. I made a small write-up on it there. I'm not a coder, but I changed that script to work better, and wrote it up since they have no documentation. |
Many thanks @serbyxp! Will try this... |
The refresh thing works, but not as well as my changes: instead of doing a peer refresh, I changed it to do a peer connect. I mentioned this to the Helium devs. Not the script itself, but by using that script I figured out that the way the miners try to send messages to each other is the issue, and almost everyone has it. I suggested to them that instead of just sending the receipt to the /p2p/Addr, a connection needs to be established first and then the message sent. But in about 6 days light hotspots go live, so libp2p is done and all receipts etc. are handled by validators. So you might just want to wait the 6 days and avoid the headache of learning all that if you don't understand it. The gRPC should work better. Keyword: should. |
Hi,
I just started to get into the Docker thing, which is completely new to me! I was already reading its user manual while it was installing when I got your second email reminding me about the launch of light hotspots 😃. I realised that, having already waited for over a month, I should wait 6 more days, especially if the problem will most likely be solved without getting into this completely new and advanced Docker thing.
Anyway, many thanks for your response and willingness to help!
|
Log lines for failure to connect to challenger and proxies:
2021-12-19 20:38:40.744 1 [error] <0.1560.0>@libp2p_transport_proxy:connect_to:69 failed to dial proxy server "/p2p/1123LJMnvys68Fab5GkA32svQ9u4LdVC396nXUgeesDbbR6o9j8f" not_found
and:
2021-12-19 20:37:54.488 1 [warning] <0.889.1>@miner_onion_server:send_witness:243 failed to dial challenger "/p2p/112JU7o2BDmPYw9v7amWEtk7btSM52gHuEAyj3EcXPYqFB8xsgYj": not_found
This appears to be a bad cache in the local peer book, and can be resolved by calling:
docker exec miner miner peer refresh /p2p/<peerid>
I wrote a python script that monitors the logs and issues this command immediately, and miner retries to connect and is able to witness again (https://github.com/HeliumDIY/helium_workarounds/blob/main/src/fix_not_found_peer.py)
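For readers who want to see the idea without opening the linked script, here is a minimal sketch of the same approach in Python. It is not the linked script: the function names, the 60-second cooldown, and the log path in the usage note are assumptions for illustration. The only command it issues is the `docker exec miner miner peer refresh` call quoted above.

```python
import os
import re
import subprocess
import time
from typing import Dict, Iterator, List, Optional

# A quoted /p2p/ address, as it appears in the log lines quoted above.
P2P_RE = re.compile(r'"(/p2p/[A-Za-z0-9]+)"')


def extract_not_found_peer(line: str) -> Optional[str]:
    """Return the /p2p/ address from a not_found log line, else None."""
    if "not_found" not in line:
        return None
    match = P2P_RE.search(line)
    return match.group(1) if match else None


def refresh_command(peer: str, container: str = "miner") -> List[str]:
    """argv for the peer refresh command quoted in this issue."""
    return ["docker", "exec", container, "miner", "peer", "refresh", peer]


def follow(path: str, poll_seconds: float = 1.0) -> Iterator[str]:
    """Yield lines appended to a file (simple tail -f; ignores log rotation)."""
    with open(path, "r", errors="replace") as fh:
        fh.seek(0, os.SEEK_END)
        while True:
            line = fh.readline()
            if line:
                yield line
            else:
                time.sleep(poll_seconds)


def run(log_path: str, cooldown_seconds: float = 60.0) -> None:
    """Watch the console log and refresh each not_found peer.

    The cooldown (an assumed value) avoids re-issuing the command for every
    retry line that the miner logs for the same peer.
    """
    last_refresh: Dict[str, float] = {}
    for line in follow(log_path):
        peer = extract_not_found_peer(line)
        if peer is None:
            continue
        now = time.monotonic()
        if now - last_refresh.get(peer, float("-inf")) < cooldown_seconds:
            continue
        last_refresh[peer] = now
        subprocess.run(refresh_command(peer), check=False)
```

Calling `run("/path/to/console.log")` with the miner's console log path starts the loop; it needs to run on the host, with permission to use Docker and a container named "miner". As noted elsewhere in this thread, a refresh does not always find the peer, so this only works around some not_found cases.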