-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
jsonrpc autoconnect, indexes client recovery, docker volumes, bugfixes #68
Conversation
fc77b79
to
680954b
Compare
cmd/server/main.go
Outdated
@@ -78,3 +81,9 @@ func instrumentationSetup() { | |||
} | |||
}() | |||
} | |||
|
|||
func pprofSetup() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enabled pprof. I wanted to inspect some things about a memprofile of the service.
All good. Left it here anyways, is binded to localhost
so not public.
volumes: | ||
- textile-fc-data:/root/.texfc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made a change in the docker-compose used to launch the server, prometheus, grafana, etc.
The textile-fc
(our service) uses a volume to map the repo folder to a Docker volume. Mostly for saving progress while restarting the compose environment, and tests index persistence.
@@ -523,7 +523,7 @@ | |||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some improvements in clarity and new boxes that helped discover some things.
@@ -151,33 +150,3 @@ func (mi *MinerIndex) persistMetaIndex(index MetaIndex) error { | |||
} | |||
return nil | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to miners.go
since loading from store is a general thing.
@@ -153,10 +154,11 @@ func (mi *MinerIndex) Close() error { | |||
// miners. | |||
func (mi *MinerIndex) start() { | |||
defer func() { mi.finished <- struct{}{} }() | |||
n, err := mi.api.ChainNotify(mi.ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stop using ChainNotify
for listening to the chain.
I've been thinking about this for quite some time since puts a lot of stress. It notifies about every possible forks that might happen in 10-20s time frame... we're only interested in the heaviest tipset known and let time (and lotus) decide which are the heaviest ones.
Switched to waiting the avg tipset time and just asking for the heaviest current tipset (which is what we need). Mini forks or possible future forks aren't interesting. We don't need to be at the bleeding-edge of that.
if len(info.Epochs) > 0 && info.Epochs[len(info.Epochs)-1] == patch[addr] { | ||
continue | ||
} | ||
info.Epochs = append(info.Epochs, patch[addr]) | ||
index.Miners[addr] = info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug fix. info
was edited but not assigned to current index (so lost changes).
@@ -61,7 +61,7 @@ func New(maddr ma.Multiaddr, authToken string) (*API, func(), error) { | |||
closer, err := jsonrpc.NewMergeClient("ws://"+addr+"/rpc/v0", "Filecoin", | |||
[]interface{}{ | |||
&api.Internal, | |||
}, headers) | |||
}, headers, jsonrpc.WithReconnect(true, time.Second*3, 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Original PR that I did for lotus. You can configure a interval time to wait so that the websocket connection is recovered, adn the client can still use it as "nothing happened".
In the newer PR that will get merged, you can annotate with retry:"true"
methods of the API. If that annotation is detected, the client retries itself the call until having success on a recovered connection. This makes the client even do less work since shouldn't do any kind of retry logic.
I may eventually consider switching to the new solution.
@@ -242,16 +242,10 @@ func monitorLotusSync(ctx context.Context, c *API) { | |||
} | |||
|
|||
func refreshHeightMetric(ctx context.Context, c *API) { | |||
var h uint64 | |||
state, err := c.SyncState(ctx) | |||
heaviest, err := c.ChainHead(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplify curent height
metric in grafana dashboard to the current heaviest tipset... not the newer potential new tipset that we discovered recently.
@@ -85,12 +85,21 @@ type client struct { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lotus/jsonrpc
package is from Lotus. Has the changes proposed for autoconnect feature.
Can be dismissed for review... but if like crazy things might be fun.
|
||
ma "github.com/multiformats/go-multiaddr" | ||
) | ||
|
||
const ( | ||
AvgBlockTime = time.Second * 30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In testnet is 45s. But in mainet will be 30s as mentioned in the spec.
Leaving 30s here since updating more often doesn't hurt; worst thing that can happen is doing nothing.
Will DCO after review just in case that might not invalidate review comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some extra things added.
@@ -50,6 +49,7 @@ func (mi *MinerIndex) metaWorker() { | |||
newIndex, err := updateMetaIndex(mi.ctx, mi.api, mi.h, mi.lr, addrs) | |||
if err != nil { | |||
log.Errorf("error when updating meta index: %s", err) | |||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If for some reason generating a new index fails, log the error and break here. If we continue we would override the current-good-state-index with an empty one and lost things. Better to keep it outdated than to lose it.
Online: mi.index.Meta.Online, | ||
Offline: mi.index.Meta.Offline, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These copies were missing so didn't appear correctly in the output.
Signed-off-by: jsign <jsign.uy@gmail.com>
Signed-off-by: jsign <jsign.uy@gmail.com>
Signed-off-by: jsign <jsign.uy@gmail.com>
Signed-off-by: jsign <jsign.uy@gmail.com>
910c3c0
to
e765a81
Compare
Rebased. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty unfamiliar with the details here, but didn't see anything that looks terribly wrong. Merge away!
While running our reputation index, which is a long-running thing, the lotus client was kind of fatal when the websocket connection was lost. This was a known thing for a while in the
jsonrpc
from lotus, so I jumped to help us both in that (filecoin-project/lotus#1155).The current solution that is in this PR, is more in line with a previous PR than the mentioned before... so it may change again with my new solution, but the current one is enough for our interests.
While installing and leaving running the service in the cloud instance, I discovered a set of issues or needed improvements to keep it more: reliable on lotus or self crashes, tested the persistence of indexed data and recovery after restarting, and some extra tunes to dockerized things that run on the cloud (and I also use locally to test things, since helps fast iteration with the many wires to connect).
I'll make some comments, but that's pretty much the summary of this.
Closes #41
Closes #67