IPFS Infrastructure Status Page #82

Open
4 tasks
olizilla opened this issue Jul 3, 2019 · 9 comments
Labels
dif/medium Prior experience is likely helpful effort/days Estimated to take multiple days, but less than a week kind/maintenance Work required to avoid breaking changes or harm to project's status quo P2 Medium: Good to have, but can wait until someone steps up status/inactive No significant work in the previous month topic/design-front-end Front-end implementation of UX/UI work topic/design-visual Visual design ONLY, not part of a larger UX effort

Comments

@olizilla
Member

olizilla commented Jul 3, 2019

Related to #80, we need a more holistic overview of the health of the ipfs.io infrastructure. We want to visualise how things are running in a way that gives a clear overview at the top level, lets you drill into more detail for each specific service, and links out to other telemetry services (netdata, grafana) where sensible for the full details.
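
To make the "clear overview at the top level, drill into each service" idea concrete, here is a minimal sketch of one way to roll per-service states up into a single headline status. The `Status` levels, the function names, and the "worst service wins" rule are all assumptions for illustration; nothing here is tied to a particular status page product.

```python
from enum import IntEnum


class Status(IntEnum):
    # Hypothetical severity levels; higher means worse.
    OPERATIONAL = 0
    DEGRADED = 1
    OUTAGE = 2


def overall_status(services: dict[str, Status]) -> Status:
    """Headline status: the worst status reported by any single service."""
    return max(services.values(), default=Status.OPERATIONAL)


def needs_attention(services: dict[str, Status]) -> list[str]:
    """Drill-down: names of services that are not operational, worst first."""
    unhealthy = [(name, s) for name, s in services.items() if s != Status.OPERATIONAL]
    unhealthy.sort(key=lambda pair: -pair[1])
    return [name for name, _ in unhealthy]
```

With this rule a single degraded gateway node turns the headline amber even if everything else is fine, which may or may not be what operators want; a weighted roll-up would be an alternative.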

A status page of some sort has been suggested... popular public ones include

Some open source solutions

TODO:

  • Define the list of services
    • gateway nodes
    • bootstrap nodes
    • dhtbooster nodes
    • preload nodes
    • websocket-star and webrtc signalling infra
    • nginx / http frontend
    • certbot / tls
    • DNS / dnsimple
    • ?
  • Define regions, zones, datacenters
    • packet
    • where tho?
  • Define metrics
    • Gateway / Nginx requests over time (current gateway load)
    • nginx timeouts over time (# requests for undiscoverable content)
    • IPFS response time for local blocks
    • IPFS response time for blocks from cluster
    • IPFS response time for DHT discovery
    • Estimated unique peerIDs in network
    • total bandwidth and average bandwidth per request.
    • total infra cost?
  • Define status thresholds
    • Happy fail: requests are slow because we are getting way more than usual
    • Sad fail: requests are slow because something is broken... DHT discovery time just spiked, but the number of unique peers didn't
    • Budget exceeded: we hit a cost threshold and started throttling specific services.
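
The status thresholds above could be sketched roughly as follows. This is only an illustration under stated assumptions: the field names, the `spike` factor of 2x, and the use of DHT discovery time as the "slow" signal are all hypothetical, and real thresholds would need to come from the metrics once they are defined.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    # Hypothetical current readings for the metrics listed above.
    requests_per_min: float
    dht_discovery_ms: float
    unique_peers: int
    monthly_cost_usd: float


@dataclass
class Baseline:
    # Hypothetical "usual" values plus a cost ceiling.
    requests_per_min: float
    dht_discovery_ms: float
    unique_peers: int
    budget_usd: float


def classify(now: Snapshot, base: Baseline, spike: float = 2.0) -> str:
    """Map a metrics snapshot onto the status thresholds from the TODO list."""
    # Budget exceeded: we hit a cost threshold, regardless of latency.
    if now.monthly_cost_usd >= base.budget_usd:
        return "budget exceeded"

    # Assumption: "slow" means DHT discovery time is well above its baseline.
    slow = now.dht_discovery_ms > spike * base.dht_discovery_ms
    if not slow:
        return "ok"

    # Happy fail: slow, but load or network size also spiked.
    busy = (
        now.requests_per_min > spike * base.requests_per_min
        or now.unique_peers > spike * base.unique_peers
    )
    # Sad fail: slow with no matching rise in load, so something is likely broken.
    return "happy fail" if busy else "sad fail"
```

For example, a snapshot where DHT discovery time has quadrupled while request rate and unique peers sit at baseline would come out as "sad fail".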

@olizilla
Member Author

Interestingly github and circle both use https://statuspage.io
I am currently trying out https://docs.statusfy.co

@olizilla
Member Author

olizilla commented Jul 11, 2019

Here's how things could look if we go for https://statuspage.io

User view

[Screenshots: New incident, Resolved, Details]

Operator view

[Screenshots: New incident, Resolved, Details]

All OK

[Screenshot]

@jessicaschilling
Contributor

I really want those health meters to pulse in a Knight Rider kind of way, but otherwise this is really nifty!

@olizilla
Member Author

I also tried out:

  • https://www.sorryapp.com/ - cheaper than statuspage.io but didn't feel as intuitive... something about it didn't click for me

[Screenshots: New incident, View incident]

@olizilla
Member Author

[Screenshot]

The cli builds out a static site, and then it's up to us where we want to publish it.

[Screenshot: localhost:8000]

This could let us host it on IPFS, but I'm assuming that the network status page is the one resource we should not post on IPFS itself. We can of course host it on any static resource server. I've not explored it further, as it seems like we'd want to have a very comfortable and clear UI for reporting incidents; those situations are stressful enough. Creating a static site is a reliable process, and could be entirely automated via github, but I want to check in with the operators who are using it to see what their preferences are.

@jessicaschilling
Contributor

jessicaschilling commented Jul 12, 2019

> This could let us host it on IPFS, but I'm assuming that the network status page is the one resource we should not post on IPFS itself.

😆 Agreed.

@hsanjuan
Contributor

Main storage Cluster is missing from the services list (although I see it in your screenshots).

Also, Pinbots.

@momack2

momack2 commented Aug 2, 2019

both statuspage and statusfy seem reasonable. are there other benefits to the self-hosted version that we like? e.g. the markdown or cli integrations?

@jessicaschilling jessicaschilling added dif/medium Prior experience is likely helpful effort/days Estimated to take multiple days, but less than a week kind/maintenance Work required to avoid breaking changes or harm to project's status quo P2 Medium: Good to have, but can wait until someone steps up status/inactive No significant work in the previous month topic/design-front-end Front-end implementation of UX/UI work topic/design-visual Visual design ONLY, not part of a larger UX effort labels Apr 6, 2020
Projects
No open projects
Status: Needs Grooming
Development

No branches or pull requests

4 participants