Complete diagnostics support #3969

Merged
merged 3 commits into from
Sep 4, 2015

@otoolep otoolep commented Sep 3, 2015

This change adds support for diagnostics by decomposing the existing interface into two interfaces: one for stats and the other for diagnostics. It also adds some basic monitoring of the system, network, and Go runtime. The format of diagnostics is deliberately loose, as each module will have different data to report.

This change also adds diagnostics support for Graphite. It shows all current TCP connections, as well as the connect time of each.

Example output:

> show diagnostics
name: graphite                                                                                                                                         
--------------
local           remote          connect time
127.0.0.1:2003  127.0.0.1:46970 2015-09-03T20:50:19.204562458Z
127.0.0.1:2003  127.0.0.1:46969 2015-09-03T20:50:16.052529583Z

name: runtime
-------------
GOARCH  GOMAXPROCS      GOOS    version
amd64   8               linux   go1.5

name: network
-------------
hostname
malthus

name: system
------------
PID     currentTime                     started                         uptime
10158   2015-09-03T19:34:34.234076883Z  2015-09-03T19:34:24.194068478Z  10.04000847s

Lots more to come as individual packages are instrumented.

@otoolep otoolep force-pushed the hook_up_diagnostics branch 2 times, most recently from 9429ce5 to 88e69b7 Compare September 3, 2015 04:38
@otoolep otoolep changed the title Initial port of diagnostics Implement diagnostics Sep 3, 2015
@otoolep otoolep force-pushed the hook_up_diagnostics branch 2 times, most recently from 613ad93 to 0303154 Compare September 3, 2015 20:44
@otoolep otoolep changed the title Implement diagnostics Complete diagnostics support Sep 3, 2015
@otoolep otoolep force-pushed the hook_up_diagnostics branch from 0303154 to 43a3be9 Compare September 3, 2015 21:13

otoolep commented Sep 3, 2015

Example of retrieving the Go heap in use from the _internal database:

> use _internal
Using database _internal
> show measurements
name: measurements
------------------
name
runtime

> select HeapInUse from runtime
name: runtime
-------------
time                            HeapInUse
2015-09-03T21:23:46.794906165Z  1638400
2015-09-03T21:23:47.794445731Z  2039808
2015-09-03T21:23:48.794614339Z  2039808
2015-09-03T21:23:49.794559935Z  2039808
2015-09-03T21:23:50.794456233Z  2048000
2015-09-03T21:23:51.794455478Z  2048000
2015-09-03T21:23:52.794612781Z  2048000
2015-09-03T21:23:53.794433553Z  2072576
2015-09-03T21:23:54.794549499Z  2072576
2015-09-03T21:23:55.794530447Z  3522560
2015-09-03T21:23:56.794556335Z  3538944
2015-09-03T21:23:57.794545723Z  3538944
2015-09-03T21:23:58.794404288Z  3538944
2015-09-03T21:23:59.794596641Z  3547136
2015-09-03T21:24:00.794338119Z  3555328
2015-09-03T21:24:01.794424863Z  3555328
2015-09-03T21:24:02.794533332Z  3563520
2015-09-03T21:24:03.794344875Z  1548288
2015-09-03T21:24:04.794204585Z  1548288
2015-09-03T21:24:05.794593418Z  1548288
2015-09-03T21:24:06.794439389Z  1548288
2015-09-03T21:24:07.794564181Z  1548288
2015-09-03T21:24:08.794277957Z  3162112
2015-09-03T21:24:09.794512651Z  3170304
2015-09-03T21:24:10.79431038Z   3194880
2015-09-03T21:24:11.79433276Z   3194880
2015-09-03T21:24:12.794453011Z  3194880
2015-09-03T21:24:13.794541538Z  3194880
2015-09-03T21:24:14.794436347Z  3194880
2015-09-03T21:24:15.794437984Z  3194880
2015-09-03T21:24:16.794179964Z  3211264
2015-09-03T21:24:17.794435578Z  3211264
2015-09-03T21:24:18.794470792Z  3211264
2015-09-03T21:24:19.794157959Z  3227648
2015-09-03T21:24:20.794150573Z  3235840
2015-09-03T21:24:21.794323955Z  3244032
2015-09-03T21:24:22.794442512Z  2990080
2015-09-03T21:24:23.794347625Z  2990080
2015-09-03T21:24:24.794556997Z  2990080
2015-09-03T21:24:25.794481344Z  2990080
2015-09-03T21:24:26.794443649Z  2990080
2015-09-03T21:24:27.794253306Z  2990080
2015-09-03T21:24:28.794355405Z  2990080
2015-09-03T21:24:29.794538682Z  2990080
2015-09-03T21:24:30.794192797Z  2990080
2015-09-03T21:24:31.794310403Z  2990080
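
For readers unfamiliar with where such a value can come from: the sketch below samples heap usage from the Go runtime. It assumes that the `HeapInUse` field above corresponds to `HeapInuse` in `runtime.MemStats`, which this conversation does not confirm. The helper name `heapInUse` and the shortened sampling interval are illustrative only.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// heapInUse returns the number of bytes in in-use heap spans,
// as reported by the Go runtime.
func heapInUse() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapInuse
}

func main() {
	// Take a few samples. The output above shows roughly one sample per
	// second; the interval is shortened here so the example finishes quickly.
	for i := 0; i < 3; i++ {
		fmt.Printf("%s\t%d\n", time.Now().UTC().Format(time.RFC3339Nano), heapInUse())
		time.Sleep(10 * time.Millisecond)
	}
}
```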

@@ -223,8 +265,10 @@ func (s *Service) openTCPServer() (net.Addr, error) {
// handleTCPConnection services an individual TCP connection for the Graphite input.
func (s *Service) handleTCPConnection(conn net.Conn) {
defer conn.Close()
defer removeConnection(conn)
defer s.wg.Done()

I know this wasn't part of this PR, but should s.wg.Done() be moved up so it's the first defer, since deferred funcs execute in LIFO order?
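
The ordering point can be checked with a small standalone sketch. The closures below are stand-ins that only record call order; they are not the real `conn.Close` or `removeConnection`. With the `Done` call deferred first, it runs last, so a waiter on the WaitGroup is released only after the other cleanup has finished.

```go
package main

import (
	"fmt"
	"sync"
)

// handle mimics the shape of a connection handler. The closures stand in for
// wg.Done, conn.Close, and removeConnection; they only record call order.
func handle(wg *sync.WaitGroup, order *[]string) {
	defer func() { *order = append(*order, "wg.Done"); wg.Done() }() // deferred first, runs last
	defer func() { *order = append(*order, "conn.Close") }()
	defer func() { *order = append(*order, "removeConnection") }() // deferred last, runs first
}

// run starts one handler and waits for it, returning the recorded order.
func run() []string {
	var wg sync.WaitGroup
	var order []string
	wg.Add(1)
	go handle(&wg, &order)
	wg.Wait() // returns only after the other deferred calls have run
	return order
}

func main() {
	fmt.Println(run()) // [removeConnection conn.Close wg.Done]
}
```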

Makes sense.

dgnorton commented Sep 4, 2015

I haven't used expvar. They all get exposed via HTTP also?

otoolep commented Sep 4, 2015

Not by default in our system. I needed to hook it up in the httpd
package, as shown in the patch.

otoolep commented Sep 4, 2015

Actually expvar was hooked up already, in the previous patch.

https://github.com/influxdb/influxdb/blob/master/services/httpd/handler.go#L166

dgnorton commented Sep 4, 2015

Why are the stats vars in the graphite service global and not part of type Service?

@@ -64,8 +107,12 @@ func New(c Config) *Monitor {
func (m *Monitor) Open() error {
m.Logger.Printf("Starting monitor system")

A nit, and I'm not sure how to get around it yet: in tests you can't silence this message, because the server calls "Open" before you can call "SetLogger" to null the logger out. I get a lot of chatter in my test output because of it. I would like to find a better way to do this.

otoolep commented Sep 4, 2015

Good question.

What would you tag the metrics for an individual service with, such that it
is registered differently from a second Graphite service? In other words,
what would you pass to this call?

RegisterStatsClient(name string, tags map[string]string, client StatsClient)

@corylanou

Quite a bit in here, but nothing jumped out at me. Super excited to start using this. +1

otoolep commented Sep 4, 2015

I could use a simple index, but that will need to be global for the
package. Any other ideas? The local port perhaps, now that I think about it
-- that would work.

dgnorton commented Sep 4, 2015

Yeah, local port seems logical. Or, maybe addr and port.

otoolep commented Sep 4, 2015

Makes good sense @dgnorton -- I will make that change. I think the code looked a bit different when I started; I made changes later on to make tagging easy, and then forgot to use the port tag here.

otoolep commented Sep 4, 2015

@dgnorton -- I have pushed as much as possible onto the service. expvar stats are now per bound service, and the stats are suitably tagged with a new bind tag:

> show stats

name: graphite                                                                                                                                                           
tags: bind=:2003, clusterID=5114051809403761185, hostname=marx, nodeID=1, proto=tcp                                                                                      
batches_tx      bytes_rx        connections_active      connections_handled     points_rx       points_tx                                                                
----------      --------        ------------------      -------------------     ---------       ---------                                                                
162             4122548         0                       1                       162833          162000                                                                   
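
A rough sketch of how per-service stats could be kept apart by tags such as `bind` follows. It publishes one `expvar.Map` per service under a key derived from the name and tags. `statsKey`, `Service`, and `NewService` are hypothetical names, and the key format is an assumption; the patch itself may register stats quite differently.

```go
package main

import (
	"expvar"
	"fmt"
	"sort"
	"strings"
)

// statsKey builds a registration key from a name and its tags.
// Tags are sorted so the key does not depend on map iteration order.
func statsKey(name string, tags map[string]string) string {
	parts := make([]string, 0, len(tags))
	for k, v := range tags {
		parts = append(parts, k+"="+v)
	}
	sort.Strings(parts)
	return name + ":" + strings.Join(parts, ",")
}

// Service is a hypothetical Graphite-style input bound to one address.
type Service struct {
	statMap *expvar.Map
}

// NewService publishes a separate expvar map for each proto/bind pair.
// Note that expvar.NewMap panics if the same key is published twice.
func NewService(proto, bind string) *Service {
	tags := map[string]string{"proto": proto, "bind": bind}
	return &Service{statMap: expvar.NewMap(statsKey("graphite", tags))}
}

func main() {
	a := NewService("tcp", ":2003")
	b := NewService("tcp", ":2004")
	a.statMap.Add("points_rx", 10)
	b.statMap.Add("points_rx", 3)
	fmt.Println(a.statMap.String(), b.statMap.String())
}
```

Because each service owns its map, two Graphite inputs on different ports no longer share counters, which is the effect the `bind` tag has in the output above.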

@otoolep otoolep force-pushed the hook_up_diagnostics branch from 3f64b2d to 702a011 Compare September 4, 2015 03:13
This change adds support for diagnostics by decomposing the existing
interface into two interfaces -- one for stats, and the other for
diags. It also adds some basic monitoring of the system, network, and the Go
runtime.
Graphite diagnostics currently show TCP connections.
@otoolep otoolep force-pushed the hook_up_diagnostics branch from 702a011 to 6ad35e2 Compare September 4, 2015 03:51

otoolep commented Sep 4, 2015

Green build, let me know if you think this is good @dgnorton

}

var tcpConnectionsMu sync.Mutex
var tcpConnections map[string]*tcpConnectionDiag

Why not track connections in the service also?

Mostly because I went for a simple diagnostics display, something like you
get when you run netstat - a single table showing all connections to
Graphite, not broken down by individual Graphite inputs. It's clear anyway
because the table shows source and dest IP.

I did not break stats down by connection because someone that uses a lot of
different connections that come and might start creating a lot of stats
noise - and series.

On Friday, September 4, 2015, dgnorton notifications@github.com wrote:

In services/graphite/service.go
#3969 (comment):

-// Build the graphite expvar hierarchy.
-func init() {
-	statMap.Set("tcp", statMapTCP)
-	statMap.Set("udp", statMapUDP)
+type tcpConnectionDiag struct {
+	conn net.Conn
+	connectTime time.Time
+}
+
+var tcpConnectionsMu sync.Mutex
+var tcpConnections map[string]*tcpConnectionDiag

Why not track connections in the service also?


Reply to this email directly or view it on GitHub
https://github.com/influxdb/influxdb/pull/3969/files#r38747327.

Connections that come and go, that is, from different IPs. If this turns out
not to be an issue in practice, we can change it.

dgnorton commented Sep 4, 2015

+1

otoolep commented Sep 4, 2015

Thanks for the reviews @corylanou and @dgnorton

otoolep added a commit that referenced this pull request Sep 4, 2015
@otoolep otoolep merged commit 02e2ed8 into master Sep 4, 2015
@otoolep otoolep deleted the hook_up_diagnostics branch September 4, 2015 19:35