Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement heath?all endpoint on Alpha nodes. #4535

Merged
merged 15 commits into from
Jan 11, 2020
Merged

Implement heath?all endpoint on Alpha nodes. #4535

merged 15 commits into from
Jan 11, 2020

Conversation

parasssh
Copy link
Contributor

@parasssh parasssh commented Jan 9, 2020

This change is Reviewable

@parasssh parasssh requested a review from a team as a code owner January 9, 2020 22:38
glog.Infof("%v is unhealthy. err %v\n", vm.GetAddr(), err)
} else {
if err = json.Unmarshal(p.GetHealthInfo(), &curr); err != nil {
glog.Infof("Unable to Unmarshal. err %v\n", vm.GetAddr(), err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

printf: Infof call needs 1 arg but has 2 args (from govet)

glog.Infof("%v is unhealthy. err %v\n", vz.GetAddr(), err)
} else {
if err = json.Unmarshal(p.GetHealthInfo(), &curr); err != nil {
glog.Infof("Unable to Unmarshal. err %v\n", vz.GetAddr(), err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

printf: Infof call needs 1 arg but has 2 args (from govet)

Copy link
Contributor Author

@parasssh parasssh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 2 unresolved discussions (waiting on @danielmai, @golangcibot, and @manishrjain)


edgraph/server.go, line 654 at r1 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

printf: Infof call needs 1 arg but has 2 args (from govet)

Done.


edgraph/server.go, line 677 at r1 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

printf: Infof call needs 1 arg but has 2 args (from govet)

Done.

Copy link
Contributor Author

@parasssh parasssh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @golangcibot from 2 discussions.
Reviewable status: 0 of 7 files reviewed, all discussions resolved (waiting on @danielmai and @manishrjain)

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 10 unresolved discussions (waiting on @danielmai, @manishrjain, and @parasssh)


conn/pool.go, line 83 at r2 (raw file):

		return nil, ErrNoConnection
	}
	if !pool.IsHealthy() || pool.healthInfo == nil {

I wouldn't add the healthInfo check. That's more for informational purposes.


conn/pool.go, line 226 at r2 (raw file):

		p.healthInfo = res.Data
		p.Unlock()
	}

There might be a GetAll func, which returns all the Pools or all of their status.


conn/raft_server.go, line 279 at r2 (raw file):

	defer ticker.Stop()

	info := struct {

Instead of doing this JSON thing, how about having a proto for server status. More readable and we can later extend it.


dgraph/cmd/alpha/run.go, line 287 at r2 (raw file):

		ctx := context.Background()
		ctx = attachAccessJwt(ctx, r)

Do we need to care about access jwt?


dgraph/cmd/alpha/run.go, line 298 at r2 (raw file):

			return
		}
		if _, err := w.Write(aResp.Json); err != nil {

If we're unable to write to w, then writing an error to w would also fail. Below, we just write and ignore the result.


edgraph/server.go, line 608 at r2 (raw file):

func (s *Server) doHealthAll(ctx context.Context, authorize int) (
	*api.Response, error) {

can lie in the same line as above.


edgraph/server.go, line 621 at r2 (raw file):

	}

	ms := worker.GetMembershipState()

This might be unnecessary.


edgraph/server.go, line 642 at r2 (raw file):

				Instance: "alpha",
				Version:  "unavailable",
				Uptime:   0,

Might also mention the group they're part of. In fact, that should be part of the health status info that comes from the server.


edgraph/server.go, line 651 at r2 (raw file):

				p, err := conn.GetPools().Get(vm.GetAddr())
				if err != nil {
					glog.Infof("%v is unhealthy. err %v\n", vm.GetAddr(), err)

No need for this.


edgraph/server.go, line 676 at r2 (raw file):

			glog.Infof("%v is unhealthy. err %v\n", vz.GetAddr(), err)
		} else {
			if err = json.Unmarshal(p.GetHealthInfo(), &curr); err != nil {

It should be protos here. And once we have all the protos, we can then unify them into a JSON.

@parasssh parasssh changed the title Ps health all Implement heath?all endpoint on Alpha nodes. Jan 10, 2020
Copy link
Contributor Author

@parasssh parasssh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 10 unresolved discussions (waiting on @danielmai and @manishrjain)


conn/pool.go, line 83 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

I wouldn't add the healthInfo check. That's more for informational purposes.

Done.


conn/pool.go, line 226 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

There might be a GetAll func, which returns all the Pools or all of their status.

Didnt find any such fn.


conn/raft_server.go, line 279 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Instead of doing this JSON thing, how about having a proto for server status. More readable and we can later extend it.

Done.


dgraph/cmd/alpha/run.go, line 287 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Do we need to care about access jwt?

The JIRA tickets mentioned we need to only allow groot user.


dgraph/cmd/alpha/run.go, line 298 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

If we're unable to write to w, then writing an error to w would also fail. Below, we just write and ignore the result.

done.


edgraph/server.go, line 608 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

can lie in the same line as above.

Done.


edgraph/server.go, line 621 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

This might be unnecessary.

Need this to loop thru the alpha and zero members which allows us to report healthy and unhealthy ones. Cannot loop thru the pool entries since it only contains "healthy" ones.


edgraph/server.go, line 642 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Might also mention the group they're part of. In fact, that should be part of the health status info that comes from the server.

Done.


edgraph/server.go, line 651 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

No need for this.

Done.


edgraph/server.go, line 676 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

It should be protos here. And once we have all the protos, we can then unify them into a JSON.

Done.

Copy link
Contributor Author

@parasssh parasssh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @manishrjain from 10 discussions.
Reviewable status: 0 of 7 files reviewed, all discussions resolved (waiting on @danielmai)

curr = p.GetHealthInfo()
}
health["healthy"] = append(health["healthy"], curr)
return

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S1023: redundant return statement (from gosimple)

Copy link
Contributor

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 10 files reviewed, 2 unresolved discussions (waiting on @danielmai and @parasssh)


edgraph/access_ee.go, line 788 at r3 (raw file):

// authorizeForGroot authorizes the operation for Groot users.
func authorizeForGroot(ctx context.Context) error {

Shouldn't we allow for the whole Guardian group now? Groot is just a default user, other users in the group should be able to access.

Copy link
Contributor Author

@parasssh parasssh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @golangcibot from a discussion.
Reviewable status: 0 of 10 files reviewed, 1 unresolved discussion (waiting on @danielmai and @mangalaman93)


edgraph/access_ee.go, line 788 at r3 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

Shouldn't we allow for the whole Guardian group now? Groot is just a default user, other users in the group should be able to access.

I will check but the jira ticket only mentioned allowing groot.


edgraph/server.go, line 647 at r3 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

S1023: redundant return statement (from gosimple)

Done.

Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 7 files at r1, 7 of 8 files at r3, 1 of 1 files at r4.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @danielmai, @mangalaman93, and @parasssh)


conn/node.go, line 61 at r4 (raw file):

	// Fields which are never changed after init.
	BeginTime   time.Time

minor: StartTime sounds slightly better.


conn/pool.go, line 200 at r4 (raw file):

	defer cancel()

	s, err := c.Heartbeat(ctx, &pb.HealthInfo{})

should we change the Heartbeat method to stop taking a second parameter? It doesn't look like we are doing anything with the passed pb.HealthInfo object.


conn/raft_server.go, line 275 at r4 (raw file):

(in *pb.HealthInfo,

Is this parameter being used at all? I don't see it being used anywhere in the method.

If not and you don't want or should change the interface, write the signature as:

func (w *RaftServer) Heartbeat(_ *pb.HealthInfo,...


edgraph/access.go, line 68 at r4 (raw file):

}

func authorizeForGroot(ctx context.Context) error {

minor: authorizeGroot sounds good enough.


edgraph/access_ee.go, line 788 at r4 (raw file):

// authorizeForGroot authorizes the operation for Groot users.
func authorizeForGroot(ctx context.Context) error {

minor: authorizeGroot here as well


edgraph/server.go, line 607 at r4 (raw file):

}

func (s *Server) doHealthAll(ctx context.Context, authorize int) (*api.Response, error) {

Why is this a separate method? It doesn't look like it's being used anywhere else. Will that change?


edgraph/server.go, line 632 at r4 (raw file):

		}

		// get health locally if self.

minor: uppercase and period. Same with the other comments below.


x/config.go, line 78 at r4 (raw file):

	ProposedGroupId uint32
	// BeginTime is the start time of the alpha
	BeginTime time.Time

minor: StartTime sounds better.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 7 files at r1, 7 of 8 files at r3, 1 of 1 files at r4.
Reviewable status: all files reviewed, 16 unresolved discussions (waiting on @danielmai, @mangalaman93, and @parasssh)


conn/raft_server.go, line 297 at r4 (raw file):

	for {
		info.Uptime = uint64(time.Since(node.BeginTime))

int64. I think this should be in seconds.


dgraph/cmd/alpha/run.go, line 287 at r4 (raw file):

		ctx := context.Background()
		ctx = attachAccessJwt(ctx, r)

ctx := attach..(context.Background(), r)


dgraph/cmd/alpha/run.go, line 289 at r4 (raw file):

		ctx = attachAccessJwt(ctx, r)

		var aResp *api.Response

resp


edgraph/server.go, line 621 at r2 (raw file):

Previously, parasssh wrote…

Need this to loop thru the alpha and zero members which allows us to report healthy and unhealthy ones. Cannot loop thru the pool entries since it only contains "healthy" ones.

We don't remove any connection, healthy or unhealthy as long as they're in the membership state.


edgraph/server.go, line 603 at r4 (raw file):

// HealthAll handles health?all requests
func (s *Server) HealthAll(ctx context.Context) (*api.Response, error) {

These two can be merged.


edgraph/server.go, line 639 at r4 (raw file):

			curr.Uptime = uint64(time.Since(x.WorkerConfig.BeginTime))
		} else {
			p, err := conn.GetPools().Get(vm.GetAddr())

pools := conn.GetPools().GetAll()

Get their health infos and then add current health info. Put that into a list and just JSON marshal.


edgraph/server.go, line 644 at r4 (raw file):

				return
			}
			curr = p.GetHealthInfo()

p.HealthInfo()


edgraph/server.go, line 662 at r4 (raw file):

	var jsonOut []byte
	if jsonOut, err = json.Marshal(health); err != nil {

json.Marshal(list of health infos)

Copy link
Contributor Author

@parasssh parasssh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @martinmr, and @martinmr from 16 discussions.
Reviewable status: 0 of 11 files reviewed, 2 unresolved discussions (waiting on @danielmai, @manishrjain, and @martinmr)


conn/pool.go, line 200 at r4 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

should we change the Heartbeat method to stop taking a second parameter? It doesn't look like we are doing anything with the passed pb.HealthInfo object.

It seems service rpc param is mandatory in protos. I changed it to the generic api.Payload as before.


conn/raft_server.go, line 297 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

int64. I think this should be in seconds.

Done.


edgraph/access_ee.go, line 788 at r3 (raw file):

Previously, parasssh wrote…

I will check but the jira ticket only mentioned allowing groot.

We do for groot only.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Some comments.

Reviewed 11 of 11 files at r5.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @danielmai and @parasssh)


conn/pool.go, line 281 at r5 (raw file):

		p.healthInfo.Status = "unhealthy"
	}
	p.healthInfo.LastEcho = p.lastEcho.Unix()

If this is in seconds, ensure that other timestamps are also in seconds.


edgraph/server.go, line 624 at r5 (raw file):

		Version:  x.Version(),
		Uptime:   int64(time.Since(x.WorkerConfig.StartTime) / time.Second),
		LastEcho: time.Now().UnixNano(),

You used unix above. So, this should be Unix(), not Nano.


worker/groups.go, line 590 at r5 (raw file):

// GetGroupId returns the group to which this worker belongs to.
func GetGroupId() uint32 {

Drop the Get prefix. Just call it GroupId()

Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 11 of 11 files at r5.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @danielmai and @parasssh)


conn/pool.go, line 200 at r4 (raw file):

Previously, parasssh wrote…

It seems service rpc param is mandatory in protos. I changed it to the generic api.Payload as before.

Ok

Copy link
Contributor Author

@parasssh parasssh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @manishrjain from 3 discussions.
Reviewable status: 9 of 11 files reviewed, 1 unresolved discussion (waiting on @danielmai, @manishrjain, and @martinmr)


conn/pool.go, line 281 at r5 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

If this is in seconds, ensure that other timestamps are also in seconds.

Yes, it is the seconds since epoch.


edgraph/server.go, line 624 at r5 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

You used unix above. So, this should be Unix(), not Nano.

Done.


worker/groups.go, line 590 at r5 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Drop the Get prefix. Just call it GroupId()

done.

@parasssh parasssh merged commit 48c79ff into master Jan 11, 2020
@parasssh parasssh deleted the ps_health_all branch January 30, 2020 19:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants