v3 support #121

Merged · 5 commits · Mar 31, 2022

Conversation

@MegaByte875 (Contributor) commented Mar 28, 2022

  • upgrade nebula client

  • update apis/template configuration

@wey-gu (Contributor) commented Mar 29, 2022

Dear @porscheme,
Here is the v3 support for the operator, in case you are interested in trying it ;)

Thanks.
BR//Wey

Comment on lines 193 to 196
err := metaClient.Disconnect()
if err != nil {
    return
}
Contributor:

Log the err, or just _ = metaClient.Disconnect()?
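
For illustration, a minimal sketch of the two alternatives (the klog logger used here is an assumption for the sketch, not necessarily what this codebase uses):

// Option 1: log the error so a failed disconnect is at least visible.
if err := metaClient.Disconnect(); err != nil {
    klog.Errorf("disconnect meta client: %v", err)
}

// Option 2: discard the error explicitly when the caller cannot act on it.
_ = metaClient.Disconnect()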

}

func (m *metaClient) Balance(req *meta.BalanceReq) (*meta.BalanceResp, error) {
return m.client.Balance(req)
func (m *metaClient) Balance(req *meta.AdminJobReq) (*meta.AdminJobResp, error) {
Contributor:

I think we can remove this func because we don't recommend using it; users can use BalanceData or BalanceLeader instead.

Cmd: meta.AdminCmd_LEADER_BALANCE,
Paras: [][]byte{space},
}
resp, err := m.client.RunAdminJob(req)
Contributor:

Use m.balance(req)?

log.Info("balance plan running now")
return m.BalanceStatus(resp.Id)
log.Info("balance job running now")
return m.BalanceStatus(*resp.GetResult_().JobID, space)
Contributor:

It seems balanceStatus is a long-running task; do we need to make it async?

Contributor Author:

Yes, this is not an ideal implementation. I will save the last job in the status field.
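
For illustration only, a rough sketch of what saving the job in the status could look like (the field name, its type, and the surrounding struct are hypothetical, not the final implementation):

// Hypothetical addition to the storaged status: record the last submitted
// balance job so a later reconcile can poll it instead of waiting inline.
type StoragedStatus struct {
    // ... existing fields ...
    LastBalanceJob *int32 `json:"lastBalanceJob,omitempty"` // hypothetical field; type assumed to match the job ID
}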

}
resp, err := m.client.RunAdminJob(req)
if err != nil {
return false, err
Contributor:

If the meta job is done or the leader changed, will balanceStatus never stop?

Comment on lines 39 to 48
return wait.PollImmediateUntil(interval, func() (bool, error) {
    done, err := fn(ctx)
    if err != nil {
        if done {
            return false, err
        }
    } else if done {
        return true, nil
    }
    return false, nil
Contributor:

How about:

done, err := fn(ctx)
if err != nil {
  return false, err
}
return done, nil
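
Put together, a self-contained sketch of the simplified helper (the PollImmediate name, package name, and fn parameter are assumptions for illustration; only wait.PollImmediateUntil is the real API):

package async // hypothetical package name

import (
    "context"
    "time"

    "k8s.io/apimachinery/pkg/util/wait"
)

// PollImmediate runs fn every interval until it reports done, an error
// occurs, or the context is cancelled. Any error from fn aborts the poll.
func PollImmediate(ctx context.Context, interval time.Duration, fn func(context.Context) (bool, error)) error {
    return wait.PollImmediateUntil(interval, func() (bool, error) {
        done, err := fn(ctx)
        if err != nil {
            return false, err
        }
        return done, nil
    }, ctx.Done())
}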

Workload WorkloadStatus `json:"workload,omitempty"`
Version string `json:"version,omitempty"`
Phase ComponentPhase `json:"phase,omitempty"`
HostsAdded bool `json:"hostsAdded,omitempty"`
Contributor:

Is HostsAdded only for Storaged?

oldReplicas := extender.GetReplicas(oldWorkload)
newReplicas := extender.GetReplicas(newWorkload)
if !nc.Status.Storaged.HostsAdded || *newReplicas > *oldReplicas {
if err := c.addStorageHosts(nc, *oldReplicas, *newReplicas); err != nil {
Contributor:

Will it not error when the same host is added more than once?

Contributor Author:

This is only executed once here.

Contributor:

For example:

  1. addStorageHosts succeeds.
  2. syncComponentStatus fails.
  3. The next reconcile calls addStorageHosts again (see the sketch after this list).
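
For illustration only, a rough sketch of one way to keep the repeated call harmless; listHosts/addHosts are illustrative stand-ins, not this PR's meta client API:

// addMissingHosts filters out hosts the cluster already knows about, so a
// reconcile that re-enters this path after a failed status sync is a no-op.
func addMissingHosts(listHosts func() ([]string, error), addHosts func([]string) error, desired []string) error {
    existing, err := listHosts()
    if err != nil {
        return err
    }
    known := make(map[string]struct{}, len(existing))
    for _, h := range existing {
        known[h] = struct{}{}
    }
    var missing []string
    for _, h := range desired {
        if _, ok := known[h]; !ok {
            missing = append(missing, h)
        }
    }
    if len(missing) == 0 {
        return nil // the previous reconcile already added everything
    }
    return addHosts(missing)
}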

if err := c.addStorageHosts(nc, *oldReplicas, *newReplicas); err != nil {
    _, ok := err.(*net.DNSError)
    if ok {
        return nil
Contributor:

Why? Is the sync considered successful when there is a DNSError?

Contributor Author:

Capturing the DNSError here means it will not block the subsequent reconcile logic.
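
As a side note, a minimal sketch of the same check written with errors.As, which also matches a wrapped DNS error (an alternative for illustration, not what this PR does; it needs the errors and net imports):

var dnsErr *net.DNSError
if errors.As(err, &dnsErr) {
    // treat the DNS lookup failure as transient and let the next reconcile retry
    return nil
}
return err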

Contributor:

So, is the DNSError normal behavior here?
If everything after it succeeds, does that mean adding the hosts succeeded?

Comment on lines 95 to 96
empty := len(spaces) == 0
if !empty {
Contributor:

How about if len(spaces) > 0 ?

if len(hosts) > 0 {
if err := metaClient.RemoveHost(hosts); err != nil {
return err
if !empty {
Contributor:

How about if len(spaces) > 0 ?

@@ -26,7 +26,8 @@ import (
"k8s.io/apimachinery/pkg/runtime/schema"
"sigs.k8s.io/controller-runtime/pkg/client"

ng "github.com/vesoft-inc/nebula-go/v2/nebula"
ng "github.com/vesoft-inc/nebula-go/v3/nebula"
Contributor:

How about nebulago, to unify the alias?
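
For example, the import with a unified alias would look roughly like this (a sketch of the suggestion, not the final diff):

import (
    nebulago "github.com/vesoft-inc/nebula-go/v3/nebula"
)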

Comment on lines 35 to 38
var stop <-chan struct{}
if ctx != nil {
    stop = ctx.Done()
}
Contributor:

Will it block when stop is nil?

Contributor Author:

Done may return nil if this context can never be canceled.
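
To make the behaviour behind this exchange concrete, a small self-contained sketch (not from the PR): a nil channel is never ready, so a select with other cases still proceeds, while a bare receive from it would block forever.

package main

import (
    "fmt"
    "time"
)

func main() {
    var stop <-chan struct{} // nil: this case can never fire

    select {
    case <-stop:
        fmt.Println("never reached")
    case <-time.After(100 * time.Millisecond):
        fmt.Println("other cases still run; only a bare <-stop would hang")
    }
}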

Contributor:

How about using ctx.Done() directly? ctx generally cannot be nil.

@@ -39,7 +39,7 @@ import (
"k8s.io/utils/pointer"
"sigs.k8s.io/controller-runtime/pkg/client"

nebula "github.com/vesoft-inc/nebula-go/v2"
nebula "github.com/vesoft-inc/nebula-go/v3"
Contributor:

How about nebulago, to unify the alias?

Comment on lines 193 to 196
err := metaClient.Disconnect()
if err != nil {
    return
}
Contributor:

_ = metaClient.Disconnect() ?

@veezhang (Contributor):

What do you think about the case where multiple versions are needed in the same Kubernetes cluster?

@porscheme:

@MegaByte875 @wey-gu how do I test this?

@wey-gu (Contributor) commented Mar 30, 2022

@MegaByte875 could you help @porscheme with a step-by-step guide?
I think cloning your branch should be OK, but a guide covering build through deploy would help him (and me) a lot.
Thanks!

@MegaByte875 (Contributor Author):

I will update the Helm charts later; you can then deploy nebula v3 via Helm, @porscheme.

@veezhang (Contributor) left a comment:

LGTM

@veezhang merged commit facd74e into vesoft-inc:master on Mar 31, 2022
@porscheme:

The operator scheduler is crashing, see the attached screenshot (Capture).

My Helm values file:

image:
  nebulaOperator:
    image: vesoft/nebula-operator:v1.0.0
    imagePullPolicy: Always
  kubeRBACProxy:
    image: gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
    imagePullPolicy: Always
  kubeScheduler:
    image: k8s.gcr.io/kube-scheduler:v1.18.8
    imagePullPolicy: Always

imagePullSecrets: []
kubernetesClusterDomain: ""

controllerManager:
  create: true
  replicas: 2
  env: []
  resources:
    limits:
      cpu: 200m
      memory: 200Mi
    requests:
      cpu: 100m
      memory: 100Mi

admissionWebhook:
  create: true

scheduler:
  create: true
  schedulerName: nebula-scheduler
  replicas: 1
  env: []
  resources:
    limits:
      cpu: 200m
      memory: 200Mi
    requests:
      cpu: 100m
      memory: 100Mi

# Enable openkruise scheme for controller manager. (default false)
enableKruise: false

# Period at which the controller forces the repopulation of its local object stores. (default 1h0m0s)
syncPeriod:

# Namespace the controller watches for updates to Kubernetes objects, If empty, all namespaces are watched.
watchNamespace:

# The address the metric endpoint binds to. (default ":8080")
metricsBindAddr: ":8080"

# The address the probe endpoint binds to. (default: ":8081")
healthProbeBindAddr: ":8081"

# The TCP port the Webhook server binds to. (default 9443)
webhookBindPort: 9443

# Maximum number of concurrently running reconcile loops for nebula cluster (default 3)
maxConcurrentReconciles: 3

nodeSelector:
  agentpool: metad

tolerations: []

affinity: {}
