Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For sharing review comments #1

Open
wants to merge 314 commits into
base: for-review
Choose a base branch
from
Open

For sharing review comments #1

wants to merge 314 commits into from

Conversation

tnqn
Copy link
Owner

@tnqn tnqn commented Aug 14, 2023

No description provided.

heypnus and others added 30 commits January 26, 2022 16:21
Support CIDR in group for security policy
Change thumbprint from sha256 to sha1
Fix nsx requests not using token while token exists
1.Support MatchExpression Operator: 'NotIn', 'Exists',
'DoesNotExist' for VM/Pod/Namespace label selectors
in SecurityPolicy

2.Given NSX doesn't support MatchExpression Operator 'In',
and only five criteria are allowed currently,
which results into a gigantic group expression body to be passed to NSX.
So, only allow just one Operator 'In' MatchExpressions with at most
of five values in it.

3.Add NSX-T limitation and NSGroup Criteria check
…Support

Add MatchExpression support for SecurityPolicy
Update some log output
Remove unused test case from transport_test.go
Roundtrip should only return err from base().RoundTrip
This patch will fix 2 issues:

1. start of portRange was mistakenly set into the sourcePorts property
   in NSX-T.
2. json.Marshal(rule) cannot detect the change of SecurityPolicyPort.
Because that some existing container may already use this port.
Add priority range in SecurityPolicy definition
Remove application/x-www-form-urlencoded from header
1. When building the expression, if there's only "ncp/pod" in the
   "value" property, the "ncp/pod" will appear in place of the
   "tag equals", not of the "scope equals" which we are expecting.
   This patch will change "ncp/pod" to "ncp/pod|" to fix this issue.
2. If namespaceSelector not set in the peer, the rule should select
   the pod/vm in the same namespace instead of all namespaces.
Change the probe port from 8383 to 8384
…ondition_kv

Fix the condition issues for pod/vm selector
Auto decrement page size when 60576 errcode appears
This patch is to refactor PR#55, certieria produced
for PodSelector, VMSelector and NamespaceSelector,
and make the certieria always to be created as
a mixed one by changing cluster tag membery type.
For PodSelector, VMSelector:
cluster tag membery type is set as Segment
For NamespaceSelector:
cluster tag membery type is set as SegmentPort
This patch is to add MatchExpression UnitTests, fix some typo
and a minor code refactor in updateMixedExpressionsMatchExpression
Enforce user cannot create SecPolicy CR in sys ns
Use gomonkey to mock struct method
Use gomock to mock interface method
Update gc function so it could be tested
Api manager in config may include port and scheme
when searching local thumbprint, it should move the port part
and only compare host
heypnus and others added 11 commits August 10, 2023 21:53
…ntegration

Support to create subnetport in subnetset
Get port number by subnet ID from subnetport store
"Project" and "External_ipv4_blocks" are retrieved from
VPCNetworkConfiguration CR instead of NCP ConfigMap.
Remove the validation check for these 2 options.
…ve_config

Remove validation check for default_project and external_ipv4_blocks
1. Read the real StaticIPAllocation from CR during building NSX subnet.
2. Set StaticIPAllocation to true in default subnetset.
3. Correct the finalizer names of subnet/subnetset.
…rties

Fix several subnet/subnetset issues
Fix errors when deleting ip block from ipblock store
Comment on lines +54 to +73
logf.SetLogger(logger.ZapLogger())
cf, err = config.NewNSXOperatorConfigFromFile()
if err != nil {
log.Error(err, "load config file error")
os.Exit(1)
}

if os.Getenv("NSX_OPERATOR_NAMESPACE") != "" {
nsxOperatorNamespace = os.Getenv("NSX_OPERATOR_NAMESPACE")
}

if cf.HAEnabled() {
log.Info("HA mode enabled")
} else {
log.Info("HA mode disabled")
}

if metrics.AreMetricsExposed(cf) {
metrics.InitializePrometheusMetrics()
}
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not proper to be in init as they will be executed as long as the package is imported, which means you need to prepare a config file even when running unit test for cmd package.

Comment on lines +185 to +187
nsxClient := nsx.GetClient(cf)
if nsxClient == nil {
log.Error(err, "failed to get nsx client")
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the func could fail to return a valid nsxClient, it should use nil error to indicate it, instead of returning nil nsxClient. Currently the code is contradictory, the err is always nil as it's not received from the function, and the func never returns nil nsxClient, even it fails (Instead, it just log an error`)

if !nsxClient.NSXCheckVersion(SecurityPolicy) {
err := errors.New("SecurityPolicy feature support check failed")
log.Error(err, "initial NSX version check for SecurityPolicy got error")
}
if !nsxClient.NSXCheckVersion(ServiceAccount) {
err := errors.New("NSXServiceAccount feature support check failed")
log.Error(err, "initial NSX version check for NSXServiceAccount got error")
}
return nsxClient

}

if cf.CoeConfig.EnableVPCNetwork && commonService.NSXClient.NSXCheckVersion(nsx.VPC) {
log.V(1).Info("VPC mode enabled")
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such one time log indicating the enablement of a module could be V(0) as it's few but provide key information.

Comment on lines +48 to +51
if subnetService == nil {
lock.Lock()
defer lock.Unlock()
if subnetService == nil {
Copy link
Owner Author

@tnqn tnqn Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the func is supposed to be called concurrently, it's not really thread-safe as L48 is without lock acquired.
If the func is not supposed to be called concurrently, the lock is not required at all.
From the usage, I think the lock and the global variable subnetServer makes it complicated, and it would be hard to use SubnetService in unit test, as you will need to reset the global variable every time after a test writes it.
A simpler and clearer way to initialzie the struct is just expose a constructor to main.go and inject the service to modules which rely on it:

func NewSubnetService(service common.Service) (*SubnetService, error)

...
# main.go
subnetService = subnet.NewSubnetService(commonService)
subnetReconciler.Service = subnetService
...
subnetsetReconciler.Service = subnetService

Comment on lines +64 to +91
wg := sync.WaitGroup{}
wgDone := make(chan bool)
fatalErrors := make(chan error)
subnetService := &SubnetService{
Service: service,
SubnetStore: &SubnetStore{
ResourceStore: common.ResourceStore{
Indexer: cache.NewIndexer(keyFunc, cache.Indexers{
common.TagScopeSubnetCRUID: subnetIndexFunc,
}),
BindingType: model.VpcSubnetBindingType(),
},
},
}

wg.Add(1)
go subnetService.InitializeResourceStore(&wg, fatalErrors, ResourceTypeSubnet, nil, subnetService.SubnetStore)
go func() {
wg.Wait()
close(wgDone)
}()
select {
case <-wgDone:
break
case err := <-fatalErrors:
close(fatalErrors)
return subnetService, err
}
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the essential of the code is that it wants to get the result of InitializeResourceStore synchronously, regardless of success or failure. The usage of the channels and waitGroup makes it really complicated. Why don't just make InitializeResourceStore return an error?

func NewSubnetService(service common.Service) (*SubnetService, error) {
	subnetService := &SubnetService{
		Service: service,
		SubnetStore: &SubnetStore{
			ResourceStore: common.ResourceStore{
				Indexer: cache.NewIndexer(keyFunc, cache.Indexers{
					common.TagScopeSubnetCRUID: subnetIndexFunc,
				}),
				BindingType: model.VpcSubnetBindingType(),
			},
		},
	}
	if err := subnetService.InitializeResourceStore(ResourceTypeSubnet, nil, subnetService.SubnetStore); err != nil {
		return nil, err
	}
	return subnetService, nil
}

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides, closing a channel could make goroutines writting to it panic, which is the case of InitializeResourceStore.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the essential of the code is that it wants to get the result of InitializeResourceStore synchronously, regardless of success or failure. The usage of the channels and waitGroup makes it really complicated. Why don't just make InitializeResourceStore return an error?

func NewSubnetService(service common.Service) (*SubnetService, error) {
	subnetService := &SubnetService{
		Service: service,
		SubnetStore: &SubnetStore{
			ResourceStore: common.ResourceStore{
				Indexer: cache.NewIndexer(keyFunc, cache.Indexers{
					common.TagScopeSubnetCRUID: subnetIndexFunc,
				}),
				BindingType: model.VpcSubnetBindingType(),
			},
		},
	}
	if err := subnetService.InitializeResourceStore(ResourceTypeSubnet, nil, subnetService.SubnetStore); err != nil {
		return nil, err
	}
	return subnetService, nil
}

For the case of Subnet service, we can just make InitializeResourceStore just return an error. I think InitializeResourceStore is designed to use channels/waitgroup to initialize multiple resource stores simultaneously in different goroutines. Taking firewall service as example, it initializes four kinds of resource stores simultaneously.

https://github.com/vmware-tanzu/nsx-operator/blob/vpc_dev/pkg/nsx/services/securitypolicy/firewall.go#L70

But I see that most of services ONLY initialize one store, so it also makes sense to simplify InitializeResourceStore.

Comment on lines +107 to +110
vpcInfo, err := common.ParseVPCResourcePath(vpcList.Items[0].Status.NSXResourcePath)
if err != nil {
return "", err
}
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it guarantee NSXResourcePath has been set when processing this subnet? If there is no guarantee, errors may be generated randomly which might scare users who are monitoring component logs.
If having a valid NSXResourcePath is a precondition of creating subnet, perhaps it could skip enqueuing a subnet when VPC is not ready and enqueue associated subnets when VPC is ready.

Comment on lines +63 to +67
if len(obj.Spec.SubnetSet) > 0 && len(obj.Spec.Subnet) > 0 {
err := errors.New("subnet and subnetset should not be configured at the same time")
log.Error(err, "failed to get subnet/subnetset of the subnetport", "subnetport", req.NamespacedName)
return common.ResultNormal, err
}
Copy link
Owner Author

@tnqn tnqn Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could enforce the validation in CRD schema to ensure at most one SubnetSet/Subnet is set. Otherwise returning the error to make it retry is pointless.

// if attachmentRef.Name == "" {
// defaultVMSubnet = true
// }
old_status := obj.Status.DeepCopy()
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use consistent naming style: oldStatus

Comment on lines +323 to +327
subnetPath = subnet.Status.NSXResourcePath
if len(subnetPath) == 0 {
err := fmt.Errorf("empty NSX resource path from subnet %s", subnet.Name)
return subnetPath, err
}
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess there is no guarantee that NSXResourcePath is not empty here. Instead of returning error, perhaps it could also skip enqueueing the subnetPort when its subnet's status is not ready, and watch subnet's status update event and make it enqueue associated subnetPorts which are not realized yet.

Comment on lines +338 to +342
subnetPath, err := r.AllocateSubnetFromSubnetSet(obj, subnetSet)
if err != nil {
return subnetPath, err
}
return subnetPath, nil
Copy link
Owner Author

@tnqn tnqn Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess this is related to where @dantingl raised the race condition. Except for adding a lock to the method, there may be another solution:
subnetsetController also watches subsetPorts, and checks if the current capacity is enough by comparing the expected capacity and number of current subsetPorts. In its reconcile func, it auto-scales the subnetSet when the capacity is not enough.
Then subnetPort just waits for available subnet/subnetSet to be available and can unify the implementation.

matchedCondition := getExistingConditionOfType(newCondition.Type, subnetset.Status.Conditions)

if reflect.DeepEqual(matchedCondition, newCondition) {
log.V(2).Info("conditions already match", "New Condition", newCondition, "Existing Condition", matchedCondition)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use space to connect multi words in the key of structured logging, otherwise it could be hard to differentiate key and value in some cases and hard to use tool to extract key and values. Some suggestions about logging:
https://github.com/tnqn/code-review-comments#logging

return ctrl.NewControllerManagedBy(mgr).
For(&v1alpha1.SubnetSet{}).
WithOptions(controller.Options{
MaxConcurrentReconciles: runtime.NumCPU(),
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't make sense to use cpu number as the concurrency of network-intensive tasks, the efficiecy of the application may vary in different environments and unpredictable. It may run well when it's tested in a env but badly in another.

Comment on lines +220 to +227
portNums := len(common.ServiceMediator.GetPortsOfSubnet(*subnet.Id))
if portNums > 0 {
continue
}
if err := r.Service.DeleteSubnet(subnet); err != nil {
log.Error(err, "fail to delete subnet from subnetset cr", "ID", *subnet.Id)
hitError = true
}
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could conflict with the code that allocates empty subnet. If #1 (comment) is accepted, perhaps it could implement garbage collection in Reconcile, it could be retriggered by always returning RequeueAfter as the GCInterval

jwsui added a commit to jwsui/nsx-operator that referenced this pull request Jan 15, 2024
Initialize subnet service once then inject it to multiple reconcilers
instead of initializing service in different reconcilers with a lock
ensuring the initialization is invoked once.

tnqn#1 (comment)
jwsui added a commit to jwsui/nsx-operator that referenced this pull request Jan 15, 2024
Add LastTansitionTime in CR status which helps understand when the
issue was encountered/resolved when viewing the CR.

tnqn#1 (comment)
jwsui added a commit to jwsui/nsx-operator that referenced this pull request Jan 15, 2024
Add LastTansitionTime in CR status which helps understand when the
issue was encountered/resolved when viewing the CR.

tnqn#1 (comment)
jwsui added a commit to jwsui/nsx-operator that referenced this pull request Jan 15, 2024
Add LastTansitionTime in CR status which helps understand when the
issue was encountered/resolved when viewing the CR.

tnqn#1 (comment)
jwsui added a commit to jwsui/nsx-operator that referenced this pull request Jan 15, 2024
Add LastTansitionTime in CR status which helps understand when the
issue was encountered/resolved when viewing the CR.

tnqn#1 (comment)
jwsui added a commit to jwsui/nsx-operator that referenced this pull request Jan 18, 2024
Initialize subnet service once then inject it to multiple reconcilers
instead of initializing service in different reconcilers with a lock
ensuring the initialization is invoked once.

tnqn#1 (comment)
jwsui added a commit to jwsui/nsx-operator that referenced this pull request Jan 18, 2024
Add LastTansitionTime in CR status which helps understand when the
issue was encountered/resolved when viewing the CR.

tnqn#1 (comment)
jwsui added a commit to jwsui/nsx-operator that referenced this pull request Jan 25, 2024
Remove Subnet/SubnetPort service from mediator as service mediato
will be removed from nsx-operator. Subnet/SubnetPort service will
only be instantiated once and passed as pointer across controllers.

tnqn#1 (comment)
jwsui added a commit to jwsui/nsx-operator that referenced this pull request Jan 25, 2024
Remove Subnet/SubnetPort service from mediator as service mediato
will be removed from nsx-operator. Subnet/SubnetPort service will
only be instantiated once and passed as pointer across controllers.

tnqn#1 (comment)
jwsui added a commit to jwsui/nsx-operator that referenced this pull request Jan 25, 2024
Remove Subnet/SubnetPort service from mediator as service mediato
will be removed from nsx-operator. Subnet/SubnetPort service will
only be instantiated once and passed as pointer across controllers.

tnqn#1 (comment)
jwsui added a commit to jwsui/nsx-operator that referenced this pull request Jan 30, 2024
Remove Subnet/SubnetPort service from mediator as service mediato
will be removed from nsx-operator. Subnet/SubnetPort service will
only be instantiated once and passed as pointer across controllers.

tnqn#1 (comment)
jwsui added a commit to jwsui/nsx-operator that referenced this pull request Jan 30, 2024
Remove Subnet/SubnetPort service from mediator as service mediato
will be removed from nsx-operator. Subnet/SubnetPort service will
only be instantiated once and passed as pointer across controllers.

tnqn#1 (comment)
jwsui added a commit to jwsui/nsx-operator that referenced this pull request Feb 1, 2024
Remove Subnet/SubnetPort service from mediator as service mediato
will be removed from nsx-operator. Subnet/SubnetPort service will
only be instantiated once and passed as pointer across controllers.

tnqn#1 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.