pkg/*: address control plane implementation gaps #4074

shashankram · 2021-09-02T22:44:37Z

Description:
This change refactors the core control plane components
to address several gaps in the existing implementation.

At a high level, it resolves the following:

Traffic to a service with multiple ports is filtered and
routed correctly based on the port the traffic is directed
to (Local cluster endpoints for service with multiple ports are not correctly programmed #3777). This was previously a blocker for users with
multiple ports/protocols per k8s service object.
Builds mesh traffic policies for LDS/CDS/RDS configs
inline consistently. This simplifies the code and makes
it a lot easier to test config generation based on the
state of the mesh (service, service accounts, SMI policies
etc.). It also eliminates outbound policy merges that
are unnecessary. Building config for different envoy configs
inline will help when OSM moves to an event-driven config
generation per proxy/group model, where an event can produce
configuration using a nearly consistent view of the caches.
Allows the TargetPort for a service to be different than it's
Port value. This is common in k8s to allow app authors to
change the backend port without needing to update their client
code.
Similar to egress, mesh RDS configs are isolated per port to
avoid conflicts when the same host needs to be routed differently
per port. This can happen when the same service exposes multiple
HTTP ports that need to be routed differently.
Correctly generates hostnames for a service-port combination that
clients can use to access the service.
Addresses bugs in endpoint discovery when the service has multiple
ports. Correctly translates a cluster name to a MeshService to be
able to retrieve its endpoints.
Implements SMI TrafficSplit for permissive mode for both HTTP and
TCP traffic.
Removes and simplifies code that is not required, such as the APIs
to retrieve port:protocol mappings for a service, and other redundant
APIs.
E2e tests and unit tests for various scenarios: multiple ports per
service, permissive mode traffic split, different TargetPort for
service Port etc.

To understand this change better, it's important to note the following
fundamental changes made to address the existing implementation issues
noted above.

A MeshService now carries information on the port, target port and
protocol associated with the service. The implementation ensures
the target port is always set for a service with an endpoint. It is
necessary to disambiguate between the two because k8s allows routing
traffic to a service whose Port and TargetPort are different. This
is of significant importance when it comes to filtering and routing
traffic on the client and server side. A k8s service with multiple
ports is transposed to multiple MeshService representations within
the OSM control plane. A MeshService is the source of truth while
creating filtering and routing configs in Envoy as opposed to a k8s
service. In other words, filter chains, clusters per service have
a 1-1 mapping to a MeshService instance.
RDS mesh inbound and outbound route configurations are built per
port, similar to egress route configurations. This ensures there
can be no conflicts when a service/FQDN needs to be routed differently
based on the port.
Endpoints for a MeshService is filtered by its TargetPort when set. This
is required to ensure endpoints not associated with the MeshService are not
programmed for upstream clusters. This is required to support multiple
ports per service.

Part of #4052
Resolves #3777
Resolves #2527

Testing done:

Existing unit tests and e2e tests
New unit tests and e2e tests
Verified ingress demo with Contour

Affected area:

Functional Area
Control Plane	[X]

Please answer the following questions with yes/no.

Does this change contain code from or inspired by another project? no
- Did you notify the maintainers and provide attribution?
Is this a breaking change? no

This changes adds a test to traffic to a service with multiple ports. Additionally, it configures a TargetPort on the service that is different from the Port value exposed to clients to verify OSM currently maps a port to it's target port while filtering and routing traffic. Co-authored-by: Shashank Ram <shashr2204@gmail.com> Signed-off-by: Shashank Ram <shashr2204@gmail.com>

codecov-commenter · 2021-09-02T22:58:01Z

Codecov Report

Merging #4074 (64862cc) into main (2cec5d7) will increase coverage by 0.38%.
The diff coverage is 86.07%.

@@            Coverage Diff             @@
##             main    #4074      +/-   ##
==========================================
+ Coverage   68.44%   68.83%   +0.38%     
==========================================
  Files         206      205       -1     
  Lines       11673    11317     -356     
==========================================
- Hits         7990     7790     -200     
+ Misses       3633     3475     -158     
- Partials       50       52       +2

Flag	Coverage Δ
unittests	`68.83% <86.07%> (+0.38%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
pkg/catalog/mock_catalog_generated.go	`0.00% <0.00%> (ø)`
pkg/envoy/lds/listener.go	`89.16% <ø> (ø)`
pkg/envoy/types.go	`33.33% <ø> (ø)`
pkg/envoy/xdsutil.go	`88.46% <ø> (-0.22%)`	⬇️
pkg/k8s/mock_controller_generated.go	`0.00% <0.00%> (ø)`
pkg/k8s/types.go	`100.00% <ø> (ø)`
pkg/providers/kube/fake.go	`0.00% <ø> (ø)`
pkg/service/mock_service_provider_generated.go	`0.00% <ø> (ø)`
pkg/envoy/lds/inmesh.go	`79.59% <64.28%> (+0.99%)`	⬆️
pkg/envoy/eds/response.go	`49.20% <76.92%> (+6.34%)`	⬆️
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2cec5d7...64862cc. Read the comment docs.

draychev · 2021-09-03T19:18:27Z

pkg/catalog/types.go

+	// GetOutboundMeshTrafficPolicy returns the outbound mesh traffic policy for the given downstream identity
+	GetOutboundMeshTrafficPolicy(identity.ServiceIdentity) *trafficpolicy.OutboundMeshTrafficPolicy
+
+	// GetInboundMeshTrafficPolicy returns the inbound mesh traffic policy for the given upstream identity and services
+	GetInboundMeshTrafficPolicy(identity.ServiceIdentity, []service.MeshService) *trafficpolicy.InboundMeshTrafficPolicy


This is fantastic!

draychev · 2021-09-03T19:19:59Z

pkg/envoy/eds/cluster_load_assignment_test.go

@@ -30,22 +30,22 @@ func TestNewClusterLoadAssignment(t *testing.T) {

 	cla := newClusterLoadAssignment(namespacedServices[0], allServiceEndpoints[namespacedServices[0]])
 	assert.NotNil(cla)
-	assert.Equal(cla.ClusterName, "osm/bookstore-1")
+	assert.Equal(cla.ClusterName, "ns1/bookstore-1|80")


So glad to see this!

draychev

🚢 it!

pkg/envoy/eds/response.go

pkg/envoy/lds/response_test.go

draychev · 2021-09-03T19:24:32Z

pkg/k8s/types.go


 	// GetEndpoints returns the endpoints for a given service, if found
-	GetEndpoints(svc service.MeshService) (*corev1.Endpoints, error)
+	GetEndpoints(service.MeshService) (*corev1.Endpoints, error)


Thanks for the cleanup!

snehachhabria · 2021-09-03T23:03:14Z

pkg/catalog/inbound_traffic_policies.go

+	permissiveMode := mc.configurator.IsPermissiveTrafficPolicyMode()
+	if !permissiveMode {
+		// Pre-computing the list of TrafficTarget optimizes to avoid repeated
+		// cache lookups for each upstream service.


pkg/catalog/outbound_traffic_policies.go

snehachhabria · 2021-09-03T23:21:09Z

pkg/envoy/cds/cluster.go

-		remoteCluster.LbPolicy = xds_cluster.Cluster_ROUND_ROBIN
-	}
+	// Configure service discovery based on traffic policies
+	remoteCluster.ClusterDiscoveryType = &xds_cluster.Cluster_Type{Type: xds_cluster.Cluster_EDS}


pkg/envoy/rds/response.go

snehachhabria · 2021-09-03T23:30:45Z

pkg/envoy/xdsutil.go

-func GetLocalClusterNameForService(proxyService service.MeshService) string {
-	return GetLocalClusterNameForServiceCluster(proxyService.String())
-}
-
-// GetLocalClusterNameForServiceCluster returns the name of the local cluster for the given service cluster.
-// The local cluster refers to the cluster corresponding to the service the proxy is fronting, accessible over localhost by the proxy.
-func GetLocalClusterNameForServiceCluster(clusterName string) string {
-	return fmt.Sprintf("%s%s", clusterName, localClusterSuffix)
-}
-


thanks for cleaning up

snehachhabria · 2021-09-03T23:33:24Z

pkg/service/types.go

+
+	// Port is the port number that clients use to access the service.
+	// This can be different than MeshService.TargetPort which represents the actual port number
+	// the application is accepting connections on.
+	// Port maps to ServicePort.Port in k8s: https://pkg.go.dev/k8s.io/api/core/v1#ServicePort
+	Port uint16
+
+	// TargetPort is the port number on which an application accept traffic directed to this MeshService
+	// This can be different than MeshService.Port in k8s.
+	// TargetPort maps to ServicePort.TargetPort in k8s: https://pkg.go.dev/k8s.io/api/core/v1#ServicePort
+	TargetPort uint16
+
+	// Protocol is the protocol served by the service's port
+	Protocol string


This is great, it prevents all the back and forth code of converting MeshService to k8s service in the code

This change refactors the core control plane components to address several gaps in the existing implementation. At a high level, it resolves the following: 1. Traffic to a service with multiple ports is filtered and routed correctly based on the port the traffic is directed to (openservicemesh#3777). This was previously a blocker for users with multiple ports/protocols per k8s service object. 2. Builds mesh traffic policies for LDS/CDS/RDS configs inline consistently. This simplifies the code and makes it a lot easier to test config generation based on the state of the mesh (service, service accounts, SMI policies etc.). It also eliminates outbound policy merges that are unnecessary. Building config for different envoy configs inline will help when OSM moves to an event-driven config generation per proxy/group model, where an event can produce configuration using a nearly consistent view of the caches. 3. Allows the TargetPort for a service to be different than it's Port value. This is common in k8s to allow app authors to change the backend port without needing to update their client code. 4. Similar to egress, mesh RDS configs are isolated per port to avoid conflicts when the same host needs to be routed differently per port. This can happen when the same service exposes multiple HTTP ports that need to be routed differently. 5. Correctly generates hostnames for a service-port combination that clients can use to access the service. 7. Addresses bugs in endpoint discovery when the service has multiple ports. Correctly translates a cluster name to a MeshService to be able to retrieve its endpoints. 8. Implements SMI TrafficSplit for permissive mode for both HTTP and TCP traffic. 9. Removes and simplifies code that are not required, such as the APIs to retrieve port:protocol mappings for a service, and other redundant APIs. 10. E2e tests and unit tests for various scenarios: multiple ports per service, permissive mode traffic split, different TargetPort for service Port etc. To understand this change better, it's important to note the following fundamental changes made to address the existing implementation issues noted above. - A MeshService now carries information on the port, target port and protocol associated with the service. The implementation ensures the target port is always set for a service with an endpoint. It is necessary to disambiguate between the two because k8s allows routing traffic to a service whose Port and TargetPort are different. This is of significant importance when it comes to filtering and routing traffic on the client and server side. A k8s service with multiple ports is transposed to multiple MeshService representations within the OSM control plane. A MeshService is the source of truth while creating filtering and routing configs in Envoy as opposed to a k8s service. In other words, filter chains, clusters per service have a 1-1 mapping to a MeshService instance. - RDS mesh inbound and outbound route configurations are built per port, similar to egress route configurations. This ensures there can be no conflicts when a service/FQDN needs to be routed differently based on the port. - Endpoints for a MeshService is filtered by its TargetPort is set. This is required to endpoints not associated with the MeshService are not programmed for upstream clusters. This is required to support multiple ports per service. Part of openservicemesh#4052 Resolves openservicemesh#3777 Resolves openservicemesh#2527 Signed-off-by: Shashank Ram <shashr2204@gmail.com>

shashankram added the wip Work-in-Progress label Sep 2, 2021

shashankram force-pushed the cluster-port branch from 71b5f96 to f81e8be Compare September 2, 2021 22:47

shashankram force-pushed the cluster-port branch 3 times, most recently from de37cad to 16356ba Compare September 3, 2021 17:23

shashankram changed the title ~~[do not review]: testing~~ pkg/*: address control plane implementation gaps Sep 3, 2021

shashankram removed the wip Work-in-Progress label Sep 3, 2021

shashankram marked this pull request as ready for review September 3, 2021 17:53

shashankram requested a review from a team as a code owner September 3, 2021 17:53

shashankram added the release/v1 Release v1.0.0 label Sep 3, 2021

draychev reviewed Sep 3, 2021

View reviewed changes

draychev approved these changes Sep 3, 2021

View reviewed changes

shashankram force-pushed the cluster-port branch 3 times, most recently from 0e528c2 to d05f80d Compare September 3, 2021 19:32

snehachhabria approved these changes Sep 3, 2021

View reviewed changes

shashankram force-pushed the cluster-port branch from d05f80d to cd602db Compare September 7, 2021 16:53

shashankram merged commit 66f7ef2 into openservicemesh:main Sep 7, 2021

This was referenced Sep 7, 2021

Refactor routes, clusters, and listener configurations to address implementation gaps and simplify #4052

Closed

feat(*): add traffic split in permissive mode #4023

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pkg/*: address control plane implementation gaps #4074

pkg/*: address control plane implementation gaps #4074

shashankram commented Sep 2, 2021 •

edited

Loading

codecov-commenter commented Sep 2, 2021 •

edited

Loading

draychev Sep 3, 2021

draychev Sep 3, 2021

draychev left a comment

draychev Sep 3, 2021

snehachhabria Sep 3, 2021

snehachhabria Sep 3, 2021

snehachhabria Sep 3, 2021

snehachhabria Sep 3, 2021

pkg/*: address control plane implementation gaps #4074

pkg/*: address control plane implementation gaps #4074

Conversation

shashankram commented Sep 2, 2021 • edited Loading

codecov-commenter commented Sep 2, 2021 • edited Loading

Codecov Report

draychev Sep 3, 2021

Choose a reason for hiding this comment

draychev Sep 3, 2021

Choose a reason for hiding this comment

draychev left a comment

Choose a reason for hiding this comment

draychev Sep 3, 2021

Choose a reason for hiding this comment

snehachhabria Sep 3, 2021

Choose a reason for hiding this comment

snehachhabria Sep 3, 2021

Choose a reason for hiding this comment

snehachhabria Sep 3, 2021

Choose a reason for hiding this comment

snehachhabria Sep 3, 2021

Choose a reason for hiding this comment

shashankram commented Sep 2, 2021 •

edited

Loading

codecov-commenter commented Sep 2, 2021 •

edited

Loading