Skip to content
This repository has been archived by the owner on Jul 11, 2023. It is now read-only.

Local cluster endpoints for service with multiple ports are not correctly programmed #3777

Closed
Tracked by #4052
shashankram opened this issue Jul 16, 2021 · 0 comments · Fixed by #4074
Closed
Tracked by #4052
Assignees
Labels
area/control-plane Related to OSM's control plane kind/bug Something isn't working
Milestone

Comments

@shashankram
Copy link
Member

Bug description:
Local cluster endpoints for a service with multiple ports are not correctly programmed.

Consider the following k8s service :

apiVersion: v1
kind: Service
metadata:
  labels:
    app: bookstore
  name: bookstore
  namespace: bookstore
spec:
  ports:
  - name: ssh
    port: 22 # TCP based 
    protocol: TCP
  - name: web
    port: 80 # HTTP based
    protocol: TCP
  selector:
    app: bookstore
  type: ClusterIP

The code in https://github.com/openservicemesh/osm/blob/main/pkg/envoy/cds/cluster.go#L129-L152 incorrectly programs both the ports as endpoints for the local cluster corresponding to the bookstore service.

This is due to a limitation in the existing implementation where a single cluster is created for a service. This is not correct because a single local cluster cannot proxy traffic meant for different protocol ports to all the ports the same way. The problem with doing so is that in the above example, HTTP traffic will get proxied to a TCP port, and vice versa.

Proposed fix:

  • Create cluster per service port. Cluster names should change to include the port corresponding to the cluster. If there are multiple ports on the service that are exposed, multiple local clusters must be created per pert.
  • Update the inbound HTTP routes to only reference HTTP based clusters specific to HTTP ports
  • Update the inbound TCP route to only reference TCP based clusters specific to TCP ports

Affected area (please mark with X where applicable):

  • Control Plane [X]
  • Networking [X]

Expected behavior:
HTTP traffic should not be proxied to TCP ports, and vice versa.

Steps to reproduce the bug (as precisely as possible):

  1. Create a service with multiple ports (HTTP and TCP)
  2. Send HTTP traffic to the service on it's HTTP port

Since the service has a TCP port, the local cluster endpoints will incorrectly include this port.

@shashankram shashankram added kind/bug Something isn't working area/control-plane Related to OSM's control plane labels Jul 16, 2021
@shashankram shashankram self-assigned this Aug 23, 2021
@shashankram shashankram added this to the v1.0.0 milestone Aug 30, 2021
shashankram added a commit to shashankram/osm that referenced this issue Sep 3, 2021
This change refactors the core control plane components
to address several gaps in the existing implementation.

At a high level, it resolves the following:

1. Traffic to a service with mutiple ports is filtered and
   routed correctly based on the port the traffic is directed
   to (openservicemesh#3777). This was previously a blocker for users with
   multiple ports/protocols per k8s service object.

2. Builds mesh traffic policies for LDS/CDS/RDS configs
   inline consistently. This simplifies the code and makes
   it a lot easier to test config generation based on the
   state of the mesh (service, service accounts, SMI policies
   etc.). It also eliminates outbound policy merges that
   are unnecessary. Building config for different envoy configs
   inline will help when OSM moves to an event-driven config
   generation per proxy/group model, where an event can produce
   configuration using a nearly consistent view of the caches.

3. Allows the TargetPort for a service to be different than it's
   Port value. This is common in k8s to allow app authors to
   change the backend port without needing to update their client
   code.

4. Similar to egress, mesh RDS configs are isolated per port to
   avoid conflicts when the same host needs to be routed differently
   per port. This can happen when the same service exposes multiple
   HTTP ports that need to be routed differently.

5. Correctly generates hostnames for a service-port combination that
   clients can use to access the service.

7. Addresses bugs in endpoint discovery when the service has multiple
   ports. Correctly translates a cluster name to a MeshService to be
   able to retrieve its endpoints.

8. Implements SMI TrafficSplit for permissive mode for both HTTP and
   TCP traffic.

9. Removes and simplifies code that are not required, such as the APIs
   to retrieve port:protocol mappings for a service, and other redundant
   APIs.

10. E2e tests and unit tests for various scenarios: multiple ports per
    service, permissive mode traffic split, different TargetPort for
    service Port etc.

To understand this change better, it's important to note the following
fundamental changes made to address the existing implementation issues
noted above.

- A MeshService now carries information on the port, target port and
  protocol associated with the service. The implementation ensures
  the target port is always set for a service with an endpoint. It is
  necessary to disambiguate between the two because k8s allows routing
  traffic to a service whose Port and TargetPort are different. This
  is of significant importance when it comes to filtering and routing
  traffic on the client and server side. A k8s service with multiple
  ports is transposed to multiple MeshService representations within
  the OSM control plane. A MeshService is the source of truth while
  creating filtering and routing configs in Envoy as opposed to a k8s
  service. In other words, filter chains, clusters per service have
  a 1-1 mapping to a MeshService instance.

- RDS mesh inbound and outbound route configurations are built per
  port, similar to egress route configurations. This ensures there
  can be no conflicts when a service/FQDN needs to be routed differently
  based on the port.

- Endpoints for a MeshService is filtered by its TargetPort is set. This
  is required to endpoints not associated with the MeshService are not
  programmed for upstream clusters. This is required to support multiple
  ports per service.

Signed-off-by: Shashank Ram <shashr2204@gmail.com>
shashankram added a commit to shashankram/osm that referenced this issue Sep 3, 2021
This change refactors the core control plane components
to address several gaps in the existing implementation.

At a high level, it resolves the following:

1. Traffic to a service with multiple ports is filtered and
   routed correctly based on the port the traffic is directed
   to (openservicemesh#3777). This was previously a blocker for users with
   multiple ports/protocols per k8s service object.

2. Builds mesh traffic policies for LDS/CDS/RDS configs
   inline consistently. This simplifies the code and makes
   it a lot easier to test config generation based on the
   state of the mesh (service, service accounts, SMI policies
   etc.). It also eliminates outbound policy merges that
   are unnecessary. Building config for different envoy configs
   inline will help when OSM moves to an event-driven config
   generation per proxy/group model, where an event can produce
   configuration using a nearly consistent view of the caches.

3. Allows the TargetPort for a service to be different than it's
   Port value. This is common in k8s to allow app authors to
   change the backend port without needing to update their client
   code.

4. Similar to egress, mesh RDS configs are isolated per port to
   avoid conflicts when the same host needs to be routed differently
   per port. This can happen when the same service exposes multiple
   HTTP ports that need to be routed differently.

5. Correctly generates hostnames for a service-port combination that
   clients can use to access the service.

7. Addresses bugs in endpoint discovery when the service has multiple
   ports. Correctly translates a cluster name to a MeshService to be
   able to retrieve its endpoints.

8. Implements SMI TrafficSplit for permissive mode for both HTTP and
   TCP traffic.

9. Removes and simplifies code that are not required, such as the APIs
   to retrieve port:protocol mappings for a service, and other redundant
   APIs.

10. E2e tests and unit tests for various scenarios: multiple ports per
    service, permissive mode traffic split, different TargetPort for
    service Port etc.

To understand this change better, it's important to note the following
fundamental changes made to address the existing implementation issues
noted above.

- A MeshService now carries information on the port, target port and
  protocol associated with the service. The implementation ensures
  the target port is always set for a service with an endpoint. It is
  necessary to disambiguate between the two because k8s allows routing
  traffic to a service whose Port and TargetPort are different. This
  is of significant importance when it comes to filtering and routing
  traffic on the client and server side. A k8s service with multiple
  ports is transposed to multiple MeshService representations within
  the OSM control plane. A MeshService is the source of truth while
  creating filtering and routing configs in Envoy as opposed to a k8s
  service. In other words, filter chains, clusters per service have
  a 1-1 mapping to a MeshService instance.

- RDS mesh inbound and outbound route configurations are built per
  port, similar to egress route configurations. This ensures there
  can be no conflicts when a service/FQDN needs to be routed differently
  based on the port.

- Endpoints for a MeshService is filtered by its TargetPort is set. This
  is required to endpoints not associated with the MeshService are not
  programmed for upstream clusters. This is required to support multiple
  ports per service.

Part of openservicemesh#4052
Resolves openservicemesh#3777
Resolves openservicemesh#2527

Signed-off-by: Shashank Ram <shashr2204@gmail.com>
shashankram added a commit to shashankram/osm that referenced this issue Sep 3, 2021
This change refactors the core control plane components
to address several gaps in the existing implementation.

At a high level, it resolves the following:

1. Traffic to a service with multiple ports is filtered and
   routed correctly based on the port the traffic is directed
   to (openservicemesh#3777). This was previously a blocker for users with
   multiple ports/protocols per k8s service object.

2. Builds mesh traffic policies for LDS/CDS/RDS configs
   inline consistently. This simplifies the code and makes
   it a lot easier to test config generation based on the
   state of the mesh (service, service accounts, SMI policies
   etc.). It also eliminates outbound policy merges that
   are unnecessary. Building config for different envoy configs
   inline will help when OSM moves to an event-driven config
   generation per proxy/group model, where an event can produce
   configuration using a nearly consistent view of the caches.

3. Allows the TargetPort for a service to be different than it's
   Port value. This is common in k8s to allow app authors to
   change the backend port without needing to update their client
   code.

4. Similar to egress, mesh RDS configs are isolated per port to
   avoid conflicts when the same host needs to be routed differently
   per port. This can happen when the same service exposes multiple
   HTTP ports that need to be routed differently.

5. Correctly generates hostnames for a service-port combination that
   clients can use to access the service.

7. Addresses bugs in endpoint discovery when the service has multiple
   ports. Correctly translates a cluster name to a MeshService to be
   able to retrieve its endpoints.

8. Implements SMI TrafficSplit for permissive mode for both HTTP and
   TCP traffic.

9. Removes and simplifies code that are not required, such as the APIs
   to retrieve port:protocol mappings for a service, and other redundant
   APIs.

10. E2e tests and unit tests for various scenarios: multiple ports per
    service, permissive mode traffic split, different TargetPort for
    service Port etc.

To understand this change better, it's important to note the following
fundamental changes made to address the existing implementation issues
noted above.

- A MeshService now carries information on the port, target port and
  protocol associated with the service. The implementation ensures
  the target port is always set for a service with an endpoint. It is
  necessary to disambiguate between the two because k8s allows routing
  traffic to a service whose Port and TargetPort are different. This
  is of significant importance when it comes to filtering and routing
  traffic on the client and server side. A k8s service with multiple
  ports is transposed to multiple MeshService representations within
  the OSM control plane. A MeshService is the source of truth while
  creating filtering and routing configs in Envoy as opposed to a k8s
  service. In other words, filter chains, clusters per service have
  a 1-1 mapping to a MeshService instance.

- RDS mesh inbound and outbound route configurations are built per
  port, similar to egress route configurations. This ensures there
  can be no conflicts when a service/FQDN needs to be routed differently
  based on the port.

- Endpoints for a MeshService is filtered by its TargetPort is set. This
  is required to endpoints not associated with the MeshService are not
  programmed for upstream clusters. This is required to support multiple
  ports per service.

Part of openservicemesh#4052
Resolves openservicemesh#3777
Resolves openservicemesh#2527

Signed-off-by: Shashank Ram <shashr2204@gmail.com>
shashankram added a commit to shashankram/osm that referenced this issue Sep 3, 2021
This change refactors the core control plane components
to address several gaps in the existing implementation.

At a high level, it resolves the following:

1. Traffic to a service with multiple ports is filtered and
   routed correctly based on the port the traffic is directed
   to (openservicemesh#3777). This was previously a blocker for users with
   multiple ports/protocols per k8s service object.

2. Builds mesh traffic policies for LDS/CDS/RDS configs
   inline consistently. This simplifies the code and makes
   it a lot easier to test config generation based on the
   state of the mesh (service, service accounts, SMI policies
   etc.). It also eliminates outbound policy merges that
   are unnecessary. Building config for different envoy configs
   inline will help when OSM moves to an event-driven config
   generation per proxy/group model, where an event can produce
   configuration using a nearly consistent view of the caches.

3. Allows the TargetPort for a service to be different than it's
   Port value. This is common in k8s to allow app authors to
   change the backend port without needing to update their client
   code.

4. Similar to egress, mesh RDS configs are isolated per port to
   avoid conflicts when the same host needs to be routed differently
   per port. This can happen when the same service exposes multiple
   HTTP ports that need to be routed differently.

5. Correctly generates hostnames for a service-port combination that
   clients can use to access the service.

7. Addresses bugs in endpoint discovery when the service has multiple
   ports. Correctly translates a cluster name to a MeshService to be
   able to retrieve its endpoints.

8. Implements SMI TrafficSplit for permissive mode for both HTTP and
   TCP traffic.

9. Removes and simplifies code that are not required, such as the APIs
   to retrieve port:protocol mappings for a service, and other redundant
   APIs.

10. E2e tests and unit tests for various scenarios: multiple ports per
    service, permissive mode traffic split, different TargetPort for
    service Port etc.

To understand this change better, it's important to note the following
fundamental changes made to address the existing implementation issues
noted above.

- A MeshService now carries information on the port, target port and
  protocol associated with the service. The implementation ensures
  the target port is always set for a service with an endpoint. It is
  necessary to disambiguate between the two because k8s allows routing
  traffic to a service whose Port and TargetPort are different. This
  is of significant importance when it comes to filtering and routing
  traffic on the client and server side. A k8s service with multiple
  ports is transposed to multiple MeshService representations within
  the OSM control plane. A MeshService is the source of truth while
  creating filtering and routing configs in Envoy as opposed to a k8s
  service. In other words, filter chains, clusters per service have
  a 1-1 mapping to a MeshService instance.

- RDS mesh inbound and outbound route configurations are built per
  port, similar to egress route configurations. This ensures there
  can be no conflicts when a service/FQDN needs to be routed differently
  based on the port.

- Endpoints for a MeshService is filtered by its TargetPort is set. This
  is required to endpoints not associated with the MeshService are not
  programmed for upstream clusters. This is required to support multiple
  ports per service.

Part of openservicemesh#4052
Resolves openservicemesh#3777
Resolves openservicemesh#2527

Signed-off-by: Shashank Ram <shashr2204@gmail.com>
shashankram added a commit to shashankram/osm that referenced this issue Sep 7, 2021
This change refactors the core control plane components
to address several gaps in the existing implementation.

At a high level, it resolves the following:

1. Traffic to a service with multiple ports is filtered and
   routed correctly based on the port the traffic is directed
   to (openservicemesh#3777). This was previously a blocker for users with
   multiple ports/protocols per k8s service object.

2. Builds mesh traffic policies for LDS/CDS/RDS configs
   inline consistently. This simplifies the code and makes
   it a lot easier to test config generation based on the
   state of the mesh (service, service accounts, SMI policies
   etc.). It also eliminates outbound policy merges that
   are unnecessary. Building config for different envoy configs
   inline will help when OSM moves to an event-driven config
   generation per proxy/group model, where an event can produce
   configuration using a nearly consistent view of the caches.

3. Allows the TargetPort for a service to be different than it's
   Port value. This is common in k8s to allow app authors to
   change the backend port without needing to update their client
   code.

4. Similar to egress, mesh RDS configs are isolated per port to
   avoid conflicts when the same host needs to be routed differently
   per port. This can happen when the same service exposes multiple
   HTTP ports that need to be routed differently.

5. Correctly generates hostnames for a service-port combination that
   clients can use to access the service.

7. Addresses bugs in endpoint discovery when the service has multiple
   ports. Correctly translates a cluster name to a MeshService to be
   able to retrieve its endpoints.

8. Implements SMI TrafficSplit for permissive mode for both HTTP and
   TCP traffic.

9. Removes and simplifies code that are not required, such as the APIs
   to retrieve port:protocol mappings for a service, and other redundant
   APIs.

10. E2e tests and unit tests for various scenarios: multiple ports per
    service, permissive mode traffic split, different TargetPort for
    service Port etc.

To understand this change better, it's important to note the following
fundamental changes made to address the existing implementation issues
noted above.

- A MeshService now carries information on the port, target port and
  protocol associated with the service. The implementation ensures
  the target port is always set for a service with an endpoint. It is
  necessary to disambiguate between the two because k8s allows routing
  traffic to a service whose Port and TargetPort are different. This
  is of significant importance when it comes to filtering and routing
  traffic on the client and server side. A k8s service with multiple
  ports is transposed to multiple MeshService representations within
  the OSM control plane. A MeshService is the source of truth while
  creating filtering and routing configs in Envoy as opposed to a k8s
  service. In other words, filter chains, clusters per service have
  a 1-1 mapping to a MeshService instance.

- RDS mesh inbound and outbound route configurations are built per
  port, similar to egress route configurations. This ensures there
  can be no conflicts when a service/FQDN needs to be routed differently
  based on the port.

- Endpoints for a MeshService is filtered by its TargetPort is set. This
  is required to endpoints not associated with the MeshService are not
  programmed for upstream clusters. This is required to support multiple
  ports per service.

Part of openservicemesh#4052
Resolves openservicemesh#3777
Resolves openservicemesh#2527

Signed-off-by: Shashank Ram <shashr2204@gmail.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/control-plane Related to OSM's control plane kind/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant