Skip to content

Commit

Permalink
Fixes from testing
Browse files Browse the repository at this point in the history
  • Loading branch information
plorenz committed Nov 1, 2024
1 parent dec3d4a commit f19a6e0
Show file tree
Hide file tree
Showing 39 changed files with 611 additions and 261 deletions.
154 changes: 152 additions & 2 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,16 +2,78 @@

## What's New

* New Router Metrics
* Changes to identity connect status
* HA Bootstrap Changes
* Connect Events
* SDK Events
* Bug fixes and other HA work

## New Router Metrics

The following new metrics are available for edge routers:

1. edge.connect.failures - meter tracking failed connect attempts from sdks
This tracks failures to not having a valid token. Other failures which
happen earlier in the connection process may not be tracked here.
2. edge.connect.successes - meter tracking successful connect attempts from sdks
3. edge.disconnects - meter tracking disconnects of previously successfully connected sdks
4. edge.connections - gauge tracking count of currently connected sdks

## Identity Connect Status

Ziti tracks whether an identity is currently connected to an edge router.
This is the `hasEdgeRouterConnection` field on Identity.

Identity connection status used to be driven off of heartbeats from the edge router.
This feature doesn't work correctly when running with controller HA.

To address this, while also providing more operation insight, connect events were added
(see below for more details on the events themselves).

The controller can be configured to use status from heartbeats, connect events or both.
If both are used as source, then if either reports the identity as connected, then it
will show as connected. This is intended for when you have a mix of routers and they
don't all yet supported connect events.

The controller now also aims to be more precise about identity state. There is a new
field on Identity: `edgeRouterConnectionStatus`. This field can have one of three
values:

* offline
* online
* unknown

If the identity is reported as connected to any ER, it will be marked as `online`.
If the identity has been reported as connected, but the reporting ER is now
offline, the identity may still be connected to the ER. While in this state
it will be marked as 'unknown'. After a configurable interval, it will be marked
as offline.

New controller config options:

```
identityStatusConfig:
# valid values ['heartbeats', 'connect-events', 'hybrid']
# defaults to 'hybrid' for now
source: connect-events
# determines how often we scan for disconnected routers
# defaults to 1 minute
scanInterval: 1m
# determines how long an identity will stay in unknown status before it's marked as offline
# defaults to 5m
unknownTimeout: 5m
```

## HA Bootstrapping Changes

Previously bootstrapping the RAFT cluster and initializing the controller with a
default administrator were separate operations.
Now, the raft cluster will be bootstrapped whenever the controller is initialized.

The controller can be initilized as follows:
The controller can be initialized as follows:

1. Using `ziti agent controller init`
2. Using `ziti agent controller init-from-db`
Expand All @@ -34,7 +96,9 @@ These are events generated when a successful connection is made to a controller,
2. Router
3. Controller (peer in an HA cluster)

**Configuration**
They are also generated when an SDK connects to a router.

**Controller Configuration**

```yml
events:
Expand All @@ -47,6 +111,36 @@ events:
path: /tmp/ziti-events.log
```
**Router Configuration**
```yml
connectEvents:
# defaults to true.
# If set to false, minimal information about which identities are connected will still be
# sent to the controller, so the `edgeRouterConnectionStatus` field can be populated,
# but connect events will not be generated.
enabled: true

# The interval at which connect information will be batched up and sent to the controller.
# Shorter intervals will improve data resolution on the controller. Longer intervals could
# more efficient.
batchInterval: 3s

# The router will also periodically sent the full state to the controller, to ensure that
# it's in sync. It will do this automatically if the router gets disconnected from the
# controller, or if the router is unable to send a connect events messages to the controller.
# This controls how often the full state will be sent under ordinairy conditions
fullSyncInterval: 5m

# If enabled is set to true, the router will collect connect events and send them out
# at the configured batch interval. If there are a huge number of connecting identities
# or if the router is disconnected from the controller for a time, it may be unable to
# send events. In order to prevent queued events from exhausing memory, a maximum

Check failure on line 137 in CHANGELOG.md

View workflow job for this annotation

GitHub Actions / codespell

exhausing ==> exhausting
# queue size is configured.
# Default value 100,000
maxQueuedEvents: 100000

```
**Example Events**
```json
Expand Down Expand Up @@ -79,6 +173,62 @@ events:
}
```

## SDK Events

Building off of the connect events, there are events generated when an identity/sdk comes online or goes offline.

```yml
events:
jsonLogger:
subscriptions:
- type: sdk
handler:
type: file
format: json
path: /tmp/ziti-events.log
```
```json
{
"namespace": "sdk",
"event_type" : "sdk-online",
"identity_id": "ji2Rt8KJ4",
"timestamp": "2024-10-02T12:17:39.501821249-04:00"
}

{
"namespace": "sdk",
"event_type" : "sdk-status-unknown",
"identity_id": "ji2Rt8KJ4",
"timestamp": "2024-10-02T12:17:40.501821249-04:00"
}

{
"namespace": "sdk",
"event_type" : "sdk-offline",
"identity_id": "ji2Rt8KJ4",
"timestamp": "2024-10-02T12:17:41.501821249-04:00"
}
```

## Component Updates and Bug Fixes

* github.com/openziti/channel/v3: [v3.0.5 -> v3.0.6](https://github.com/openziti/channel/compare/v3.0.5...v3.0.6)
* github.com/openziti/edge-api: [v0.26.32 -> v0.26.34](https://github.com/openziti/edge-api/compare/v0.26.32...v0.26.34)
* github.com/openziti/identity: [v1.0.85 -> v1.0.86](https://github.com/openziti/identity/compare/v1.0.85...v1.0.86)

* github.com/openziti/sdk-golang: [v0.23.43 -> v0.23.44](https://github.com/openziti/sdk-golang/compare/v0.23.43...v0.23.44)
* github.com/openziti/transport/v2: [v2.0.146 -> v2.0.147](https://github.com/openziti/transport/compare/v2.0.146...v2.0.147)
* github.com/openziti/ziti: [v1.1.15 -> v1.2.0](https://github.com/openziti/ziti/compare/v1.1.15...v1.2.0)
* [Issue #2212](https://github.com/openziti/ziti/issues/2212) - Rework distributed control bootstrap mechanism
* [Issue #1835](https://github.com/openziti/ziti/issues/1835) - Add access log for rest and router connections
* [Issue #2478](https://github.com/openziti/ziti/issues/2478) - JWT signer secondary auth: not enough information to continue
* [Issue #2482](https://github.com/openziti/ziti/issues/2482) - router run command - improperly binds 127.0.0.1:53/udp when tunnel mode is not tproxy
* [Issue #2474](https://github.com/openziti/ziti/issues/2474) - Enable Ext JWT Enrollment/Generic Trust Bootstrapping
* [Issue #2471](https://github.com/openziti/ziti/issues/2471) - Service Access for Legacy SDKs in HA does not work
* [Issue #2468](https://github.com/openziti/ziti/issues/2468) - enrollment signing cert is not properly identified


# Release 1.1.15

## What's New
Expand Down
10 changes: 9 additions & 1 deletion common/inspect/connect_inspections.go
Original file line number Diff line number Diff line change
Expand Up @@ -16,11 +16,18 @@

package inspect

const (
RouterIdentityConnectionStatusesKey = "identity-connection-statuses"
)

type RouterIdentityConnections struct {
IdentityConnections map[string]*RouterIdentityConnectionDetail `json:"identity_connections"`
LastFullSync string `json:"last_full_sync"`
QueuedEventCount int64 `json:"queued_event_count"`
MaxQueuedEvents int64 `json:"max_queued_events"`
NeedFullSync []string `json:"need_full_sync"`
BatchInterval string `json:"batch_interval"`
FullSyncInterval string `json:"full_sync_interval"`
}

type RouterIdentityConnectionDetail struct {
Expand All @@ -39,7 +46,8 @@ type RouterConnectionDetail struct {
}

type CtrlIdentityConnections struct {
Connections map[string]*CtrlIdentityConnectionDetail `json:"connections"`
Connections map[string]*CtrlIdentityConnectionDetail `json:"connections"`
ScanInterval string `json:"scanInterval"`
}

type CtrlIdentityConnectionDetail struct {
Expand Down
6 changes: 3 additions & 3 deletions common/oidc_tokens.go
Original file line number Diff line number Diff line change
Expand Up @@ -110,7 +110,7 @@ func (r *RefreshClaims) GetIssuer() (string, error) {
}

func (r *RefreshClaims) GetSubject() (string, error) {
return r.TokenClaims.Issuer, nil
return r.TokenClaims.Subject, nil
}

func (r *RefreshClaims) GetAudience() (jwt.ClaimStrings, error) {
Expand Down Expand Up @@ -211,7 +211,7 @@ func (r *AccessClaims) GetIssuer() (string, error) {
}

func (r *AccessClaims) GetSubject() (string, error) {
return r.TokenClaims.Issuer, nil
return r.TokenClaims.Subject, nil
}

func (r *AccessClaims) GetAudience() (jwt.ClaimStrings, error) {
Expand Down Expand Up @@ -260,7 +260,7 @@ func (r *IdTokenClaims) GetIssuer() (string, error) {
}

func (r *IdTokenClaims) GetSubject() (string, error) {
return r.TokenClaims.Issuer, nil
return r.TokenClaims.Subject, nil
}

func (r *IdTokenClaims) GetAudience() (jwt.ClaimStrings, error) {
Expand Down
9 changes: 9 additions & 0 deletions controller/agent.go
Original file line number Diff line number Diff line change
Expand Up @@ -113,6 +113,15 @@ func (self *Controller) agentOpRaftAddPeer(m *channel.Message, ch channel.Channe
return
}

if !self.raftController.IsBootstrapped() {
handler_common.SendOpResult(m, ch, "raft.list",
"Local instance not initialized. If all instances are uninitialized, initialize one first using"+
" 'ziti agent controller init' then add nodes to that instance. If you have existing initialized nodes,"+
" use the 'ziti agent cluster add' command on an initialized node, rather than here.",
false)
return
}

addr, found := m.GetStringHeader(AgentAddrHeader)
if !found {
handler_common.SendOpResult(m, ch, "raft.join", "address not supplied", false)
Expand Down
2 changes: 1 addition & 1 deletion controller/config/config_edge.go
Original file line number Diff line number Diff line change
Expand Up @@ -512,7 +512,7 @@ func (c *EdgeConfig) loadIdentityStatusConfig(cfgmap map[interface{}]interface{}
case "hybrid":
c.IdentityStatusConfig.Source = IdentityStatusSourceHybrid
default:
pfxlog.Logger().Error("invalid value '%v' for identity status config source, valid values: ['heartbeats', 'connect-events', 'hybrid']", strVal)
pfxlog.Logger().Errorf("invalid value '%v' for identity status config source, valid values: ['heartbeats', 'connect-events', 'hybrid']", strVal)
}
}

Expand Down
2 changes: 1 addition & 1 deletion controller/env/appenv.go
Original file line number Diff line number Diff line change
Expand Up @@ -213,7 +213,7 @@ func (ae *AppEnv) ValidateServiceAccessToken(token string, apiSessionId *string)
}

if serviceAccessClaims.ApiSessionId != *apiSessionId {
return nil, fmt.Errorf("invalid api sessoin id, expected %s, got %s", *apiSessionId, serviceAccessClaims.ApiSessionId)
return nil, fmt.Errorf("invalid api session id, expected %s, got %s", *apiSessionId, serviceAccessClaims.ApiSessionId)
}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -135,7 +135,7 @@ func (handler *validateIdentityConnectionStatusesHandler) validRouterEdgeConnect
var identityConnections *inspect.RouterIdentityConnections

if router.Control != nil && !router.Control.IsClosed() {
request := &ctrl_pb.InspectRequest{RequestedValues: []string{"connections"}}
request := &ctrl_pb.InspectRequest{RequestedValues: []string{inspect.RouterIdentityConnectionStatusesKey}}
resp := &ctrl_pb.InspectResponse{}
respMsg, err := protobufs.MarshalTyped(request).WithTimeout(time.Minute).SendForReply(router.Control)
if err = protobufs.TypedResponse(resp).Unmarshall(respMsg, err); err != nil {
Expand All @@ -144,7 +144,7 @@ func (handler *validateIdentityConnectionStatusesHandler) validRouterEdgeConnect
}

for _, val := range resp.Values {
if val.Name == "connections" {
if val.Name == inspect.RouterIdentityConnectionStatusesKey {
if err = json.Unmarshal([]byte(val.Value), &identityConnections); err != nil {
handler.reportError(router, err, cb)
return
Expand Down
2 changes: 1 addition & 1 deletion controller/handler_peer_ctrl/add_peer.go
Original file line number Diff line number Diff line change
Expand Up @@ -20,9 +20,9 @@ import (
raft2 "github.com/hashicorp/raft"
"github.com/michaelquigley/pfxlog"
"github.com/openziti/channel/v3"
"github.com/openziti/ziti/common/pb/cmd_pb"
"github.com/openziti/ziti/controller/peermsg"
"github.com/openziti/ziti/controller/raft"
"github.com/openziti/ziti/common/pb/cmd_pb"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
"google.golang.org/protobuf/proto"
Expand Down
4 changes: 2 additions & 2 deletions controller/handler_peer_ctrl/bind.go
Original file line number Diff line number Diff line change
Expand Up @@ -20,10 +20,10 @@ import (
"github.com/michaelquigley/pfxlog"
"github.com/openziti/channel/v3"
"github.com/openziti/channel/v3/latency"
"github.com/openziti/ziti/controller/network"
"github.com/openziti/ziti/controller/raft"
"github.com/openziti/foundation/v2/concurrenz"
"github.com/openziti/metrics"
"github.com/openziti/ziti/controller/network"
"github.com/openziti/ziti/controller/raft"
"github.com/sirupsen/logrus"
"time"
)
Expand Down
2 changes: 1 addition & 1 deletion controller/handler_peer_ctrl/command.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,10 +19,10 @@ package handler_peer_ctrl
import (
"github.com/michaelquigley/pfxlog"
"github.com/openziti/channel/v3"
"github.com/openziti/foundation/v2/goroutines"
"github.com/openziti/ziti/common/metrics"
"github.com/openziti/ziti/common/pb/cmd_pb"
"github.com/openziti/ziti/controller/raft"
"github.com/openziti/foundation/v2/goroutines"
"github.com/sirupsen/logrus"
"time"
)
Expand Down
2 changes: 1 addition & 1 deletion controller/handler_peer_ctrl/remove_peer.go
Original file line number Diff line number Diff line change
Expand Up @@ -20,9 +20,9 @@ import (
raft2 "github.com/hashicorp/raft"
"github.com/michaelquigley/pfxlog"
"github.com/openziti/channel/v3"
"github.com/openziti/ziti/common/pb/cmd_pb"
"github.com/openziti/ziti/controller/peermsg"
"github.com/openziti/ziti/controller/raft"
"github.com/openziti/ziti/common/pb/cmd_pb"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
"google.golang.org/protobuf/proto"
Expand Down
Loading

0 comments on commit f19a6e0

Please sign in to comment.