feat(backend): add opentelemetry backend in orb-agent, policies. #2780

Merged
122 commits
31b3f1f
chore: wip work on otel backend.
Aug 3, 2023
0d0808f
chore: use otel v.0.81.0 to not break the api.
Aug 7, 2023
54c6de1
feat(agent): wip otel backend.
Aug 7, 2023
08c1390
feat(agent): wip otel backend.
Aug 7, 2023
02f596e
feat(agent): add makefile and started work to import and run executab…
Aug 8, 2023
c9bfb76
feat(agent): add gitignore to ignore the binary files in github.
Aug 8, 2023
c6c156e
feat(agent): implement version method.
Aug 8, 2023
763209a
feat(agent): implement add and start testing to control policies.
Aug 8, 2023
75ee97c
feat(agent): implement add and start testing to control policies.
Aug 10, 2023
3997dea
feat(agent): fix behaviors and implement missing features.
Aug 14, 2023
ec49822
feat(agent): add otel registered backend.
Aug 14, 2023
aaad3b4
feat(agent): fix configuration pass-through.
Aug 14, 2023
489d9b9
feat(agent): fix configuration pass-through.
Aug 14, 2023
218206f
feat(agent): add sample policy.
Aug 14, 2023
c0c90f0
feat(agent): fix apply policy behaviour.
Aug 14, 2023
96ed0c7
feat(agent): add default configurations to inside pktvisor code, remo…
Aug 15, 2023
aa7482d
feat(agent): wip try to fix startup.
Aug 15, 2023
2567e65
feat(agent): wip try to fix startup.
Aug 16, 2023
db02c74
feat(agent): wip try to fix startup.
Aug 16, 2023
2891677
feat(agent): fix startup and multiple routines working the same execu…
Aug 17, 2023
536a943
feat(agent): implement configuration passing to temporary file.
Aug 17, 2023
b34aa0a
feat(agent): swapping from go-memexec to previously used go-cmd.
Aug 18, 2023
803d06d
feat(agent): add build and fix make agent_full goal to contemplate ot…
Aug 22, 2023
09b7691
feat(agent): add build and fix make agent_full goal to contemplate ot…
Aug 22, 2023
fbfd144
feat(agent): fix build with Dockerfile.
Aug 23, 2023
8553f24
feat(agent): swapping from go-memexec to previously used go-cmd.
Aug 18, 2023
5b848ce
feat(agent): removing debug and build helping flags.
Aug 23, 2023
35b4fd0
feat(agent): wip
Aug 25, 2023
ca0e49f
Merge branch 'develop' into poc/otel-backend
Oct 12, 2023
030b88d
feat(mod): update libs.
Oct 12, 2023
517fe15
feat(agent): update agent otel dependencies changes.
Oct 12, 2023
4fe89a8
feat(maestro): update agent otel dependencies changes.
Oct 12, 2023
54a48aa
feat(agent): log identation.
Oct 12, 2023
75343bc
feat(agent): fix version.
Oct 12, 2023
886de73
remove file
Aug 25, 2023
399452b
feat(agent): fix build with Dockerfile.
Aug 28, 2023
ee83138
feat(maestro): standardize log references.
Aug 29, 2023
eded89d
feat(agent): fix typing warnings.
Oct 16, 2023
086d465
feat(agent): fix options on go-cmd.
Oct 16, 2023
5013ffb
feat(policies): explain validate.
Oct 18, 2023
6fd657e
feat(maestro): update collector version.
Oct 18, 2023
3f702b3
feat(agent): add validate policy in agent.
Oct 18, 2023
5a9f704
feat(agent): update sample.
Oct 19, 2023
46c75aa
feat(sinker): update otel.
Oct 19, 2023
57600a7
feat(sinker): update otel.
Oct 19, 2023
283afdd
fix(maestro): conflict fix.
Oct 20, 2023
04d5645
fix(policies): add otel backend register.
Oct 20, 2023
57c9c5d
feat(policies): pass yaml instead of json for otel yaml policies.
Oct 20, 2023
6c499af
feat(agent): otel validate policy will bypass for now, until it is in…
Oct 20, 2023
0268917
feat(policies): add todo explaining unwanted check.
Oct 20, 2023
140e149
feat(agent): fix agent to receive policy.
Oct 20, 2023
ab8275c
feat(agent): fix agent to receive policy.
Oct 20, 2023
b6ac37f
fix(policies): add return data for yaml in retrieve policies per group.
Oct 20, 2023
5e1d255
fix(policies): fix yaml marshaling.
Oct 20, 2023
5abe9b0
fix(agent): fix receiveOTLP on otel backend.
Oct 20, 2023
b8cb1c8
fix(agent): fix receiveOTLP on otel backend.
Oct 20, 2023
b288373
fix(agent): fix receiveOTLP on otel backend.
Oct 20, 2023
34e2304
fix(agent): fix cmd options for streaming.
Oct 20, 2023
5cdebd8
fix(fleet): add backend to be able to create policy on UI.
Oct 20, 2023
9b3dfe7
fix(agent): fix cmd options for streaming.
Oct 24, 2023
bb96507
fix(agent): fixed receiver host and port for now.
Oct 24, 2023
4aa0d36
fix(fleet): rename.
Oct 25, 2023
1cd60d2
fix(agent): make the agent wait until the receiver start to apply pol…
Oct 25, 2023
ab153e2
fix(agent): create a backend ready for comms creation in parallel.
Oct 25, 2023
49659bf
fix(agent): create a backend ready for comms creation in parallel.
Oct 25, 2023
fe6946b
fix(agent): remove custom endpoints.
Oct 25, 2023
ed08170
fix(agent): fix connection between otel-backend and orb-agent.
Oct 31, 2023
e534c71
fix(agent): fix connection between otel-backend and orb-agent.
Oct 31, 2023
ca2e1e0
fix(agent): fix mqtt connection.
Oct 31, 2023
cfb4bf3
fix(agent): change 0.0.0.0 to localhost which limits the interface.
Oct 31, 2023
e2b3422
feat(agent): change policyName to instead of being to scope.Name(), t…
Oct 31, 2023
eb1c5b5
feat(agent): WIP
Oct 31, 2023
a72bbd1
feat(agent): add struct parse to otel configuration.
Nov 1, 2023
9d9829f
feat(agent): add test.
Nov 1, 2023
2b5bb7b
feat(agent): add piece to call the builder to merge with default struct.
Nov 1, 2023
84c30be
feat(agent): add correct metrics and scope attributes to pass policy-…
Nov 2, 2023
960b4e1
feat(agent): fix unit tests.
Nov 2, 2023
11a4c67
feat(agent): add missing libs.
Nov 2, 2023
e1437fe
fix(agent): fix agent id on reset.
Nov 7, 2023
4dce958
fix(agent): fix agent id on reset.
Nov 8, 2023
d3a1dd5
fix(agent): standardize the env var naming.
Nov 8, 2023
d2a303f
fix(agent): clean up.
Nov 8, 2023
011ce30
Update policies/backend/otel/otel.go
lpegoraro Nov 8, 2023
7809bf6
fix(fleet): clean up.
Nov 13, 2023
e2ee4a7
:Merge branch 'eng-1056-orb-policies-opentelemetry-configuration' of …
Nov 13, 2023
cfe5291
fix(agent): clean up and fix order of startup.
Nov 14, 2023
25b1232
fix(agent): clean up.
Nov 14, 2023
a990fe3
fix(agent): clean up.
Nov 14, 2023
561f207
fix(agent): rollback fullreset as it was before fixing reset.
Nov 14, 2023
ea2b9ff
fix(agent): attempt to fix communication between reset from orb agent…
Nov 14, 2023
e0a88c6
fix(agent): add more scrape info on opentelemetry to fix reset action…
Nov 14, 2023
0aa7afb
fix(agent): add run configuration and fix version change on otlprecei…
Nov 15, 2023
3fc0fdf
fix(agent): inverted boolean return to improve readability.
Nov 15, 2023
abb162c
fix(agent): inverted boolean return to improve readability.
Nov 15, 2023
113db6d
fix(agent): fixed nil function reference error.
Nov 15, 2023
6ed419c
fix(agent): fixed nil function reference error.
Nov 15, 2023
042548e
[pktvisor] add missing ReportComponentStatus
mfiedorowicz Nov 15, 2023
25dc98d
Merge remote-tracking branch 'origin/eng-1056-orb-policies-openteleme…
mfiedorowicz Nov 15, 2023
489906a
Update agent/backend/otel/otel.go
lpegoraro Nov 15, 2023
ebebf95
fix(agent): fix indentation.
Nov 15, 2023
b0a8f14
fix(agent): moved the heartbeat to correct order on reset.
Nov 15, 2023
894eff2
fix(agent): fix otlpmqtt exporter traces, metrics, logs.
Nov 16, 2023
c348e7c
[pktvisor] re-add pktvisor backend defaults
mfiedorowicz Nov 16, 2023
9ed019a
fix(agent): remove logs and traces and focus on metrics for now.
Nov 16, 2023
aee6a5e
fix(agent): replace defaults in pktvisor to instead of setting the de…
Nov 16, 2023
1d3d96b
fix(agent): add mechanism to enable multiple policies, because teleme…
Nov 16, 2023
6d1679f
fix(agent): add mechanism to enable multiple policies, because teleme…
Nov 16, 2023
6896f39
fix(agent): add mechanism to enable multiple policies, because teleme…
Nov 16, 2023
7f021c2
fix(agent): fix unit test.
Nov 16, 2023
323c5a6
Update agent/backend/otel/policy.go
lpegoraro Nov 16, 2023
1fa1718
fix(agent): fix url build.
Nov 16, 2023
9a08d79
fix(agent): readd debug.
Nov 16, 2023
1b3de08
Apply suggestions from code review
lpegoraro Nov 16, 2023
6b2556e
fix(agent): err scope.
Nov 16, 2023
e02b92a
Apply suggestions from code review
lpegoraro Nov 16, 2023
3056bbd
fix(agent): apply code review changes.
Nov 16, 2023
5b17f69
fix(agent): fix err scope.
Nov 16, 2023
35f4efd
fix(agent): fix import err.
Nov 16, 2023
ca04d9d
fix download of otelcol-contrib
mfiedorowicz Nov 17, 2023
9584351
simplify/minimise Dockerfile.full
mfiedorowicz Nov 17, 2023
b45e3b9
chore(mod): update and merge dependencies.
Nov 17, 2023
02078db
Merge branch 'develop' into eng-1056-orb-policies-opentelemetry-confi…
Nov 17, 2023
2 changes: 2 additions & 0 deletions .gitignore
@@ -36,3 +36,5 @@ docker/otel-collector-config.yaml

kind/*
!kind/README.md

otelcol-contrib


17 changes: 16 additions & 1 deletion Makefile
@@ -23,8 +23,12 @@ DOCKERS = $(addprefix docker_,$(SERVICES))
DOCKERS_DEV = $(addprefix docker_dev_,$(SERVICES))
CGO_ENABLED ?= 0
GOARCH ?= $(shell dpkg-architecture -q DEB_BUILD_ARCH)
GOOS ?= $(shell dpkg-architecture -q DEB_TARGET_ARCH_OS)
DIODE_TAG ?= develop
ORB_VERSION = $(shell cat VERSION)
COMMIT_HASH = $(shell git rev-parse --short HEAD)
OTEL_COLLECTOR_CONTRIB_VERSION ?= 0.87.0
OTEL_CONTRIB_URL ?= "https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v$(OTEL_COLLECTOR_CONTRIB_VERSION)/otelcol-contrib_$(OTEL_COLLECTOR_CONTRIB_VERSION)_$(GOOS)_$(GOARCH).tar.gz"

define compile_service
echo "ORB_VERSION: $(ORB_VERSION)"
@@ -228,6 +232,7 @@ agent_bin:

agent:
docker build --no-cache \
--build-arg GOARCH=$(GOARCH) \
--build-arg PKTVISOR_TAG=$(PKTVISOR_TAG) \
--tag=$(ORB_DOCKERHUB_REPO)/$(DOCKER_IMAGE_NAME_PREFIX)-agent:$(REF_TAG) \
--tag=$(ORB_DOCKERHUB_REPO)/$(DOCKER_IMAGE_NAME_PREFIX)-agent:$(ORB_VERSION) \
@@ -236,9 +241,11 @@ agent:

agent_full:
docker build --no-cache \
--build-arg GOARCH=$(GOARCH) \
--build-arg PKTVISOR_TAG=$(PKTVISOR_TAG) \
--build-arg DIODE_TAG=$(DIODE_TAG) \
--build-arg ORB_TAG=${ORB_TAG} \
--build-arg ORB_TAG=${REF_TAG} \
--build-arg OTEL_TAG=${OTEL_COLLECTOR_CONTRIB_VERSION} \
--tag=$(ORB_DOCKERHUB_REPO)/$(DOCKER_IMAGE_NAME_PREFIX)-agent-full:$(REF_TAG) \
--tag=$(ORB_DOCKERHUB_REPO)/$(DOCKER_IMAGE_NAME_PREFIX)-agent-full:$(ORB_VERSION) \
--tag=$(ORB_DOCKERHUB_REPO)/$(DOCKER_IMAGE_NAME_PREFIX)-agent-full:$(ORB_VERSION)-$(COMMIT_HASH) \
@@ -284,3 +291,11 @@ ui:
-f docker/Dockerfile .

platform: dockers_dev agent ui

pull-latest-otel-collector-contrib:
wget -O ./agent/backend/otel/otelcol_contrib.tar.gz $(OTEL_CONTRIB_URL)
tar -xvf ./agent/backend/otel/otelcol_contrib.tar.gz -C ./agent/backend/otel/
cp ./agent/backend/otel/otelcol-contrib .
rm ./agent/backend/otel/otelcol_contrib.tar.gz
rm ./agent/backend/otel/LICENSE
rm ./agent/backend/otel/README.md
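The `pull-latest-otel-collector-contrib` target downloads the release archive named by `OTEL_CONTRIB_URL`. As a rough illustration of how that URL is assembled, the Go sketch below builds the same string from a version, OS and architecture. `contribURL` is a hypothetical helper, not part of the repository, and using `runtime.GOOS`/`runtime.GOARCH` in place of `dpkg-architecture` is an assumption: the names agree for common targets such as linux/amd64 and linux/arm64, but not for every Debian architecture name (for example `armhf`).

```go
package main

import (
	"fmt"
	"runtime"
)

// contribURL (hypothetical helper) mirrors the Makefile's OTEL_CONTRIB_URL:
// .../releases/download/v<ver>/otelcol-contrib_<ver>_<os>_<arch>.tar.gz
func contribURL(version, goos, goarch string) string {
	return fmt.Sprintf(
		"https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v%s/otelcol-contrib_%s_%s_%s.tar.gz",
		version, version, goos, goarch)
}

func main() {
	// Assumption: the Go toolchain's OS/arch names match dpkg-architecture's
	// output on the build host.
	fmt.Println(contribURL("0.87.0", runtime.GOOS, runtime.GOARCH))
}
```

Pinning the version (0.87.0 by default) keeps the downloaded binary in step with the collector libraries the agent is compiled against, since `OTEL_COLLECTOR_CONTRIB_VERSION` is also passed to the `agent_full` image as `OTEL_TAG`.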
53 changes: 33 additions & 20 deletions agent/agent_prof.go → agent/agent.go
@@ -39,12 +39,13 @@ type orbAgent struct {
logger *zap.Logger
config config.Config
client mqtt.Client
agent_id string
db *sqlx.DB
backends map[string]backend.Backend
backendState map[string]*backend.State
cancelFunction context.CancelFunc
rpcFromCancelFunc context.CancelFunc
// TODO: look for a better way to do this, context shouldn't be inside structs

asyncContext context.Context

hbTicker *time.Ticker
@@ -91,6 +92,11 @@ func New(logger *zap.Logger, c config.Config) (Agent, error) {

pm, err := manager.New(logger, c, db)
if err != nil {
logger.Error("error during create policy manager, exiting", zap.Error(err))
return nil, err
}
if pm.GetRepo() == nil {
logger.Error("policy manager failed to get repository", zap.Error(err))
return nil, err
}
return &orbAgent{logger: logger, config: c, policyManager: pm, db: db, groupsInfos: make(map[string]GroupInfo)}, nil
@@ -112,6 +118,7 @@ func (a *orbAgent) startBackends(agentCtx context.Context) error {
configuration := structs.Map(a.config.OrbAgent.Otel)
configuration["agent_tags"] = a.config.OrbAgent.Tags
if err := be.Configure(a.logger, a.policyManager.GetRepo(), configurationEntry, configuration); err != nil {
a.logger.Info("failed to configure backend", zap.String("backend", name), zap.Error(err))
return err
}
backendCtx := context.WithValue(agentCtx, "routine", name)
@@ -120,14 +127,20 @@ func (a *orbAgent) startBackends(agentCtx context.Context) error {
} else {
backendCtx = context.WithValue(backendCtx, "agent_id", "auto-provisioning-without-id")
}
if err := be.Start(context.WithCancel(backendCtx)); err != nil {
return err
}
a.backends[name] = be
a.backendState[name] = &backend.State{
Status: backend.Unknown,
Status: be.GetInitialState(),
LastRestartTS: time.Now(),
}
if err := be.Start(context.WithCancel(backendCtx)); err != nil {
a.logger.Info("failed to start backend", zap.String("backend", name), zap.Error(err))
a.backendState[name] = &backend.State{
Status: be.GetInitialState(),
LastError: err.Error(),
LastRestartTS: time.Now(),
}
return err
}
}
return nil
}
@@ -151,10 +164,6 @@ func (a *orbAgent) Start(ctx context.Context, cancelFunc context.CancelFunc) err
mqtt.DEBUG = &agentLoggerDebug{a: a}
}

if err := a.startBackends(ctx); err != nil {
return err
}

ccm, err := cloud_config.New(a.logger, a.config, a.db)
if err != nil {
return err
@@ -170,6 +179,10 @@ func (a *orbAgent) Start(ctx context.Context, cancelFunc context.CancelFunc) err
return err
}

if err := a.startBackends(ctx); err != nil {
return err
}

a.logonWithHearbeat()

return nil
@@ -184,7 +197,9 @@ func (a *orbAgent) logonWithHearbeat() {

func (a *orbAgent) logoffWithHeartbeat(ctx context.Context) {
a.logger.Debug("stopping heartbeat, going offline status", zap.Any("routine", ctx.Value("routine")))
a.hbTicker.Stop()
if a.hbTicker != nil {
a.hbTicker.Stop()
}
if a.rpcFromCancelFunc != nil {
a.rpcFromCancelFunc()
}
@@ -253,9 +268,9 @@ func (a *orbAgent) RestartBackend(ctx context.Context, name string, reason strin
a.backendState[name].LastError = fmt.Sprintf("failed to reset backend: %v", err)
a.logger.Error("failed to reset backend", zap.String("backend", name), zap.Error(err))
}
be.SetCommsClient(a.config.OrbAgent.Cloud.MQTT.Id, &a.client, fmt.Sprintf("%s/?/%s", a.baseTopic, name))
err := a.sendAgentPoliciesReq()
if err != nil {
be.SetCommsClient(a.agent_id, &a.client, fmt.Sprintf("%s/?/%s", a.baseTopic, name))

if err := a.sendAgentPoliciesReq(); err != nil {
a.logger.Error("failed to send agent policies request", zap.Error(err))
}
return nil
@@ -288,20 +303,18 @@ func (a *orbAgent) RestartAll(ctx context.Context, reason string) error {
} else {
ctx = context.WithValue(ctx, "agent_id", "auto-provisioning-without-id")
}
a.logger.Info("restarting all backends", zap.String("reason", reason))
a.logoffWithHeartbeat(ctx)
a.logger.Info("restarting comms", zap.String("reason", reason))
if err := a.restartComms(ctx); err != nil {
a.logger.Error("failed to restart comms", zap.Error(err))
}
for name := range a.backends {
a.logger.Info("restarting backend", zap.String("backend", name), zap.String("reason", reason))
err := a.RestartBackend(ctx, name, reason)
if err != nil {
a.logger.Error("failed to restart backend", zap.Error(err))
}
}
a.logger.Info("restarting comms", zap.String("reason", reason))
a.logoffWithHeartbeat(ctx)
err := a.restartComms(ctx)
if err != nil {
a.logger.Error("failed to restart comms", zap.Error(err))
}
a.logonWithHearbeat()
a.logger.Info("all backends and comms were restarted")

4 changes: 4 additions & 0 deletions agent/backend/backend.go
@@ -19,6 +19,7 @@ const (
BackendError
AgentError
Offline
Waiting
)

type RunningStatus int
@@ -29,6 +30,7 @@ var runningStatusMap = [...]string{
"backend_error",
"agent_error",
"offline",
"waiting",
}

var runningStatusRevMap = map[string]RunningStatus{
@@ -37,6 +39,7 @@ var runningStatusRevMap = map[string]RunningStatus{
"backend_error": BackendError,
"agent_error": AgentError,
"offline": Offline,
"waiting": Waiting,
}

type State struct {
@@ -62,6 +65,7 @@ type Backend interface {
GetStartTime() time.Time
GetCapabilities() (map[string]interface{}, error)
GetRunningStatus() (RunningStatus, string, error)
GetInitialState() RunningStatus

ApplyPolicy(data policies.PolicyData, updatePolicy bool) error
RemovePolicy(data policies.PolicyData) error
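Adding `Waiting` touches three parallel tables in `backend.go`: the `const` block, the index-ordered `runningStatusMap` array, and the reverse `runningStatusRevMap`. The sketch below shows one way to check that such tables stay in sync, by round-tripping every status. It is a reduced, hypothetical copy: only `Unknown` (used in `diode.go`) and the values visible in the hunk are included, so the ordinals differ from the real enum, the `"unknown"` string is assumed, and `String`/`parseStatus` are illustrative helpers rather than the package's API.

```go
package main

import "fmt"

// Reduced copy of the enum; the real const block has entries not shown
// in the diff, so these ordinal values are NOT the real ones.
type RunningStatus int

const (
	Unknown RunningStatus = iota
	BackendError
	AgentError
	Offline
	Waiting
)

// Indexed by RunningStatus, so names must appear in const order.
var runningStatusMap = [...]string{
	"unknown", // assumed spelling
	"backend_error",
	"agent_error",
	"offline",
	"waiting",
}

var runningStatusRevMap = map[string]RunningStatus{
	"unknown":       Unknown,
	"backend_error": BackendError,
	"agent_error":   AgentError,
	"offline":       Offline,
	"waiting":       Waiting,
}

// String and parseStatus are hypothetical helpers over the two tables.
func (s RunningStatus) String() string { return runningStatusMap[s] }

func parseStatus(name string) (RunningStatus, bool) {
	s, ok := runningStatusRevMap[name]
	return s, ok
}

func main() {
	// Every constant must survive a trip through both tables.
	for s := Unknown; s <= Waiting; s++ {
		back, ok := parseStatus(s.String())
		fmt.Println(s, ok && back == s)
	}
}
```

Because the forward table is an array indexed by the constant, inserting a status anywhere but the end of the `const` block would silently shift every later name; appending `Waiting` last, as the diff does, avoids that.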
4 changes: 4 additions & 0 deletions agent/backend/diode/diode.go
@@ -97,6 +97,10 @@ func (d *diodeBackend) GetStartTime() time.Time {
return d.startTime
}

func (d *diodeBackend) GetInitialState() backend.RunningStatus {
return backend.Unknown
}

func (d *diodeBackend) GetCapabilities() (map[string]interface{}, error) {
return make(map[string]interface{}), nil
}
3 changes: 3 additions & 0 deletions agent/backend/diode/scrape.go
@@ -74,6 +74,9 @@ func (d *diodeBackend) receiveOtlp() {
Logger: d.logger,
TracerProvider: trace.NewNoopTracerProvider(),
MeterProvider: metric.NewMeterProvider(),
ReportComponentStatus: func(*component.StatusEvent) error {
return nil
},
},
BuildInfo: component.NewDefaultBuildInfo(),
}
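The `diode/scrape.go` change fills in `ReportComponentStatus` with a callback that does nothing, and commit 042548e adds the same missing field for pktvisor. Presumably this guards against the receiver invoking a nil function value, which panics in Go. The sketch below demonstrates that failure mode with hypothetical `settings` and `statusEvent` types standing in for the collector's `component.TelemetrySettings` and `component.StatusEvent`, so it does not depend on the collector module.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for component.StatusEvent and
// component.TelemetrySettings.
type statusEvent struct{ err error }

type settings struct {
	ReportStatus func(*statusEvent) error
}

// runComponent invokes the callback the way a component might on failure.
// Calling a nil func value panics; the panic is recovered and returned as
// an error so the difference is observable.
func runComponent(s settings) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic: %v", r)
		}
	}()
	return s.ReportStatus(&statusEvent{err: errors.New("listener failed")})
}

func main() {
	fmt.Println(runComponent(settings{})) // field left nil: recovered panic
	noop := settings{ReportStatus: func(*statusEvent) error { return nil }}
	fmt.Println(runComponent(noop)) // no-op callback: nil error
}
```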
15 changes: 15 additions & 0 deletions agent/backend/otel/comms.go
@@ -0,0 +1,15 @@
package otel

import (
"fmt"
mqtt "github.com/eclipse/paho.mqtt.golang"
"strings"
)

func (o *openTelemetryBackend) SetCommsClient(agentID string, client *mqtt.Client, baseTopic string) {
o.mqttClient = client
otelBaseTopic := strings.Replace(baseTopic, "?", "otlp", 1)
o.otlpMetricsTopic = fmt.Sprintf("%s/m/%c", otelBaseTopic, agentID[0])
o.otlpTracesTopic = fmt.Sprintf("%s/t/%c", otelBaseTopic, agentID[0])
o.otlpLogsTopic = fmt.Sprintf("%s/l/%c", otelBaseTopic, agentID[0])
}
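`SetCommsClient` derives one MQTT topic per OTLP signal: the `?` placeholder in the base topic becomes `otlp`, and `m`, `t` or `l` plus the first character of the agent ID is appended. The sketch below reproduces that derivation as a pure function, `otlpTopics` (a hypothetical name), so it can be tested without an MQTT client. One deliberate difference: it returns an error for an empty agent ID, where the code in the diff would panic on `agentID[0]`. The base topic and agent ID in `main` are made-up values.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// otlpTopics mirrors SetCommsClient: replace the first "?" with "otlp",
// then append "/m/", "/t/" or "/l/" and the first byte of the agent ID.
func otlpTopics(agentID, baseTopic string) (metrics, traces, logs string, err error) {
	if agentID == "" {
		// Guard added for the sketch; the original indexes agentID[0] directly.
		return "", "", "", errors.New("empty agent ID")
	}
	base := strings.Replace(baseTopic, "?", "otlp", 1)
	metrics = fmt.Sprintf("%s/m/%c", base, agentID[0])
	traces = fmt.Sprintf("%s/t/%c", base, agentID[0])
	logs = fmt.Sprintf("%s/l/%c", base, agentID[0])
	return metrics, traces, logs, nil
}

func main() {
	// RestartBackend passes "<baseTopic>/?/<backend name>"; these values are made up.
	m, t, l, err := otlpTopics("a1b2c3", "channels/123/be/?/otel")
	fmt.Println(m, t, l, err)
}
```

`%c` applied to a byte prints the corresponding character, which is correct for ASCII agent IDs such as UUIDs.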