Health private preview agent #253

Open
wants to merge 64 commits into
base: ci_feature
64 commits
6e8832e
Pushing to remote for generating image
r-dilip Apr 11, 2019
4b2e5aa
Merging from ci_feature
r-dilip Apr 12, 2019
7c8f4b3
Building New Version of the agent
r-dilip Apr 12, 2019
d4ae283
Publish docker provider as a release, Update image tag
r-dilip Apr 15, 2019
c092910
Onboard Script
r-dilip Apr 18, 2019
e251cc9
Update Agent Version, do actual ARM Deployment
r-dilip Apr 18, 2019
733be41
Use Import-AzAks instead of az cli
r-dilip Apr 19, 2019
ba99de4
Pushing doc to OMS-docker repo
r-dilip Apr 23, 2019
a266899
Adding description
r-dilip Apr 23, 2019
5683b4a
Update HealthAgentOnboarding.ps1
r-dilip Apr 23, 2019
651d271
Update HealthOnboarding.md
r-dilip Apr 23, 2019
ef64af0
Update HealthOnboarding.md
r-dilip Apr 23, 2019
b15fe92
Update HealthOnboarding.md
r-dilip Apr 23, 2019
5332ff0
Update HealthOnboarding.md
r-dilip Apr 23, 2019
584d922
Update HealthOnboarding.md
r-dilip Apr 23, 2019
4186c9c
Integrating telegraf changes
r-dilip Apr 26, 2019
09ca730
Integrate telegraf changes
r-dilip Apr 26, 2019
3f00b86
Fix a bug in the script
r-dilip Apr 26, 2019
444a23b
Merge from ci_feature
r-dilip Jun 18, 2019
5a5882e
Merge remote-tracking branch 'origin/dilipr/onboardHealth' into dilip…
r-dilip Jun 18, 2019
cfb0a6d
Updating template
r-dilip Jun 25, 2019
49c858a
updating agent onboarding script
r-dilip Jun 25, 2019
91a79f1
Updating image name
r-dilip Jun 27, 2019
68b31ba
AKS Engine Template
r-dilip Jun 27, 2019
7a80b1e
Added AKS Engine Onboarding steps
r-dilip Jun 27, 2019
a462aef
Fix Links
r-dilip Jun 27, 2019
b553bce
Minor Update
r-dilip Jun 27, 2019
059f237
Update HealthOnboarding.md
r-dilip Jul 1, 2019
7b6c550
Update HealthOnboarding.md
r-dilip Jul 1, 2019
2292298
Create optouttemplate.json
r-dilip Jul 1, 2019
e4b6189
Update optouttemplate.json
r-dilip Jul 1, 2019
bacd58c
update optouttemplate
r-dilip Jul 1, 2019
ee36ca3
Update optouttemplate.json
r-dilip Jul 1, 2019
e0a4fa6
update optouttemplate
r-dilip Jul 1, 2019
0bb250e
Merge branch 'dilipr/kubeHealth' of github.com:Microsoft/OMS-docker i…
r-dilip Jul 1, 2019
3d2d456
Update optouttemplate.json
r-dilip Jul 1, 2019
e5b9ac8
Update optouttemplate.json
r-dilip Jul 1, 2019
d8dc4da
Update optouttemplate.json
r-dilip Jul 1, 2019
34c36cd
Updatr
r-dilip Jul 1, 2019
709f993
Update HealthOnboarding.md
r-dilip Jul 1, 2019
e98dcd0
Update HealthOnboarding.md
r-dilip Jul 1, 2019
b4c1098
Update HealthOnboarding.md
r-dilip Jul 1, 2019
af603c6
Update HealthOnboarding.md
r-dilip Jul 1, 2019
ad8a5d5
Update HealthOnboarding.md
r-dilip Jul 1, 2019
cf2d2ba
Fix issue where cluster and workspace are in different
r-dilip Jul 1, 2019
00d240e
Merge branch 'dilipr/kubeHealth' of github.com:Microsoft/OMS-docker i…
r-dilip Jul 1, 2019
8c3939f
Update HealthOnboarding.md
r-dilip Jul 1, 2019
71bcd4c
Update HealthOnboarding.md
r-dilip Jul 1, 2019
78cdbda
Update HealthOnboarding.md
r-dilip Jul 1, 2019
95f8caa
Update HealthOnboarding.md
r-dilip Jul 1, 2019
4d867fe
A few changes based on Feedback
r-dilip Jul 3, 2019
2e9e56b
Merge branch 'dilipr/kubeHealth' of github.com:Microsoft/OMS-docker i…
r-dilip Jul 3, 2019
c0fec92
Update HealthOnboarding.md
r-dilip Jul 3, 2019
e584bf6
Update HealthOnboarding.md
r-dilip Jul 4, 2019
1b78f9d
Update HealthOnboarding.md
r-dilip Jul 5, 2019
e78c7cd
updating agent version with bug fix
r-dilip Jul 18, 2019
b968f5c
merged markdown file
r-dilip Jul 18, 2019
389bd02
Making changes to work in Azure CloudShell PowerShell
r-dilip Jul 25, 2019
db3f461
Preliminary merge
r-dilip Jul 30, 2019
5f39989
Update path, and finish merge from ci_feature
r-dilip Aug 1, 2019
09e01fd
Fixing agent version and docker version
r-dilip Aug 1, 2019
3d9d24b
Change pull policy to always for private preview
r-dilip Aug 7, 2019
b56a1cb
Add sleep before kubectl apply
r-dilip Aug 7, 2019
67caad8
Update HealthAgentOnboarding.ps1
r-dilip Aug 16, 2019
9 changes: 6 additions & 3 deletions Kubernetes/container-azm-ms-agentconfig.yaml
@@ -8,7 +8,7 @@ data:
#string.used by customer to keep track of this config file's version in their source control/repository (max allowed 10 chars, other chars will be truncated)
ver1
log-data-collection-settings: |-
# Log data collection settings
# Log data collection settings container-azm-ms-agent settings
[log_collection_settings]
[log_collection_settings.stdout]
# In the absense of this configmap, default value for enabled is true
@@ -72,8 +72,11 @@ data:
#fieldpass = ["metric_to_pass1", "metric_to_pass12"]

#fielddrop = ["metric_to_drop"]


agent-settings: |-
# agent health model feature settings
[agent_settings.health_model]
# In the absence of this configmap, default value for enabled is false
enabled = false
metadata:
name: container-azm-ms-agentconfig
namespace: kube-system
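Note on the `agent-settings` block above: the comment says the health model defaults to disabled when the configmap is absent. A minimal sketch of how a startup script could honor that default is shown below. The function name `health_model_enabled` and the idea of reading the setting with `awk` are assumptions made for illustration; this PR does not show how the agent actually parses the setting.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): print the value of `enabled`
# under [agent_settings.health_model] in a settings file. Prints "false"
# when the file is missing/empty or the key is not present, mirroring the
# default documented in the configmap comment.
health_model_enabled() {
    settings_file="$1"
    if [ ! -s "$settings_file" ]; then
        echo "false"
        return 0
    fi
    value=$(awk '
        # Any [section] header switches the "inside target section" flag.
        /^[ \t]*\[/ {
            in_section = ($0 ~ /^[ \t]*\[agent_settings\.health_model\][ \t]*$/)
            next
        }
        # First `enabled = ...` line inside the target section wins.
        in_section && /^[ \t]*enabled[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")   # drop the key and the equals sign
            sub(/#.*$/, "")            # drop a trailing comment, if any
            gsub(/[ \t]+$/, "")        # drop trailing whitespace
            print
            exit
        }
    ' "$settings_file")
    echo "${value:-false}"
}
```

Usage, assuming the configmap key is mounted as a file (the path is an assumption): `health_model_enabled /etc/config/settings/agent-settings`.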
128 changes: 116 additions & 12 deletions Kubernetes/omsagent.yaml
@@ -12,6 +12,9 @@ rules:
- apiGroups: [""]
resources: ["pods", "events", "nodes", "namespaces", "services"]
verbs: ["list", "get", "watch"]
- apiGroups: ["extensions"]
resources: ["deployments"]
verbs: ["list"]
- nonResourceURLs: ["/metrics"]
verbs: ["get"]
---
@@ -33,6 +36,12 @@ apiVersion: v1
data:
kube.conf: |-
# Fluentd config file for OMS Docker - cluster components (kubeAPI)
#fluent forward plugin
<source>
type forward
port 25235
bind 0.0.0.0
</source>

#Kubernetes pod inventory
<source>
@@ -81,6 +90,14 @@ data:
log_level debug
</source>

#Kubernetes health
<source>
type kubehealth
tag oms.api.KubeHealth.ReplicaSet
run_interval 60s
log_level debug
</source>

#cadvisor perf- Windows nodes
<source>
type wincadvisorperf
@@ -103,6 +120,11 @@ data:
log_level info
</filter>

#health model aggregation filter
<filter oms.api.KubeHealth**>
type filter_health_model_builder
</filter>

<match oms.containerinsights.KubePodInventory**>
type out_oms
log_level debug
@@ -249,6 +271,18 @@ data:
max_retry_wait 9m
retry_mdm_post_wait_minutes 60
</match>

<match oms.api.KubeHealth.AgentCollectionTime**>
type out_oms_api
log_level debug
buffer_chunk_limit 10m
buffer_type file
buffer_path %STATE_DIR_WS%/out_oms_api_kubehealth*.buffer
buffer_queue_limit 10
flush_interval 20s
retry_limit 10
retry_wait 30s
</match>
metadata:
name: omsagent-rs-config
namespace: kube-system
@@ -261,8 +295,8 @@ metadata:
type: Opaque
data:
#BASE64 ENCODED (Both WSID & KEY) INSIDE DOUBLE QUOTE ("")
WSID: "WSID"
KEY: "KEY"
WSID: "VALUE_WSID"
KEY: "VALUE_KEY"
---
apiVersion: extensions/v1beta1
kind: DaemonSet
@@ -284,7 +318,7 @@ spec:
serviceAccountName: omsagent
containers:
- name: omsagent
image: "mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod07092019"
image: "mcr.microsoft.com/azuremonitor/containerinsights/ciprod:healthpreview07182019"
imagePullPolicy: IfNotPresent
resources:
limits:
@@ -294,13 +328,15 @@ spec:
cpu: 75m
memory: 225Mi
env:
#- name: AKS_RESOURCE_ID
# value: "VALUE_AKS_RESOURCE_ID_VALUE"
#- name: AKS_REGION
# value: "VALUE_AKS_RESOURCE_REGION_VALUE"
- name: AKS_RESOURCE_ID
vishiy (Member) commented on Aug 5, 2019:

Why are you changing the defaults?

value: "VALUE_AKS_RESOURCE_ID_VALUE"
- name: AKS_REGION
value: "VALUE_AKS_REGION_VALUE"
#Uncomment below two lines for ACS clusters and set the cluster names manually. Also comment out the above two lines for ACS clusters
- name: ACS_RESOURCE_NAME
value: "my_acs_cluster_name"
#- name: ACS_RESOURCE_NAME
# value: "my_acs_cluster_name"
- name: DISABLE_KUBE_SYSTEM_LOG_COLLECTION
A reviewer (Member) commented:

This was removed in the last release. This looks like a bad merge.

value: "true"
- name: CONTROLLER_TYPE
value: "DaemonSet"
- name: NODE_IP
@@ -397,8 +433,8 @@ spec:
spec:
serviceAccountName: omsagent
containers:
- name: omsagent
image: "mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod07092019"
- name: omsagent
image: "mcr.microsoft.com/azuremonitor/containerinsights/ciprod:healthpreview07182019"
imagePullPolicy: IfNotPresent
resources:
limits:
@@ -428,6 +464,9 @@ spec:
protocol: TCP
- containerPort: 25224
protocol: UDP
- containerPort: 25235
protocol: TCP
name: in-rs-tcp
volumeMounts:
- mountPath: /var/run/host
name: docker-sock
@@ -445,6 +484,8 @@ spec:
- mountPath: /etc/config/settings
name: settings-vol-config
readOnly: true
- mountPath: "/mnt/azure"
name: azurefile-pv
livenessProbe:
exec:
command:
@@ -482,4 +523,67 @@ spec:
configMap:
name: container-azm-ms-agentconfig
optional: true

- name: azurefile-pv
persistentVolumeClaim:
claimName: azurefile
---
kind: Service
apiVersion: v1
metadata:
name: replicaset-service
vishiy (Member) commented on Aug 5, 2019:

Use a more specific name, so that anyone listing the services in the cluster can tell that this one is for Azure Monitor for containers.

namespace: kube-system
spec:
selector:
rsName: "omsagent-rs"
ports:
- protocol: TCP
port: 25235
targetPort: in-rs-tcp
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: azurefile
provisioner: kubernetes.io/azure-file
mountOptions:
- dir_mode=0777
- file_mode=0777
- uid=1000
- gid=1000
parameters:
skuName: Standard_LRS
---
apiVersion: rbac.authorization.k8s.io/v1
A reviewer (Member) commented:

Hopefully all of this goes away as we move away from Azure file auto-provisioning.

kind: ClusterRole
metadata:
name: system:azure-cloud-provider
rules:
- apiGroups: ['']
resources: ['secrets']
verbs: ['get','create']
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: system:azure-cloud-provider
roleRef:
kind: ClusterRole
apiGroup: rbac.authorization.k8s.io
name: system:azure-cloud-provider
subjects:
- kind: ServiceAccount
name: persistent-volume-binder
namespace: kube-system
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: azurefile
namespace: kube-system
spec:
accessModes:
- ReadWriteMany
storageClassName: azurefile
resources:
requests:
storage: 10Mi
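Note on the Secret in this manifest: the comment requires both `WSID` and `KEY` to be base64-encoded and wrapped in double quotes. A minimal sketch of producing those values follows. The helper name `encode_secret_value` and the sample inputs are placeholders introduced for illustration, not values or tooling from this PR.

```shell
#!/bin/sh
# Hypothetical helper: base64-encode a value for a Kubernetes Secret `data` field.
# printf '%s' avoids encoding a trailing newline, and tr removes any line
# wrapping that some base64 implementations add to long output.
encode_secret_value() {
    printf '%s' "$1" | base64 | tr -d '\n'
}

# Placeholder inputs, not real credentials.
echo "WSID: \"$(encode_secret_value 'my-workspace-id')\""
echo "KEY: \"$(encode_secret_value 'my-workspace-key')\""
```

Encoding with `echo` instead of `printf '%s'` would include a newline in the encoded value, which is a common cause of onboarding failures that are hard to spot in the manifest.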
4 changes: 2 additions & 2 deletions ci_feature/Dockerfile
@@ -1,8 +1,8 @@
FROM ubuntu:16.04
MAINTAINER OMSContainers@microsoft.com
LABEL vendor=Microsoft\ Corp \
com.microsoft.product="OMS Container Docker Provider" \
com.microsoft.version="6.0.0-0"
com.microsoft.product="OMS Container Docker Provider" \
com.microsoft.version="6.0.0-1"
ENV tmpdir /opt
ENV APPLICATIONINSIGHTS_AUTH OTQzNWI0M2YtOTdkNS00ZGVkLThkOTAtYjA0Nzk1OGU2ZTg3
ENV AGENT_VERSION ciprod07092019
3 changes: 1 addition & 2 deletions ci_feature/setup.sh
@@ -14,8 +14,7 @@ wget https://github.com/Microsoft/OMS-Agent-for-Linux/releases/download/OMSAgent
#create file to disable omi service startup script
touch /etc/.omi_disable_service_control

wget https://github.com/microsoft/Docker-Provider/releases/download/6.0.0.0/docker-cimprov-6.0.0-0.universal.x86_64.sh

wget https://github.com/microsoft/Docker-Provider/releases/download/healthpreview06182019/docker-cimprov-6.0.0-1.universal.x86_64.sh
chmod 775 $TMPDIR/*.sh

#Extract omsbundle
6 changes: 3 additions & 3 deletions ci_feature_prod/Dockerfile
@@ -1,11 +1,11 @@
FROM ubuntu:16.04
MAINTAINER OMSContainers@microsoft.com
LABEL vendor=Microsoft\ Corp \
com.microsoft.product="OMS Container Docker Provider" \
com.microsoft.version="6.0.0-0"
com.microsoft.product="OMS Container Docker Provider" \
com.microsoft.version="6.0.0-1"
ENV tmpdir /opt
ENV APPLICATIONINSIGHTS_AUTH NzAwZGM5OGYtYTdhZC00NThkLWI5NWMtMjA3ZjM3NmM3YmRi
ENV AGENT_VERSION ciprod07092019
ENV AGENT_VERSION healthpreview07182019
ENV HOST_MOUNT_PREFIX /hostfs
ENV HOST_PROC /hostfs/proc
ENV HOST_SYS /hostfs/sys
18 changes: 9 additions & 9 deletions ci_feature_prod/main.sh
@@ -36,7 +36,7 @@ if [ -S ${DOCKER_SOCKET} ]; then
groupadd -for -g ${DOCKER_GID} ${DOCKER_GROUP}
echo "adding omsagent user to local docker group"
usermod -aG ${DOCKER_GROUP} ${REGULAR_USER}
fi
fi

#Run inotify as a daemon to track changes to the mounted configmap.
inotifywait /etc/config/settings --daemon --recursive --outfile "/opt/inotifyoutput.txt" --event create,delete --format '%e : %T' --timefmt '+%s'
@@ -48,11 +48,11 @@ else
curl --unix-socket /var/run/host/docker.sock "http:/info" | python -c "import sys, json; print json.load(sys.stdin)['Name']" > /var/opt/microsoft/docker-cimprov/state/containerhostname
fi
#check if file was written successfully.
cat /var/opt/microsoft/docker-cimprov/state/containerhostname
cat /var/opt/microsoft/docker-cimprov/state/containerhostname

#resourceid override for loganalytics data.
if [ -z $AKS_RESOURCE_ID ]; then
echo "not setting customResourceId"
echo "not setting customResourceId"
else
export customResourceId=$AKS_RESOURCE_ID
echo "export customResourceId=$AKS_RESOURCE_ID" >> ~/.bashrc
@@ -63,7 +63,7 @@ fi
#set agent config schema version
if [ -e "/etc/config/settings/schema-version" ] && [ -s "/etc/config/settings/schema-version" ]; then
#trim
config_schema_version="$(cat /etc/config/settings/schema-version | xargs)"
config_schema_version="$(cat /etc/config/settings/schema-version | xargs)"
#remove all spaces
config_schema_version="${config_schema_version//[[:space:]]/}"
#take first 10 characters
@@ -92,7 +92,7 @@ fi

# Check for internet connectivity
RET=`curl -s -o /dev/null -w "%{http_code}" http://www.microsoft.com/`
if [ $RET -eq 200 ]; then
if [ $RET -eq 200 ]; then
# Check for workspace existence
if [ -e "/etc/omsagent-secret/WSID" ]; then
workspaceId=$(cat /etc/omsagent-secret/WSID)
@@ -103,7 +103,7 @@ if [ $RET -eq 200 ]; then
else
echo "LA Onboarding:Workspace Id not mounted"
fi
else
else
echo "-e error Error resolving host during the onboarding request. Check the internet connectivity and/or network policy on the cluster"
fi

@@ -131,7 +131,7 @@ rm -f /etc/opt/microsoft/omsagent/conf/omsagent.d/omsconfig.consistencyinvoker.c
if [ -z $INT ]; then
if [ -a /etc/omsagent-secret/DOMAIN ]; then
/opt/microsoft/omsagent/bin/omsadmin.sh -w `cat /etc/omsagent-secret/WSID` -s `cat /etc/omsagent-secret/KEY` -d `cat /etc/omsagent-secret/DOMAIN`
elif [ -a /etc/omsagent-secret/WSID ]; then
elif [ -a /etc/omsagent-secret/WSID ]; then
/opt/microsoft/omsagent/bin/omsadmin.sh -w `cat /etc/omsagent-secret/WSID` -s `cat /etc/omsagent-secret/KEY`
elif [ -a /run/secrets/DOMAIN ]; then
/opt/microsoft/omsagent/bin/omsadmin.sh -w `cat /run/secrets/WSID` -s `cat /run/secrets/KEY` -d `cat /run/secrets/DOMAIN`
@@ -159,7 +159,7 @@ service cron start

#get omsagent and docker-provider versions
dpkg -l | grep omsagent | awk '{print $2 " " $3}'
dpkg -l | grep docker-cimprov | awk '{print $2 " " $3}'
dpkg -l | grep docker-cimprov | awk '{print $2 " " $3}'

#telegraf & fluentbit requirements
if [ ! -e "/etc/config/kube.conf" ]; then
@@ -272,7 +272,7 @@ fi
/opt/telegraf --version
dpkg -l | grep td-agent-bit | awk '{print $2 " " $3}'

#dpkg -l | grep telegraf | awk '{print $2 " " $3}'
#dpkg -l | grep telegraf | awk '{print $2 " " $3}'

shutdown() {
/opt/microsoft/omsagent/bin/service_control stop
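Note on the schema-version hunk in `main.sh` above: its comments describe three steps (trim, remove all spaces, take the first 10 characters). A minimal POSIX-shell sketch of that normalization is shown below. It swaps the bash-only `${var//[[:space:]]/}` expansion used in the script for `tr` and `cut`; the function name `normalize_schema_version` is an assumption made for illustration and does not appear in the PR.

```shell
#!/bin/sh
# Hypothetical helper: normalize a config schema version string.
# Steps follow the comments in main.sh:
#   1. remove all whitespace (this also covers leading/trailing trimming)
#   2. keep only the first 10 characters
normalize_schema_version() {
    printf '%s' "$1" | tr -d '[:space:]' | cut -c1-10
}

# Example with a placeholder value:
normalize_schema_version '  v1  '
```

Quoting the argument matters here: an unquoted value containing spaces would be split into several arguments before the function sees it.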
2 changes: 1 addition & 1 deletion ci_feature_prod/setup.sh
@@ -14,7 +14,7 @@ wget https://github.com/Microsoft/OMS-Agent-for-Linux/releases/download/OMSAgent
#create file to disable omi service startup script
touch /etc/.omi_disable_service_control

wget https://github.com/microsoft/Docker-Provider/releases/download/6.0.0.0/docker-cimprov-6.0.0-0.universal.x86_64.sh
wget https://github.com/microsoft/Docker-Provider/releases/download/healthpreview06182019/docker-cimprov-6.0.0-1.universal.x86_64.sh

chmod 775 $TMPDIR/*.sh
