This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Note: The agent versions below contain dates (ciprod), which indicate the agent build dates (not the release dates).
Version microsoft/oms:ciprod08072020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod08072020 (linux)
Version microsoft/oms:win-ciprod08072020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod08072020 (windows)
- Collection of KubeState metrics for deployments and HPA
- Add the Proxy support for Windows agent
- Fix for ContainerState in ContainerInventory to handle Failed state and collection of environment variables for terminated and failed containers
- Change /spec to /metrics/cadvisor endpoint to collect node capacity metrics
- Disable Health Plugin by default; it can be enabled via configmap
- Pin version of jq to 1.5+dfsg-2
- Bug fix for showing node as 'not ready' when there is disk pressure
- oneagent integration (disabled by default)
- Add region check before sending alertable metrics to MDM
- Fix for agent telemetry in sovereign clouds
Version microsoft/oms:ciprod07152020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod07152020 (linux)
Version microsoft/oms:win-ciprod05262020-2 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod05262020-2 (windows)
- The following hotfixes apply only to the Linux agent
- Fix the issue related to collection of multi-containers in pod for the ContainerInventory table
- Fix the containerhostname field value to have podname rather than nodename in ContainerInventory table
- Fix OOM issue during container startup if there is a high number of pods or containers on the node
- Restore the ContainerName field value in the ContainerInventory table to the same format as before
- The Windows container was not rebuilt, so its image version stays the same as in the last release before this hotfix (ciprod:win-ciprod05262020-2)
Version microsoft/oms:ciprod06302020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod06302020 (linux)
Version microsoft/oms:win-ciprod05262020-2 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod05262020-2 (windows)
- Hotfix for nested JSON log parsing bug (applicable only to Linux Daemonset)
- The Windows container was not rebuilt, so its image version stays the same as in the last release before this hotfix (ciprod:win-ciprod05262020-2)
Version microsoft/oms:win-ciprod05262020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod05262020-2 (windows)
- Update application insights instrumentation key for windows image to point to the production instance
Version microsoft/oms:ciprod05222020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod05222020 (linux)
Version microsoft/oms:win-ciprod05222020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod05222020 (windows)
- Windows Daemonset - Collection of Windows std/stderr logs
- More Alertable Metrics (going to Metrics Store/custom metrics - see Customer Impact section below for metrics list)
- Fix OOM-ing at high prometheus scrape volume
- Update fluentbit (0.14.4 to 1.4.2)
- Drop non-numeric metrics through Telegraf
- Reduce Health exception (when API server response is nil)
- Add 'Computer' dimension to all telemetry (internal use)
- Support for specifying HTTP & HTTPS Proxy for outbound/egress (applicable only for non-AKS clusters)
- Move to rbac.authorization.k8s.io/v1 for ClusterRole & ClusterRoleBinding
- Move to apiextensions.k8s.io/v1 for Health CRD
- Windows Logs - Customers will see the agent automatically start collecting Windows container STDOUT/STDERR logs and sending them to the same Log Analytics workspace (containerlogs table)
- Alertable metrics - Customers will see the below metrics & namespaces in 'Metrics' TOC for AKS clusters
- Metrics
- diskUsagePercentage
- completedJobsCount
- oomKilledContainerCount
- podReadyPercentage
- restartingContainerCount
- cpuExceededPercentage
- memoryRssExceededPercentage
- memoryWorkingSetExceededPercentage
- Metric Namespaces
- insights.container/containers
- Metrics
- HTTP/S Proxy support - For non-AKS clusters, the proxy can be configured when installing through HELM. Please see the documentation for more details
Note: This agent release is targeted ONLY at non-AKS clusters via the Azure Monitor for containers HELM chart update
Version microsoft/oms:ciprod04162020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod04162020
- Add support for rate limiting
- Add support for Container Runtime Interface compatible container runtime(s) like CRI-O and ContainerD
- cAdvisor APIs are used to collect the container inventory for Docker/Moby and CRI runtime K8s environments
- Based on the container runtime, the corresponding container log FluentBit parser (docker/cri) is selected
- Ingestion will throttle the workspace if the agent on the cluster sends data beyond the Log Analytics Workspace throttling limits, i.e. 500 MB/s
- On Docker runtime environments, the inventory of the containers was previously obtained via the Docker REST API. The agent now uses the cAdvisor APIs to get the inventory of the containers for both Docker and non-Docker container runtime environments.
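
To illustrate the runtime-based parser selection mentioned above, here is a minimal Python sketch that parses a single container log line in either the Docker json-file format or the CRI format written by containerd and CRI-O. The function name `parse_log_line` and the `runtime` values are hypothetical and chosen only for illustration; the agent relies on FluentBit parsers rather than on code like this.

```python
import json


def parse_log_line(line: str, runtime: str) -> dict:
    """Parse one container log line using the format that matches the runtime.

    `runtime` is a hypothetical selector: "docker" picks the json-file format,
    anything else is treated as a CRI runtime (containerd, CRI-O).
    """
    if runtime == "docker":
        # Docker's json-file driver writes one JSON object per line:
        # {"log": "...\n", "stream": "stdout", "time": "..."}
        record = json.loads(line)
        return {
            "time": record["time"],
            "stream": record["stream"],
            "log": record["log"].rstrip("\n"),
        }
    # CRI log lines look like: <timestamp> <stream> <tag> <message>
    # where <tag> is "F" for a full line or "P" for a partial line.
    time, stream, _tag, message = line.rstrip("\n").split(" ", 3)
    return {"time": time, "stream": stream, "log": message}
```

Both branches return the same record shape, so downstream processing does not need to know which runtime produced the line.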
Version microsoft/oms:ciprod03022020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod03022020
- Collection of GPU metrics as InsightsMetrics
- Enable config map settings to enable collection of 'Normal' kube events
- Fix kubehealth exceptions to handle empty/nil kube api responses
- Get resource limits for health and MDM from kubelet instead of kube api
- Bug fix for windows node image collection where image name contains multiple slashes
- Exclude ARO master node for data collection
- Telemetry for kube events flushed
- Changes to support MSI for MDM if service principal doesn't exist
- Changes for AKS telemetry to ping ods endpoint first and then network check
- KubeEvents bug fix for KubeEvent type
- Providing capability for customers to collect 'Normal' kube events using config map
- Metrics for GPU are collected and ingested to the customer's workspace if they have GPU enabled nodes
- Bug fix for windows container image collection allows customers to get the right data in the ContainerInventory table for windows containers.
Version microsoft/oms:ciprod01072020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod01072020
- Switch between 10255(old) and 10250(new) ports for cadvisor for older and newer versions of kubernetes
- Node CPU, node memory, container CPU and container memory metrics were previously obtained by querying the kubelet read-only port (http://$NODE_IP:10255). The agent now supports getting these metrics from the kubelet port (https://$NODE_IP:10250) as well. During startup, the agent checks connectivity to the kubelet port (https://$NODE_IP:10250), and if the check fails, the metrics source defaults to the read-only port (http://$NODE_IP:10255).
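
The startup check and fallback described above could look roughly like the following Python sketch. It is an assumption-laden illustration, not the agent's code: the names `can_connect` and `choose_metrics_endpoint` are hypothetical, and a plain TCP connect stands in for whatever connectivity check the agent actually performs.

```python
import socket

KUBELET_PORT = 10250    # kubelet port (https), per the note above
READONLY_PORT = 10255   # kubelet read-only port (http), per the note above


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (assumed check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_metrics_endpoint(node_ip: str, probe=can_connect) -> str:
    """Prefer the kubelet port; fall back to the read-only port if unreachable."""
    if probe(node_ip, KUBELET_PORT):
        return f"https://{node_ip}:{KUBELET_PORT}"
    return f"http://{node_ip}:{READONLY_PORT}"
```

The `probe` parameter is injectable so the selection logic can be exercised without a live kubelet, for example `choose_metrics_endpoint("10.0.0.4", probe=lambda host, port: False)`.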
Version microsoft/oms:ciprod12042019 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod12042019
- Fix scheduler for all input plugins
- Fix liveness probe
- Reduce chunk sizes for all fluentD buffers to support larger clusters (nodes & pods)
- Chunk Kubernetes API calls (pods,nodes,events)
- Use HTTP.start instead of HTTP.new
- Merge KubePerf into KubePods & KubeNodes
- Merge KubeServices into KubePod
- Use stream based yajl for JSON parsing
- Health - Query only kube-system pods
- Health - Use keep_if instead of select
- Container log enrichment (turned OFF by default for ContainerName & ContainerImage)
- Application Insights Telemetry - Async
- Fix metricTime to be batch time for all metric input plugins
- Close socket connections properly for DockerAPIClient
- Fix top unhandled exceptions in Kubernetes API Client and pod inventory
- Fix retries, wait between retries, chunk size, thread counts to be consistent for all FluentD workflows
- Back-off for containerlog enrichment K8S API calls
- Add new regions (3) for Azure Monitor Custom metrics
- Increase the cpu(1 core) & memory(750Mi) limits for replica-set to support larger clusters (nodes & pods)
- Move to Ubuntu 18.04 LTS
- Support for Kubernetes 1.16
- Use ifconfig for detecting network connectivity issues
- Collect eventType != Normal
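
The release notes do not say how the Kubernetes API calls for pods, nodes and events are chunked. One common approach is the Kubernetes list API's `limit` and `continue` query parameters, sketched below in Python. The `fetch_page` callable is hypothetical: it is assumed to issue a single list request and return the decoded JSON response.

```python
from typing import Callable, Iterator, Optional


def list_in_chunks(
    fetch_page: Callable[[int, Optional[str]], dict],
    limit: int = 500,
) -> Iterator[dict]:
    """Yield items from a Kubernetes list call one page at a time.

    fetch_page(limit, continue_token) is a hypothetical helper that performs
    one request such as GET /api/v1/pods?limit=<limit>&continue=<token>.
    """
    token: Optional[str] = None
    while True:
        page = fetch_page(limit, token)
        yield from page.get("items", [])
        # The API server returns metadata.continue while more results remain.
        token = page.get("metadata", {}).get("continue")
        if not token:
            break
```

Because the function is a generator, only one page of results needs to be held in memory at a time, which is the main benefit of chunking on large clusters.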
Version microsoft/oms:ciprod10112019 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod10112019
- Update prometheus config scraping capability to restrict collecting metrics from pods in specific namespaces.
- Feature to send custom configuration/prometheus scrape errors to KubeMonAgentEvents table in customer's workspace.
- Bug fix to collect data for init containers for Container Logs, KubePodInventory and Perf.
- Bug fix for empty array being a valid setting in custom config in configmap.
- Restrict kubelet_docker_operations and kubelet_docker_operations_errors to create_containers, remove_containers and pull_image operations.
- Fix top exceptions in telemetry
Version microsoft/oms:ciprod08222019 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod08222019
- Cluster Health Private Preview based on config map setting
- Update resource requests for replicaset to 110m and 250Mi
- Update custom metrics supported regions
- Fix for prometheus config map telemetry
- Telemetry for controller kind
- Update url to use one of the whitelisted urls for cp monitor telemetry
- Configmap with clusterid for AKS to be used by Application Insights
Version microsoft/oms:ciprod07092019 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod07092019
- Prometheus custom metric collection using config map allowing omsagent to
- Scrape metrics from user defined urls
- Scrape kubernetes pods with prometheus annotations
- Scrape metrics from kubernetes services
- Exception fixes in daemonset and replicaset
- Container Inventory plugin changes to get image id from the repo digest and populate repository for image with only image digest
- Stop sending telegraf errors to ApplicationInsights and instead log them to stderr to provide visibility for customers
- Bug fixes for region names with spaces being processed incorrectly while sending mdm metrics
- Add log size in telemetry
- Remove buffer chunk size and buffer max size from fluentbit configuration
Version microsoft/oms:ciprod06142019 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod06142019
- MDM pod metrics bug fixes - MDM rejecting pod metrics due to nodename or controllername dimensions being empty
- Prometheus metrics collection by default in every node for kubelet docker operations and kubelet docker operation errors
- Telegraf metric collection for diskio and networkio metrics
- Agent Configuration/ Settings for data collection
- Cluster level log collection enable/disable option
- Ability to enable/disable stdout and/or stderr logs collection per namespace
- Cluster level environment variable collection enable/disable option
- Config file version & config schema version
- Pod annotation for supported config schema version(s)
- Log collection optimization/tuning for better performance
- Derive k8s namespaces from log file name (instead of making call to k8s api service)
- Do not tail log files for containers in the excluded namespace list (if excluded both in stdout & stderr)
- Limit buffer size to 1M and flush logs more frequently [every 10 secs (instead of 30 secs)]
- Tuning of several other fluent bit settings
- Increase requests
- Replica set memory request by 75M (100M to 175M)
- Daemonset CPU request by 25m (50m to 75m)
- Starting with this release, the image is pushed only to MCR (no longer to Docker). AKS-engine will also start pulling our agent image from MCR
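
The log collection optimization in this release derives the Kubernetes namespace from the log file name and skips files for excluded namespaces. A minimal Python sketch of that idea follows. It assumes the kubelet naming convention `<pod>_<namespace>_<container>-<container id>.log` for files under /var/log/containers; the function names are hypothetical, and the agent's own implementation may differ.

```python
from typing import Optional, Set


def namespace_from_log_path(path: str) -> Optional[str]:
    """Extract the namespace from a container log file name.

    Assumes names of the form <pod>_<namespace>_<container>-<id>.log.
    Returns None when the name does not match that shape.
    """
    file_name = path.rsplit("/", 1)[-1]
    parts = file_name.split("_")
    if len(parts) != 3:
        return None
    return parts[1]


def should_tail(path: str, excluded_namespaces: Set[str]) -> bool:
    """Tail a file unless its namespace is in the excluded set.

    Files whose namespace cannot be determined are tailed (a design choice
    made for this sketch, not taken from the release notes).
    """
    namespace = namespace_from_log_path(path)
    return namespace is None or namespace not in excluded_namespaces
```

Parsing the file name avoids a call to the Kubernetes API service for every log file, which is the saving this release describes.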
Version microsoft/oms:ciprod04232019 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod04232019
- Windows node monitoring (metrics & inventory)
- Telegraf integration (Telegraf metrics to LogAnalytics)
- Node Disk usage metrics (used, free, used%) as InsightsMetrics
- Resource stamping for all types (inventory, metrics (perf), metrics (InsightsMetrics), logs) [Applicable only for AKS clusters]
- Upped daemonset memory request (not limit) from 150Mi to 225Mi
- Added liveness probe for fluentbit
- Fix for MDM filter plugin when kubeapi returns non-200 response
- Fix for closing response.Body in outoms
- Update Mem_Buf_Limit to 5m for fluentbit
- Tail only files that were modified in the last 5 minutes
- Remove some unwanted logs that are chatty in outoms
- Fix for MDM disablement for AKS-Engine
- Fix for Pod count metric (same as container count) in MDM
- Container logs enrichment optimization
- Get container metadata only for containers in current node (vs cluster before)
- Update fluent bit 0.13.7 => 0.14.4
- This fixes the escaping issue in the container logs
- Mooncake cloud support for agent (AKS only)
- Ability to disable agent telemetry
- Ability to onboard and ingest to mooncake cloud
- Add & populate 'ContainerStatusReason' column to KubePodInventory
- Alertable (custom) metrics (to AzureMonitor - only for AKS clusters)
- Cpuusagenanocores & % metric
- MemoryWorkingsetBytes & % metric
- MemoryRssBytes & % metric
- Podcount by node, phase & namespace metric
- Nodecount metric
- ContainerNodeInventory_CL to fixed type
- Omsagent - 1.8.1.256 (nov 2018 release)
- Persist fluentbit state between container restarts
- Populate 'TimeOfCommand' for agent ingest time for container logs
- Get node cpu usage from cpuusagenanoseconds (and convert to cpuusagenanocores)
- Container Node Inventory - move to fluentD from OMI
- Mount docker.sock (Daemon set) as /var/run/host
- Add omsagent user to docker group
- Move to fixed type for kubeevents & kubeservices
- Disable collecting ENV for our oms agent container (daemonset & replicaset)
- Disable container inventory collection for 'sandbox' containers & non kubernetes managed containers
- Agent telemetry - ContainerLogsAgentSideLatencyMs
- Agent telemetry - PodCount
- Agent telemetry - ControllerCount
- Agent telemetry - K8S Version
- Agent telemetry - NodeCoreCapacity
- Agent telemetry - NodeMemoryCapacity
- Agent telemetry - KubeEvents (exceptions)
- Agent telemetry - Kubenodes (exceptions)
- Agent telemetry - kubepods (exceptions)
- Agent telemetry - kubeservices (exceptions)
- Agent telemetry - Daemonset, Replicaset as dimensions (bug fix)
- Disable Container Image inventory workflow
- Kube_Events memory leak fix for replica-set
- Timeout (30 secs) for outOMS
- Reduce critical lock duration for quicker log processing (for log enrichment)
- Disable the OMI based Container Inventory workflow and move to the fluentD based Container Inventory workflow
- Moby support for the new Container Inventory workflow
- Ability to disable environment variables collection by individual container
- Bugfix - No inventory data due to container status(es) not available
- Agent telemetry cpu usage & memory usage (for DaemonSet and ReplicaSet)
- Agent telemetry - log generation rate
- Agent telemetry - container count per node
- Agent telemetry - collect container logs from agent (DaemonSet and ReplicaSet) as AI trace
- Agent telemetry - errors/exceptions for Container Inventory workflow
- Agent telemetry - Container Inventory Heartbeat
- Fix for containerID being 00000-00000-00000
- Move from fluentD to fluentbit for container log collection
- Seg fault fixes in json parsing for container inventory & container image inventory
- Telemetry enablement
- Remove ContainerPerf, ContainerServiceLog, ContainerProcess fluentd-->OMI workflows
- Update log level for all fluentD based workflows
- Changes for node lost scenario (roll-up pod & container statuses as Unknown)
- Discover unscheduled pods
- KubeNodeInventory - delimit multiple true node conditions for node status
- UTF Encoding support for container logs
- Container environment variable truncated to 200K
- Handle json parsing errors for OMI provider for docker
- Test mode enablement for ACS-engine testing
- Latest OMS agent (1.6.0-163)
- Latest OMI (1.4.2.5)
- Remove node-0 dependency
- Remove passing WSID & Key as environment variables and pass them as kubernetes secret (for non-AKS; we already pass them as secret for AKS)
- Please note that if you are deploying manually through yaml, you need to:
- Provide workspaceid & key as base64 encoded strings within double quotes (.yaml has comments to do so as well)
- Provide cluster name twice (for each container – daemonset & replicaset)
- Kubernetes RBAC enablement
- Latest released omsagent (1.6.0-42)
- Bug fix so that we do not collect kube-system namespace container logs when kube api calls fail occasionally (Bug #215107)
- .yaml changes (for RBAC)