
PVC usage metrics incorrect in /stats/summary kubelet endpoint for Windows Azure Files #110261

Closed
gracewehner opened this issue May 27, 2022 · 28 comments
Labels
area/cadvisor kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/windows Categorizes an issue or PR as relevant to SIG Windows.

Comments

@gracewehner

What happened?

Curling https://<Node-IP>:10250/stats/summary on a Windows node returns incorrect usage and capacity bytes for an azurefile-csi PVC used by a Windows pod on that node:

 "pods": [
  {
   "podRef": {
    "name": "win-webserver-5d5d4966f5-zrcf6",
    "namespace": "default",
    "uid": "4b2a7fec-26d4-482c-adc8-2c25baa1d054"
   },
  ...
   "volume": [
    {
     "time": "2022-05-26T23:37:24Z",
     "availableBytes": 91617021952,
     "capacityBytes": 136912564224,
     "usedBytes": 45295542272,
     "inodesFree": 0,
     "inodes": 0,
     "inodesUsed": 0,
     "name": "volume",
     "pvcRef": {
      "name": "azure-file-win",
      "namespace": "default"
     }
     }
    ]
   },

This looks like it is reporting the usage of the node's local disk, or something else entirely. azurefile-csi PVCs used by Linux pods report the correct usage.

What did you expect to happen?

The PVC is empty, so usedBytes should be 0 and capacityBytes should correspond to the 2Gi request.

How can we reproduce it (as minimally and precisely as possible)?

Apply the following to create an azurefile PVC and a Windows pod that uses it:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azure-file-win
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: azurefile
  resources:
    requests:
      storage: 2Gi
---
apiVersion: v1
kind: Service
metadata:
  name: win-webserver
  labels:
    app: win-webserver
spec:
  ports:
    # the port that this service should serve on
  - port: 80
    targetPort: 80
  selector:
    app: win-webserver
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: win-webserver
  name: win-webserver
spec:
  selector:
    matchLabels:
      app: win-webserver
  replicas: 1
  template:
    metadata:
      labels:
        app: win-webserver
      name: win-webserver
    spec:
      containers:
      - name: windowswebserver
        image: mcr.microsoft.com/windows/servercore:ltsc2019
        imagePullPolicy: IfNotPresent
        command:
        - powershell.exe
        - -command
        - $listener = New-Object System.Net.HttpListener; $listener.Prefixes.Add('http://*:80/'); $listener.Start();Write-Host('Listening at http://*:80/'); while ($listener.IsListening) { $context = $listener.GetContext(); $response = $context.Response; $content='<html><body><H1>Windows Container Web Server</H1></body></html>'; $buffer = [System.Text.Encoding]::UTF8.GetBytes($content); $response.ContentLength64 = $buffer.Length; $response.OutputStream.Write($buffer, 0, $buffer.Length); $response.Close(); };
        volumeMounts:
        - mountPath: "C:\\Data"
          name: volume
      volumes:
      - name: volume
        persistentVolumeClaim:
          claimName: azure-file-win
      nodeSelector:
        beta.kubernetes.io/os: windows

Then run:

curl -s -k -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://<NODE_IP>:10250/stats/summary

where <NODE_IP> is the IP of the Windows node the pod is running on, and view the output for the pod.
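Instead of eyeballing the curl output, the volume entry can be checked programmatically with the standard library. A minimal sketch (the VolumeStats struct and parseVolume helper are illustrative, not part of kubelet; the sample values are the ones from the output above):

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// VolumeStats holds the subset of the kubelet /stats/summary volume
// fields relevant to this issue.
type VolumeStats struct {
	AvailableBytes uint64 `json:"availableBytes"`
	CapacityBytes  uint64 `json:"capacityBytes"`
	UsedBytes      uint64 `json:"usedBytes"`
	Name           string `json:"name"`
}

// parseVolume decodes one volume entry from the summary output.
func parseVolume(data []byte) (VolumeStats, error) {
	var v VolumeStats
	err := json.Unmarshal(data, &v)
	return v, err
}

func main() {
	// Sample taken from the buggy output above.
	sample := []byte(`{
		"availableBytes": 91617021952,
		"capacityBytes": 136912564224,
		"usedBytes": 45295542272,
		"name": "volume"
	}`)
	v, err := parseVolume(sample)
	if err != nil {
		log.Fatal(err)
	}
	// For a freshly created 2Gi PVC, capacityBytes should be ~2147483648
	// and usedBytes 0; the values printed here demonstrate the bug.
	fmt.Printf("capacity=%d used=%d\n", v.CapacityBytes, v.UsedBytes)
}
```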

Anything else we need to know?

No response

Kubernetes version

1.21.7

Cloud provider

Azure

OS version

Caption: Microsoft Windows Server 2019 Datacenter
Version: 10.0.17763
BuildNumber: 17763
OSArchitecture: 64-bit

Install tools

No response

Container runtime (CRI) and version (if applicable)

Docker

Related plugins (CNI, CSI, ...) and versions (if applicable)

AzureFile CSI Driver version: 1.17.0

@gracewehner gracewehner added the kind/bug Categorizes issue or PR as related to a bug. label May 27, 2022
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label May 27, 2022
@k8s-ci-robot
Contributor

@gracewehner: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label May 27, 2022
@gracewehner
Author

/sig node storage windows
/area cadvisor

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/windows Categorizes an issue or PR as relevant to SIG Windows. area/cadvisor and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 27, 2022
@andyzhangx
Member

I have checked this issue; it is actually related to kubernetes-csi/csi-proxy#208.
To my knowledge, there is no good way to get the usage of an SMB mount on a Windows node.
/sig windows

@SergeyKanzhelev
Member

/remove-sig node

since it looks to be a known Windows issue

@k8s-ci-robot k8s-ci-robot removed the sig/node Categorizes an issue or PR as relevant to SIG Node. label Jun 1, 2022
@marosset
Contributor

marosset commented Jun 2, 2022

@dcantah @msscotb - do either of you know if there is a way to get usage stats for SMB mounts on Windows?

@dcantah
Member

dcantah commented Jun 3, 2022

Asked some SMB folks. In the meantime I've been looking around, and https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getdiskfreespaceexa may be an avenue. I believe it works on SMB shares; I can report back tomorrow and push a change if it does actually do the job.

@dcantah
Member

dcantah commented Jun 3, 2022

GetDiskFreeSpaceEx seems to work, from testing on an Azure Files share.

package main

import (
	"fmt"
	"log"
	"os"

	"golang.org/x/sys/windows"
)

func main() {
	var (
		total     uint64 // total size of the volume, in bytes
		totalFree uint64 // total free bytes on the volume
	)
	// The path may be a mounted directory, a symlink, or a UNC path.
	dirName := windows.StringToUTF16Ptr(os.Args[1])
	// freeBytesAvailableToCaller is passed as nil; per-user quotas are
	// not relevant here.
	if err := windows.GetDiskFreeSpaceEx(dirName, nil, &total, &totalFree); err != nil {
		log.Fatal(err)
	}
	toGib := func(bytes uint64) int {
		return int(bytes >> 30) // bytes -> GiB
	}
	fmt.Printf("Total: %d\nTotal Free: %d\n", toGib(total), toGib(totalFree))
}

The above reports:
Total: 5120
Total Free: 5100

with the share I was testing having a 5 TiB quota and a single 20 GiB file inside. I'll wait for the SMB folks to tell me if there are any gotchas, but if not I can push a change to add this to csi-proxy.
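GetDiskFreeSpaceEx only returns totals and free space, so a usedBytes value for /stats/summary has to be derived. A minimal, platform-neutral sketch of that arithmetic (the usedBytes helper is illustrative; the numbers are the ones from the share above):

```go
package main

import "fmt"

// usedBytes derives usage from the two values GetDiskFreeSpaceEx
// returns: total size of the volume and total free bytes.
func usedBytes(total, totalFree uint64) uint64 {
	if totalFree > total {
		return 0 // guard against an inconsistent reading
	}
	return total - totalFree
}

func main() {
	// Numbers from the share above: 5 TiB quota, one 20 GiB file.
	const gib = uint64(1) << 30
	total, free := 5120*gib, 5100*gib
	fmt.Printf("usedBytes=%d (%d GiB)\n", usedBytes(total, free), usedBytes(total, free)/gib)
}
```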

@andyzhangx
Member

@dcantah what is os.Args[1]? Is it \\MyServer\MyShare\ or the mounted directory name?

@dcantah
Member

dcantah commented Jun 5, 2022

@andyzhangx \\myserver\myshare. Are these mounts actually mapped to drive letters, or just accessed via the UNC path?

@andyzhangx
Member

@dcantah The k8s azurefile driver creates a symlink to the Azure file share, like the following:

C:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-f9b134da-b69d-46bf-9aaf-3787476ed2b1>dir
 Volume in drive C has no label.
 Volume Serial Number is AAFE-3BA6

 Directory of C:\var\lib\kubelet\plugins\kubernetes.io\csi\pv\pvc-f9b134da-b69d-46bf-9aaf-3787476ed2b1

06/05/2022  12:37 PM    <DIR>          .
06/05/2022  12:37 PM    <DIR>          ..
06/05/2022  12:37 PM    <SYMLINKD>     globalmount [\\xxx.file.core.windows.net\pvc-f9b134da-b69d-46bf-9aaf-3787476ed2b1\]
06/05/2022  12:37 PM               149 vol_data.json
               1 File(s)            149 bytes
               3 Dir(s)  108,333,318,144 bytes free

Would GetDiskFreeSpaceEx work for both a symlink and a UNC path?

@dcantah
Member

dcantah commented Jun 5, 2022

@andyzhangx Looks like it:

PS C:\Users\danny\go\src\github.com\dcantah\freespace> .\freespace.exe C:\smbtest\                 
Total: 5120
Total Free: 5100

C:\smbtest is just a symlink to my Azure Files share.

@andyzhangx
Member

@dcantah great. The current (VolumeAPI) GetVolumeStats(volumeID string) accepts a volumeID, which only works for disks. I think you could add a new API, (VolumeAPI) GetVolumeStatsFromPath(path string), that accepts a mounted directory; that should work for both disk and SMB mounts in csi-proxy. WDYT?

https://github.com/kubernetes-csi/csi-proxy/blob/48ba409159027564c50926138dd951d106b53077/pkg/os/volume/api.go#L201-L225
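A hypothetical shape for such a path-based API, sketched with the Windows call behind a seam so the logic runs anywhere (GetVolumeStatsFromPath, VolumeStats, and the diskFreeSpace seam are all assumptions for illustration, not the actual csi-proxy surface; on Windows the seam would call windows.GetDiskFreeSpaceEx, and the stub here just returns the numbers from the share tested above):

```go
package main

import (
	"errors"
	"fmt"
)

// VolumeStats mirrors the fields the kubelet needs for /stats/summary.
type VolumeStats struct {
	CapacityBytes  uint64
	AvailableBytes uint64
	UsedBytes      uint64
}

// diskFreeSpace is a seam for windows.GetDiskFreeSpaceEx; stubbed here
// with the numbers from the 5 TiB test share so the sketch is portable.
var diskFreeSpace = func(path string) (total, totalFree uint64, err error) {
	if path == "" {
		return 0, 0, errors.New("empty path")
	}
	const gib = uint64(1) << 30
	return 5120 * gib, 5100 * gib, nil
}

// GetVolumeStatsFromPath takes any mounted directory (disk mount, SMB
// symlink, or UNC path), unlike the volumeID-based GetVolumeStats.
func GetVolumeStatsFromPath(path string) (VolumeStats, error) {
	total, free, err := diskFreeSpace(path)
	if err != nil {
		return VolumeStats{}, err
	}
	return VolumeStats{
		CapacityBytes:  total,
		AvailableBytes: free,
		UsedBytes:      total - free,
	}, nil
}

func main() {
	stats, err := GetVolumeStatsFromPath(`C:\smbtest\`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("capacity=%d available=%d used=%d\n",
		stats.CapacityBytes, stats.AvailableBytes, stats.UsedBytes)
}
```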

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 4, 2022
@marosset
Contributor

marosset commented Sep 6, 2022

/remove-lifecycle stale
@dcantah friendly ping on the above question :)

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 6, 2022
@lualvare

lualvare commented Nov 2, 2022

Hello @dcantah, hope you are doing well.

I am following up on this issue to see whether you were able to get this working. If so, could you please share the details here?

Thank you so much.

@andyzhangx
Member

andyzhangx commented Nov 9, 2022

Hello @dcantah, hope you are doing well.

I am following up on this issue to see whether you were able to get this working. If so, could you please share the details here?

Thank you so much.

@lualvare this requires a csi-proxy change and a new csi-proxy release. We are also in the process of removing csi-proxy in favor of native calls starting from k8s 1.23; once that removal is done, it will be easier to fix this issue in the CSI driver.

@marosset
Contributor

marosset commented Nov 9, 2022

@kiashok FYI

@lualvare

@andyzhangx thank you very much for the information. Is this already on the roadmap, and is there any ETA for this fix?

Thank you so much.

@andyzhangx
Member

@andyzhangx thank you very much for the information. Is this already on the roadmap, and is there any ETA for this fix?

Thank you so much.

@lualvare it mainly depends on when the csi-proxy removal is completed: kubernetes-csi/csi-proxy#217. Since we no longer publish new versions of csi-proxy, this kind of change will be easier after that work is done. There is no clear ETA yet.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 8, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 10, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) Apr 9, 2023
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@andyzhangx
Member

/reopen

@k8s-ci-robot
Contributor

@andyzhangx: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Apr 9, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) May 9, 2023
@andyzhangx
Member

andyzhangx commented Aug 16, 2023

FYI: this issue will be fixed on AKS 1.27 by using a host-process deployment directly. Here is an example PR (using GetDiskFreeSpaceEx) showing how to fix it: kubernetes-sigs/azurefile-csi-driver#1337
