Implement NodeGetVolumeStats #238
Conversation
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Welcome @kbasv!
Hi @kbasv. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I signed it
/ok-to-test
approach is sound, some minor comments and questions we can discuss
cmd/main.go
Outdated
@@ -32,6 +32,8 @@ func main() {
 	version = flag.Bool("version", false, "Print the version and exit")
 	efsUtilsCfgDirPath = flag.String("efs-utils-config-dir-path", "/etc/amazon/efs/", "The path to efs-utils config directory")
 	efsUtilsStaticFilesPath = flag.String("efs-utils-static-files-path", "/etc/amazon/efs-static-files/", "The path to efs-utils static files directory")
+	volMetricsOptIn = flag.Bool("volMetricsOptIn", false, "Opt in to emit volume metrics")
this should be dash separated instead of camel case to be consistent w/ other args
Sure, I'll update this to reflect other args
cmd/main.go
Outdated
@@ -45,7 +47,8 @@ func main() {
 		os.Exit(0)
 	}

-	drv := driver.NewDriver(*endpoint, *efsUtilsCfgDirPath, *efsUtilsStaticFilesPath)
+	drv := driver.NewDriver(*endpoint, *efsUtilsCfgDirPath, *efsUtilsStaticFilesPath, *volMetricsOptIn, *volMetricsRefreshRate)
+	drv.SetNodeCapOptInFeatures(*volMetricsOptIn)
Why not hide this in NewDriver, since NewDriver takes in volMetricsOptIn anyway? Not sure of the benefit of a separate function.
pkg/driver/node.go
Outdated
@@ -187,6 +186,11 @@ func (d *Driver) NodeUnpublishVolume(ctx context.Context, req *csi.NodeUnpublish
 		return nil, status.Error(codes.InvalidArgument, "Target path not provided")
 	}

+	if d.volMetricsOptIn {
+		klog.V(4).Infof("Evicting vol ID: %v, vol path : %v from cache", req.VolumeId, target)
it's possible to publish the same VolumeId to different targets.
I don't want to complicate this too much, but do we need some kind of ref counter to account for that so we only evict from cache if counter == 0?
I think counter will be a bit complex here. Do you know if a volume Id and volume path combination is unique? If yes, I can use it as the cache key
It will be unique but if we do that then we will be recalculating the same volume multiple times. Like if I have a volume fs-abcd:/root mounted to both /pod1/target and /pod2/target, the routine will calculate the disk usage of /root in fs-abcd twice, once for /pod1/target and again for /pod2/target which is unnecessary.
I now see what you mean with the counter, sure we can add a counter to evict only when all other volumeIds are unpublished.
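The ref-counting idea agreed on above could be sketched roughly like this. This is a minimal illustration, not the driver's actual code; the type and method names are hypothetical, and the cached metrics value is stubbed as a string:

```go
package main

import (
	"fmt"
	"sync"
)

// refCountedCache tracks how many target paths a volume ID is currently
// published to, and only evicts the cached metrics entry once the last
// target for that volume ID is unpublished.
type refCountedCache struct {
	mu      sync.Mutex
	refs    map[string]int
	metrics map[string]string // volume ID -> cached metrics (stubbed)
}

func newRefCountedCache() *refCountedCache {
	return &refCountedCache{refs: map[string]int{}, metrics: map[string]string{}}
}

// publish increments the ref count for a volume ID.
func (c *refCountedCache) publish(volID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.refs[volID]++
}

// unpublish decrements the ref count and evicts the cached metrics
// only when no targets remain for this volume ID.
func (c *refCountedCache) unpublish(volID string) (evicted bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.refs[volID]--
	if c.refs[volID] <= 0 {
		delete(c.refs, volID)
		delete(c.metrics, volID)
		return true
	}
	return false
}

func main() {
	c := newRefCountedCache()
	c.metrics["fs-abcd:/root"] = "used=42"
	c.publish("fs-abcd:/root")                // mounted at /pod1/target
	c.publish("fs-abcd:/root")                // mounted at /pod2/target
	fmt.Println(c.unpublish("fs-abcd:/root")) // false: /pod2/target still mounted
	fmt.Println(c.unpublish("fs-abcd:/root")) // true: last target gone, entry evicted
}
```

With this shape, the `fs-abcd:/root` example from the comment above is computed once and evicted once, regardless of how many pods mount it.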
pkg/driver/vol_statter.go
Outdated
}

volUsed, ok := used.AsInt64()
nitpick, would like to see whitespace/newlines tightened up like here. Doesn't need to be a newline between the call and error checking. gofmt may help: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_aws-efs-csi-driver/238/pull-aws-efs-csi-driver-verify/1293265088089690114#1:build-log.txt%3A278
Sure, will update this
func (v VolStatterImpl) computeDiskUsage(volId, volPath string) {
	klog.V(5).Infof("Compute Volume Metrics invoked for Vol ID: %v", volId)

	used, err := fs.DiskUsage(volPath)
what happens if Unpublish is called on this volume in the middle of this routine? Would DiskUsage block the Unpublish unmount for the duration it's reading the volume?
The logs from my testing indicate UnPublish will not succeed when DiskUsage is running. I see "device is busy" associated with Unpublish command in the logs. Do you think it's okay to wait it out or find a way to force unpublish in the above scenario?
Darn, well I think it's okay to wait it out. We can leave it as a future improvement because we're relying on a library (fs.DiskUsage) and there's no way to cancel it. In the worst case, Unpublish could be delayed indefinitely if the volume metrics period < computeDiskUsage execution time and computeDiskUsage ends up running 100% of the time. In a less-bad case, Unpublish could be delayed by the time it takes for DiskUsage to complete + 2 minutes. The 2 minutes comes from the exponential back-off for Unpublish retries: https://github.com/kubernetes/kubernetes/blob/323f34858de18b862d43c40b2cced65ad8e24052/pkg/util/goroutinemap/exponentialbackoff/exponential_backoff.go#L33
I'll add a TODO to figure out a way to cancel running DU process in the future
cmd/main.go
Outdated
@@ -32,6 +32,8 @@ func main() {
 	version = flag.Bool("version", false, "Print the version and exit")
 	efsUtilsCfgDirPath = flag.String("efs-utils-config-dir-path", "/etc/amazon/efs/", "The path to efs-utils config directory")
 	efsUtilsStaticFilesPath = flag.String("efs-utils-static-files-path", "/etc/amazon/efs-static-files/", "The path to efs-utils static files directory")
+	volMetricsOptIn = flag.Bool("volMetricsOptIn", false, "Opt in to emit volume metrics")
+	volMetricsRefreshRate = flag.Float64("volMetricsRefreshRate", 5, "Refresh rate for volume metrics in minutes")
I think this should be renamed to period instead of rate
Force-pushed from 7fba2b8 to 0df1a23
/test pull-aws-efs-csi-driver-e2e
/test pull-aws-efs-csi-driver-e2e
Generally, the code here makes sense to me, however the implementation seems a bit problematic. If I understand correctly, the underlying tool (`du`) has to walk the entire directory tree, which has the potential to consume all the available IOPS in a file system.
You're right on all counts. This PR has an optimization to avoid reading the same volumehandle (fs-abcd:/a/b) concurrently on the same node but there's certainly room for more. If we think that the performance penalty of enabling this feature will outweigh the benefit more often than not, we can do more optimizations first. Personally I'm inclined to merge and collect feedback, but I don't want people to enable the feature and then be surprised if their credits get exhausted, so I guess we ought to test & evaluate the impact more first. Some brainstorming:
I agree with concerns on the underlying implementation's potential to consume all the available IOPS in a file system.
The above two options will address concerns for mounts on a per-node basis. However, they will not address concerns for a file system mounted on multiple nodes. For a multi-node mount, given how Daemonsets work, it is not easy to share state between nodes, and any solution for multi-node mounts seems out of scope for the current implementation.
A start jitter and rate limiter could be helpful. The test uncovered roughly what I would expect - that …
volUsageCache[volId] = volMetrics
delete(volStatterJobTracker, volId)
The map updates from this function are data races since the function is always called in a goroutine. The check in `launchVolStatsRoutine` to see if the `volId` is currently being processed is not sufficient - the check could be called at the same time as the delete here. I would suggest wrapping them in a sync mechanism. Removing the `time.Sleep(waitTime)` and stubbing out the external calls to the `fs` library will expose the data races in the test cases.
I think that should be okay since `launchVolStatsRoutine` will be invoked by kubelet approximately every minute.
I'd say if it's invoked infrequently, then the overhead of a lock would be negligible. What's the recovery path if this panics?
I see what you mean. Didn't realize Go panics on concurrent reads and writes. The alternative would be to either use locks or `sync.Map`. I guess locks are better given how kubelet works?
Correct, they are not safe for concurrent use and will panic under certain conditions. I'm not overly familiar with `sync.Map` but I believe it was created for optimization under certain conditions when regular locks are not meeting your performance needs.
@kbasv I ran your updated code and the changes fixed the data races in volStatter. The race detector still shows a race where the test code itself does not wrap the map updates in a lock, but the rest looks good, thanks.
@kerbyhughes Thanks for checking, fixed the test code with locks.
/retest
/retest |
/retest |
/retest |
}

func (v VolStatterImpl) computeVolumeMetrics(volId, volPath string, refreshRate float64, fsRateLimit int) (*volMetrics, error) {
	if value, ok := v.retrieveFromCache(volId); ok {
What if I have two PVs mounted on this node with the same volume ID but different target path. Say volume A fs-123:/a and volume B fs-123:/b. It will be random which metric I get back.
Maybe not random, but one of A or B will be reported for both A and B.
However if you index the results cache by volume ID + path but the tracker just by volume ID, there might be situations where the metric for B is never computed. Let's say kubelet always calls get stats on A before B, then B will never get computed since the computation for A was just started and is in progress...
IIRC, volumeId from above consists of file system ID + subpath. So `fs-123:/a` (A) and `fs-123:/b` (B) will be two different volumeIDs and the metric will be computed for both A and B. The cache will ensure that a metric for volumeId `fs-123:/a` mounted separately as two volumes A & B will not be computed twice.
Ok, makes sense, I see now you distinguish between `fsid` for the fs rate limit and `volid` for the metrics cache.
@wongma7 Let me know if you have more concerns on this PR. Otherwise can you please merge this? Thanks!
Please squash the commits and check coveralls. The bot will not merge if coveralls says coverage decreased.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: kbasv, wongma7. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
* Enable GET_VOLUME_STATS node capability
/retest
	},
}
)
makeDir(validPath)
test environment doesn't like this, not sure if https://golang.org/pkg/io/ioutil/#TempDir will work either
@wongma7 Looks like the coveralls test environment's temp directory creation is being flaky and that is pulling down the coverage. When running the test locally, temp directory creation works and the overall coverage is greater than 82%. The test environment succeeded in creating the temp directory once here - https://coveralls.io/builds/32722046
@kbasv OK thanks for looking into it. Since the test works locally and in pull-aws-efs-csi-driver-unit (actually the output looks crazy because I see efs_watch_dog.go:236] stopping... being spammed infinitely but that's unrelated...), and technically coveralls is not supposed to block merge, I am inclined to merge this after the rest of the tests pass and leave the fix for coveralls as a followup.
/lgtm
/retest
Manually merging due to coveralls being flaky. I'll open an issue to track it |
Is this a bug fix or adding new feature?
New Feature
What is this PR about? / Why do we need it?
This PR implements NodeGetVolumeStats, which uses `du` under the hood, and enables the GET_VOLUME_STATS node capability. Volume metrics are computed asynchronously and cached, since `du` can take longer than the default kubelet timeout for the NodeGetVolumeStats rpc; cached metrics are returned while `du` is invoked.
What testing is done?
Added Unit tests to confirm the behavior.
Tested the implementation E2E with EKS on EC2.