Sporadic I/O Errors Lead to Files Getting Sporadically Locked on azureFile Volumes #1593
Today is pretty bad... just got this at the end of creating a small tgz file:
This is inside one of the Alpine containers. This is what followed:
Sure enough:
@GuyPaddock is it possible to try
@andyzhangx At the moment, the workloads are provided by a vendor rather than ones we built in-house. Is there a particular reason why Ubuntu 16.04 would handle this better? My understanding is that the SMB volumes are mounted on the host node rather than by the container itself.
@GuyPaddock the errors you pasted look like the result of IOPS starvation/saturation (see issue #1373). "No such file or directory" occurs from within the container when the node VM itself falls victim to the OS disk or VM-level IOPS throttle.
@jnoller What is our next step if Azure Support is merely focused on "fixing" CoreDNS and is not working with us on the IOPS issue? I'm waiting to hear back from Azure Support after sending them my latest info, but so far they've only focused on adjusting CoreDNS to auto-scale for us.
@GuyPaddock I would send them this issue/thread as well as issue #1373 via the ticket. You should also ask them to check for IOPS throttling on the VMs; this can be triggered by the OS disk or the VM SKU limit.
@jnoller Both were included in the original issue summary...
I don't think it's related to IOPS, since using
cc @smfrench
This is not tied to the DNS issue you're citing or to the kernel revision. While you can recreate similar failures, you can clearly see the container's inability to read its own filesystem and socket. I can recreate this using any Docker image or application, regardless of the contents of the image OS. If the system's IOPS issues are unresolved, even after applying the DNS fixes or other options in this thread, the system will still fail under load.
@AndyZhang I believe you are confusing this issue with my other issue, #1325. Issue 1325 is about the fact that, under Alpine containers, folders mounted over SMB that have more than 64 files have issues. This issue (1593) is about the fact that a high volume of reads or writes to an SMB volume in AKS causes the volume to fail sporadically with I/O errors that leave files locked, regardless of the type of container (Alpine, CentOS, etc.).
Action required from @Azure/aks-pm
This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.
This issue will now be closed because it hasn't had any activity for 15 days after being marked stale. GuyPaddock, feel free to comment again in the next 7 days to reopen, or open a new issue after that time if you still have a question, issue, or suggestion.
What happened:
azureFile volumes in AKS, mounted through manually-managed PV claims.
Get-AzStorageFileHandle and Close-AzStorageFileHandle from a PowerShell terminal to locate and remove the file lock that has been created.
cp, unzip, mv, and 7z -- as far as I know, these tools do not normally create file locks.
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Not sure if all of the following steps are required, but here's what we are doing that eventually leads to this issue:
7z) inside. For these repro steps, SFTP should ensure that there is a user account inside with a uid of 33 and a gid of 33; in our infrastructure, this is needed for compatibility with the Nextcloud file application.
azureFile volume mounted via a Persistent Volume, with the specified mountOptions:
kubectl exec to shell-in to the pod.
7z x.
mkdir test) and then use cp -Rv <extracted folder> test (where <extracted folder> represents the name of the folder created by unzipping in step 6) to copy the files recursively to the new folder on the file share.
rm -rf on both the folder that was originally extracted in step 6 and the new folder created in step 7.
Anything else we need to know?:
kubectl version):
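For anyone trying to script the copy-and-delete part of the reproduction steps above (the mkdir test, cp -Rv, and rm -rf steps), a minimal sketch might look like the following. It is not taken from the original report: the helper name repro_copy_delete is hypothetical, and it assumes the extracted folder already sits inside the mounted share directory passed as the first argument. It only reports whether each step failed or left entries behind; it does not diagnose the cause.

```shell
#!/bin/sh
# Sketch of the copy-then-delete steps from the reproduction list.
# repro_copy_delete is a hypothetical helper name, not part of the original report.
#   $1: directory on the mounted share (assumed path)
#   $2: name of the already-extracted folder inside that directory (assumed layout)
repro_copy_delete() (
    share_dir="$1"
    extracted="$2"

    # The function body is a subshell, so the caller's working directory is unchanged.
    cd "$share_dir" || exit 1

    # Create the target folder and copy the extracted tree into it.
    mkdir test || exit 2
    cp -Rv "$extracted" test || exit 3

    # Delete both trees; rm exits non-zero if any entry could not be removed.
    rm -rf "$extracted" test || exit 4

    # Double-check that nothing was left behind.
    if [ -e "$extracted" ] || [ -e test ]; then
        echo "leftover entries remain after rm -rf" >&2
        exit 5
    fi
)
```

Inside the pod this could be called as repro_copy_delete <share mount path> <extracted folder>, with both arguments replaced by the real values. A non-zero exit status shows which step failed (3 for the copy, 4 or 5 for the delete).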