FileSystemWatcher may cause problems in containers - inotify limits and incorrect error message #27272
Just an FYI, our friends at Zeit figured out their server-side misconfiguration here https://github.com/zeit/now-examples/pull/61 but it would have been easier with a better error message.
It looks like this issue as it applies to corefx is just to improve the error message? That seems quite reasonable, and easy to fix.
Hello, I've just opened a code branch to upgrade to EF Core 2.2. My new code branch is failing all CI builds with the infamous "inotify" error:
We have about 3,500 unit tests, and it only fails at some point really far along in the tests. Build bots are on Ubuntu 18.04. I've read the discussion above, and all I can say is that it sounds relevant to my problem, but I have no idea how to fix it. Help, please?
It sounds like either you're not disposing of all of the FileSystemWatchers you're creating, or something is causing tons of tests that create FileSystemWatchers to run concurrently. If your tests aren't themselves creating FileSystemWatchers, then it sounds like something in the environment is creating them, maybe something in EF Core 2.2, and it'd likely be worth an issue in the EF Core repo, assuming that's where they're coming from.
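For readers unfamiliar with the API, here is a minimal sketch of creating and deterministically disposing a FileSystemWatcher; on Linux, each started watcher holds inotify resources until it is disposed. The path and filter below are illustrative, not taken from this thread.

```csharp
using System;
using System.IO;

class WatcherDisposalSketch
{
    static void Main()
    {
        // Disposing the watcher releases its underlying inotify resources on Linux.
        using (var watcher = new FileSystemWatcher("/tmp", "*.json"))
        {
            watcher.Changed += (sender, e) => Console.WriteLine($"Changed: {e.FullPath}");
            watcher.EnableRaisingEvents = true;

            Console.WriteLine("Watching /tmp for *.json changes; press Enter to stop.");
            Console.ReadLine();
        } // Dispose runs here, freeing the watcher's inotify handle.
    }
}
```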
@stephentoub I'm not explicitly creating any FileSystemWatchers. And my tests are running sequentially, not concurrently.
Line 48 of my
This, in turn, is being called by my integration test's base class, which creates a new
Also, to clarify, it wasn't just EF Core I upgraded; I meant to say that I upgraded to .NET Core 2.2.
@shaulbehr, are you able to attach a debugger to the process when it's in one of these states? e.g. if you could attach lldb and use sos, you could use dumpheap to see which FileSystemWatcher instances are still alive.

@natemcmaster, @rynowak, I'm not sure who's responsible for this support as used from MVC, but have you seen any issues related to PhysicalFilesWatcher instances not being disposed of in a timely manner?
I don't think we've seen issues with disposal, but our file watchers tend to live for the lifetime of the app. I think it was the case for a while that we didn't dispose file watchers, so that could cause bugs if the app was stopped and started repeatedly in the same process. Note that as of 2.2 we shouldn't be creating the file watcher from that call stack anymore when the environment is set to

/cc @pranavkm
Looking at that call stack again, this makes a little more sense now if these are integration tests. @shaulbehr, are your tests creating a TestServer for each test?

@rynowak Yes, each test fixture creates a new TestServer.
Oho, here's something I just noticed. I added some code to ensure that my
@rynowak here's your smoking gun pointing at
@stephentoub I'm really a rookie at Linux. If you can give me step-by-step instructions on how to attach lldb and use sos and dumpheap, I'm probably up to that.
@rynowak, based on your question "are your tests creating a TestServer for each test?" and the answer of "Yes", it seems like you may have some insights here?
One option would be to try to limit the number of TestServer instances you create. That might or might not be feasible given your requirements. If it's possible, I would expect creating fewer servers to speed up your test execution as well. Another thing you could try would be to change how configuration is wired up and remove the file watching. The actual problem reported by that call stack is one that we've already fixed in 3.0: dotnet/extensions#928
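To illustrate the first suggestion (limiting the number of TestServer instances), one common approach in xUnit is to share a single server across a test class via a class fixture instead of creating one per test. This is only a sketch under the assumption of an xUnit-based suite; the pipeline, types, and test names are placeholders, not code from this thread.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.TestHost;
using Xunit;

// One TestServer shared by all tests in a class, instead of one per test.
// Fewer live servers means fewer file watchers and fewer inotify instances.
public class SharedServerFixture : IDisposable
{
    public TestServer Server { get; }
    public HttpClient Client { get; }

    public SharedServerFixture()
    {
        // A trivial in-memory pipeline; a real suite would call UseStartup<Startup>().
        Server = new TestServer(new WebHostBuilder()
            .Configure(app => app.Run(ctx => ctx.Response.WriteAsync("ok"))));
        Client = Server.CreateClient();
    }

    public void Dispose()
    {
        Client.Dispose();
        Server.Dispose();
    }
}

public class MyIntegrationTests : IClassFixture<SharedServerFixture>
{
    private readonly SharedServerFixture _fixture;

    public MyIntegrationTests(SharedServerFixture fixture) => _fixture = fixture;

    [Fact]
    public async Task Root_Responds()
    {
        var response = await _fixture.Client.GetAsync("/");
        response.EnsureSuccessStatusCode();
    }
}
```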
@rynowak going through your suggestions:
Sorry for the delay on this, I've been out of the office. You should be able to call Sources.Clear(). If you can show me how you're setting up
I'm running into this with my client's on-prem K8S clusters. I've set reloadOnChange to false for my appsettings and the USE_POLLING flag to true. Even using a vanilla, boilerplate WebAPI project I get this with enough consistency that we now have a script to "refresh" the IOException-throwing pods. Where/when would Sources.Clear() be practical? The IConfigurationBuilder is in CreateWebHostBuilder; I'm going to need those configs later in Startup, so I can't just clear them before I've gotten their data.
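For what it's worth, one hedged reading of the Sources.Clear() suggestion is to clear the default sources inside ConfigureAppConfiguration and immediately re-add them without change tracking, so the values are still available in Startup but no file watchers are registered. A minimal 2.x-style sketch follows; the file names and the Startup class are the usual defaults, assumed here rather than taken from the thread, and anything else added by CreateDefaultBuilder (user secrets, command-line args) would need re-adding too if used.

```csharp
using Microsoft.AspNetCore;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.Configuration;

public class Program
{
    public static void Main(string[] args) =>
        CreateWebHostBuilder(args).Build().Run();

    public static IWebHostBuilder CreateWebHostBuilder(string[] args) =>
        WebHost.CreateDefaultBuilder(args)
            .ConfigureAppConfiguration((context, config) =>
            {
                // Drop the default file-based sources (added with reloadOnChange: true,
                // which registers file watchers) and re-add them without change tracking.
                config.Sources.Clear();
                config
                    .AddJsonFile("appsettings.json", optional: true, reloadOnChange: false)
                    .AddJsonFile($"appsettings.{context.HostingEnvironment.EnvironmentName}.json",
                                 optional: true, reloadOnChange: false)
                    .AddEnvironmentVariables();
            })
            .UseStartup<Startup>(); // Startup is the application's startup class (placeholder here).
}
```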
@rynowak There are a lot of moving parts in this machine.
I could dump a bunch of other accompanying code here, but I don't want to overload you or anyone else with noise; better you should ask me more specific questions about which code snippets you'd like to see. Alternatively, if you'd like, you can message me privately and I can temporarily give you rights to our Git repo so you can see for yourself what the whole setup looks like.
My issue seemed related to using the same user across all pods in the cluster. Pro-tip: running as root isn't a good idea for more than just security concerns. Now that each pod has its own user, my issue appears to be resolved.
I have this issue with the following setup ..
I am using VSCode's Dev-Containers, however the issue occurs if I F5 Debug from VSCode or if I run the app directly from the container terminal. It of course does not always happen, but it happens easily enough to be a right pain in the bum; the only solution I have found so far is to quit VSCode. My microservices are in the early stages of development. I do run MVC, but only with 1 TestController, and I am using the standard method which I believe performs a watch on appsettings.json and appsettings.development.json, so I am stumped as to why I hit this issue every hour or so. I actually get the feeling it's somehow VSCode related (quitting VSCode resolves the issue even with the Dev-Container still running in the background).

My only other suspicion is my SPA microservice, which when running Debug uses Webpack hot module replacement, so it must be watching source files. However, this issue also occurs on my Auth microservice (IdentityServer4), which of course is not Webpacking.

Hah, as it turns out it just happened again; the following is my stack trace:

Any help is much appreciated
There's plenty of discussion here, reopening so it's visible.
Hey all, specific to aspnetcore instances that are experiencing this exception, I have isolated a cause and steps to mitigate it until a better solution comes along. Adding the fixes described below to a brand new project reduced the watchers used from around 93 to around 26. This only pertains to aspnetcore, and I'll not rehash the other tools to mitigate the issue like DOTNET_USE_POLLING_FILE_WATCHER.

tl;dr version: TagHelpers that use asp-append-version account for a large share of the watchers; the Razor options below eliminate most of them:

```csharp
services.AddMvc()
    .SetCompatibilityVersion(CompatibilityVersion.Version_2_2)
    .AddRazorOptions(ro => {
        ro.FileProviders.Clear();
        ro.FileProviders.Add(new CompositeFileProvider(new[] { new NullFileProvider() }));
        ro.AllowRecompilingViewsOnFileChange = false;
    });
```

Without the above snippet, I noticed that there were about 54-58 watchers being used at startup.

To get the number of watchers being used (plus ~4 watchers that are not used by dotnet), you can add lsof to your Dockerfile. Following that, we can either check via the terminal for the docker container by running `lsof | grep inotify | wc -l`, or, going the ShellHelper route, add the following JsonResult to a controller and verify in the browser:

```csharp
[Route("/getfilewatches")]
public JsonResult GetFileWatches()
{
    string maxUserWatches = System.IO.File.ReadAllText("/proc/sys/fs/inotify/max_user_watches").Trim();
    string currentUsedInotifyWatches = "lsof | grep inotify | wc -l".Bash();
    return new JsonResult(new { maxUserWatches, currentUsedInotifyWatches });
}
```

With the asp-append-version properties, you'll see the watches jumping by about 40 on a page request that uses them.

YMMV, but I hope this will help anyone else who has been banging their head on their desk for months trying to solve this issue. I will add a full repro if desired.
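The .Bash() call in the snippet above is an extension method that the comment references but doesn't include (it shells out to run the lsof pipeline). A possible reconstruction, assumed rather than copied from the original, might look like this:

```csharp
using System.Diagnostics;

// Hypothetical ShellHelper: runs a command through /bin/bash and returns its stdout.
// Assumed reconstruction of the ".Bash()" extension used in the controller above.
public static class ShellHelper
{
    public static string Bash(this string cmd)
    {
        var escaped = cmd.Replace("\"", "\\\"");
        var process = new Process
        {
            StartInfo = new ProcessStartInfo
            {
                FileName = "/bin/bash",
                Arguments = $"-c \"{escaped}\"",
                RedirectStandardOutput = true,
                UseShellExecute = false,
                CreateNoWindow = true
            }
        };
        process.Start();
        string result = process.StandardOutput.ReadToEnd();
        process.WaitForExit();
        return result.Trim();
    }
}
```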
@kylef000 thanks for your detailed insight. I have managed a workaround for this which, once set up, works nicely. Note this is not something I came up with but something I found somewhere else (I really should track down where I found it). I put a shell script together with #!/bin/bash and run it after Docker fires up; this alters the Docker host's inotify limit (the host being a Linux VM on a real Mac host in my case). I don't know whether this new limit will realistically be breached at any point, but it has worked for many weeks now, given how easy the initial issue was to reproduce. Cheers
@stevef51 Thanks for the response. I've done the same locally. Unfortunately running the container in privileged mode is not advisable in a live environment (especially a multi-container docker environment), because it allows the container to interact with the host and other devices connected to the host in a way that may be abused. The docker daemon runs as root.
https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities
@kylef000 Thanks again. By running the script, that specific container does have privileged access to the host (in my case a Linux VM), which is needed to be able to alter the host's inotify limits; however, the container immediately stops and is discarded (actually, I just noticed I am missing the --rm flag to make this auto-discard), and the change to the host remains. My service containers run with normal (non-privileged) access, but they inherit the inotify limit of the host, which is now increased; they don't run in privileged mode. Granted, it is definitely a workaround, and I think either the Linux VM that Docker fires up should by default have a higher inotify limit and/or the ASP.NET Core FileWatchers need some attention to see why they are using so many of them. Cheers
Triage: As this was fixed in dotnet/corefx#32462 (in 3.0), and the discussion is dying down, we are re-closing. If more discussion is needed, please open a new issue.
The fix for the other part of this is dotnet/extensions#928.
I'm not convinced this is fixed. Isn't the issue actually here in PhysicalFilesWatcher? This class attempts to respect DOTNET_USE_POLLING_FILE_WATCHER. For context, I'm still getting the error (albeit due to a mistake in my test parallelisation), but it still shouldn't be creating any file system watchers when running on the SDK Linux docker image.
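Related to that, if polling is the desired behaviour, it can also be forced at the provider level rather than only via the environment variable; in Microsoft.Extensions.FileProviders.Physical 3.0+ the PhysicalFileProvider exposes polling switches directly. This is only a sketch; the path and file name below are illustrative, and the properties depend on the package version.

```csharp
using System;
using Microsoft.Extensions.FileProviders;

class PollingProviderSketch
{
    static void Main()
    {
        // Force polling instead of inotify-based watching for this provider.
        using var provider = new PhysicalFileProvider("/app")
        {
            UsePollingFileWatcher = true, // poll the file system instead of using inotify
            UseActivePolling = true       // actively poll so change tokens still fire callbacks
        };

        var token = provider.Watch("appsettings.json");
        token.RegisterChangeCallback(_ => Console.WriteLine("appsettings.json changed"), null);

        Console.WriteLine("Polling /app/appsettings.json for changes; press Enter to exit.");
        Console.ReadLine();
    }
}
```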
Re-opening for the Microsoft.Extensions issue... |
Will close again. #37664 can be used to track the Microsoft.Extensions issue.
I'm having a very similar issue to this while debugging a docker compose project with VS 2019.
MOVED FROM dotnet/aspnetcore#3475
Looking around the web I'm seeing years of issues with FileSystemWatcher saying "The configured user limit (n) on the number of inotify instances has been reached."
UPDATE: Looks like https://github.com/dotnet/corefx/blob/a10890f4ffe0fadf090c922578ba0e606ebdd16c/src/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.Linux.cs#L371 assumes that when inotify_add_watch fails with ENOSPC it must be an issue with inotify instances being out of range. In fact, ENOSPC can also mean "the kernel failed to allocate a needed resource." We had no way to know it was anything other than "too many files open." The error message is misleading.
Phrased differently: there are two error cases, and we throw a message that implies there's just one.
This is becoming more prevalent in container situations in constrained sandboxes. I'm trying to deploy https://github.com/shanselman/superzeit (just clone and "now --public" or run locally with docker) to Zeit.co and I'm hitting this regularly. I don't think I'm hitting a limit. I think Zeit (and others) are blocking the syscall.
I think there are two issues here:
1. We should return a different error message if inotify_add_watch fails, and then circuit-break so that FileSystemWatcher doesn't prevent the app from starting. If we CAN start up successfully without a watch, we should.
2. It seems DOTNET_USE_POLLING_FILE_WATCHER=1 is used in dotnet-watch and the aspnet file providers, but the base System.IO FileSystemWatcher class doesn't support DOTNET_USE_POLLING_FILE_WATCHER? We should probably be consistent.
If I set reloadOnChange: false in Program.cs to bypass the first watch that is set on appsettings.json, I end up hitting it later when Razor/MVC sets up its FileWatchers.
We need, at a minimum, to have DOTNET_USE_POLLING_FILE_WATCHER respected everywhere. Another idea would be a way to have FileSystemWatcher "fail gracefully." We need to test on systems with
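For anyone who wants to reproduce the failure mode on a system with deliberately low limits (for example after lowering fs.inotify.max_user_instances), here is a rough sketch that simply keeps starting watchers until the IOException surfaces; the loop bound and watched path are arbitrary, not taken from the issue.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class InotifyExhaustionRepro
{
    static void Main()
    {
        var watchers = new List<FileSystemWatcher>();
        try
        {
            // Each started watcher consumes an inotify instance on Linux; once the
            // per-user limit is hit, starting another one throws IOException.
            for (int i = 0; i < 10_000; i++)
            {
                var w = new FileSystemWatcher("/tmp") { EnableRaisingEvents = true };
                watchers.Add(w);
            }
            Console.WriteLine("Limit not reached with 10,000 watchers.");
        }
        catch (IOException ex)
        {
            Console.WriteLine($"Hit the limit after {watchers.Count} watchers: {ex.Message}");
        }
        finally
        {
            foreach (var w in watchers) w.Dispose();
        }
    }
}
```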
Related issues?
@stephentoub @natemcmaster @muratg @pranavkm