Error when retrieving net connections #1294

Closed
sylvainduchesne opened this issue Jun 12, 2018 · 14 comments

Comments

@sylvainduchesne
Contributor

PR #880 causes an issue when the number of connections varies during the probe.
I have tracked the problem down to the fact that the GIL is now released while calling the Windows API. For some reason, when the GIL is released, the Windows API sometimes returns ERROR_NOT_ENOUGH_MEMORY. My only guess is that it doesn't like sockets changing during the call. It is hard to say for sure, since the error is undocumented in the Windows API.
In any case, calling
psutil.Process().connections(kind="tcp4")

can sometimes result in

  File "d:\open_source\psutil\psutil\__init__.py", line 1153, in connections
    return self._proc.connections(kind)
  File "d:\open_source\psutil\psutil\_pswindows.py", line 635, in wrapper
    return fun(self, *args, **kwargs)
  File "d:\open_source\psutil\psutil\_pswindows.py", line 917, in connections
    return net_connections(kind, _pid=self.pid)
  File "d:\open_source\psutil\psutil\_pswindows.py", line 325, in net_connections
    rawlist = cext.net_connections(_pid, families, types)
MemoryError

or

  File "d:\open_source\psutil\psutil\__init__.py", line 1153, in connections
    return self._proc.connections(kind)
  File "d:\open_source\psutil\psutil\_pswindows.py", line 635, in wrapper
    return fun(self, *args, **kwargs)
  File "d:\open_source\psutil\psutil\_pswindows.py", line 917, in connections
    return net_connections(kind, _pid=self.pid)
  File "d:\open_source\psutil\psutil\_pswindows.py", line 325, in net_connections
    rawlist = cext.net_connections(_pid, families, types)
WindowsError: [Error -1073741823] Windows Error 0xC0000001

I've attached the (somewhat messy) test to reproduce the issue.
test_psutil.py.txt

I understand the reasoning behind releasing the GIL in the loop (you had mentioned it in the PR), so I'm not sure how you would like to handle this.

Thanks
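
For readers without the attachment, here is a minimal sketch of the kind of setup described above: one thread keeps opening and closing loopback TCP connections while the main thread calls a probe repeatedly. This is not the attached test script. The names churn_connections, probe_under_churn and probe are hypothetical, and the probe is passed in as a callable so the sketch has no hard dependency on psutil.

```python
import socket
import threading


def churn_connections(stop_event, host="127.0.0.1"):
    """Open and close loopback TCP connections until stop_event is set."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0: let the OS pick a free port
    server.listen(16)
    port = server.getsockname()[1]
    try:
        while not stop_event.is_set():
            client = socket.create_connection((host, port))
            conn, _ = server.accept()
            conn.close()
            client.close()
    finally:
        server.close()


def probe_under_churn(probe, iterations=50):
    """Call probe() `iterations` times while connections churn.

    Returns a list with one entry per call: the number of items the
    probe returned, or the exception it raised.
    """
    stop = threading.Event()
    worker = threading.Thread(
        target=churn_connections, args=(stop,), daemon=True
    )
    worker.start()
    results = []
    try:
        for _ in range(iterations):
            try:
                results.append(len(probe()))
            except (MemoryError, OSError) as exc:
                results.append(exc)
    finally:
        stop.set()
        worker.join(timeout=5)
    return results
```

Assuming psutil is installed, the probe could be `lambda: psutil.net_connections(kind="all")`. Note that a Python churn thread can only start new socket operations while the probing call has released the GIL, which would be consistent with the errors disappearing when the GIL is retained.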

@giampaolo
Owner

Uhm... I cannot reproduce the issue by using your script.

@sylvainduchesne
Contributor Author

Hrm, that's a shame.
It took me a while to determine that the GIL (or rather its absence) was at fault.
If this helps:
OS Name: Microsoft Windows 10 Entreprise
OS Version: 10.0.14393 N/A Build 14393
That said, I don't think the issue is related to the Windows version, as various production servers of ours show the same issue.
Without fail, I get one of the two errors within seconds of launching the script. If I remove the GIL release, the issue goes away.
What do you suggest?

@giampaolo
Owner

giampaolo commented Jun 16, 2018

What do you mean that the GIL is at fault?

As for the error you're seeing, it almost certainly originates from here:

*data = malloc(*size);
if (*data == NULL) {
    error = ERROR_NOT_ENOUGH_MEMORY;
    continue;
}

And this is where it's translated into a MemoryError exception:

error = __GetExtendedTcpTable(getExtendedTcpTable,
                              AF_INET, &table, &tableSize);
if (error == ERROR_NOT_ENOUGH_MEMORY) {
    PyErr_NoMemory();
    goto error;
}

Since it is malloc() that fails, I suppose you are running out of physical memory. How much RAM does your system have, and how many connections did psutil.net_connections(kind='all') report before it failed? You can try something like this:

try:
    print(len(psutil.net_connections(kind='all')))
except MemoryError:
    print(psutil.virtual_memory())

For testing, I recommend psutil.net_connections(kind='all') instead of psutil.Process().connections(kind="tcp4") (both use the same C function anyway).

@sylvainduchesne
Contributor Author

Thanks for the reply.
Total Physical Memory: 65,434 MB
I tried your suggestions, but the issue seems to be unrelated to actual memory usage. I had suspected memory before opening the issue, so I monitored it while the test was running, and usage remained quite low.
In any case, no MemoryError exception is raised. I also kept track of the maximum number of connections returned by the API before the error occurs, and it is not very high either: roughly 4000 connections.
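
The bookkeeping described above (record the largest connection count seen before the first failure) can be sketched roughly as follows. The helper name probe_until_failure and its parameters are hypothetical; the probe is any zero-argument callable returning a list, so the sketch itself does not import psutil.

```python
def probe_until_failure(probe, max_iterations=1000):
    """Call probe() until it fails or max_iterations is reached.

    Returns (max_count, error): the largest number of items any
    successful call returned, and the exception that stopped the loop
    (None if every call succeeded).
    """
    max_count = 0
    for _ in range(max_iterations):
        try:
            count = len(probe())
        except (MemoryError, OSError) as exc:
            # WindowsError is an OSError, so both reported failures land here.
            return max_count, exc
        max_count = max(max_count, count)
    return max_count, None
```

Assuming psutil is installed, a call such as `probe_until_failure(lambda: psutil.net_connections(kind="all"))` would report how many connections were visible before the first error, which is the number that turned out to be roughly 4000 here.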

As for the GIL, what I mean is that if I remove
https://github.com/giampaolo/psutil/blob/master/psutil/_psutil_windows.c#L1539
https://github.com/giampaolo/psutil/blob/master/psutil/_psutil_windows.c#L1556
the problem no longer occurs. So, keeping the GIL fixes the problem. I haven't tried, but I assume that even prior to this change, releasing the GIL would give similar results.

@giampaolo
Owner

Are you able to add printf() statements to the C code and recompile with VS, in order to understand exactly where the error originates?
According to what you wrote, we can run into:

WindowsError: [Error -1073741823] Windows Error 0xC0000001
MemoryError

...both originating from here:

  File "d:\open_source\psutil\psutil\_pswindows.py", line 325, in net_connections
    rawlist = cext.net_connections(_pid, families, types)

@sylvainduchesne
Contributor Author

Hi,
Sorry, I didn't get an email for your last comment. Yes, I can definitely add printf() statements, so let me know where.
I did some debugging before opening the issue to make sure it was something we couldn't solve without modifying psutil. Basically, it boils down to this: calling GetExtendedUdpTable (or the TCP equivalent) can sometimes fail with either of these error codes:

WindowsError: [Error -1073741823] Windows Error 0xC0000001
ERROR_NOT_ENOUGH_MEMORY

I could probably reproduce the issue without psutil: simply calling the Windows API from a C extension would most likely yield the same results.
Bottom line: the errors don't occur when the GIL is kept.
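
For callers who cannot rebuild or upgrade psutil, one possible caller-side workaround is to retry the call when it fails with one of the two errors above. This is only a sketch under that assumption, not the fix discussed in this thread (which is to keep the GIL); call_with_retry and its parameters are hypothetical names.

```python
import time


def call_with_retry(func, attempts=3, delay=0.05,
                    transient=(MemoryError, OSError)):
    """Call func(), retrying when it raises one of the `transient` errors.

    Re-raises the last exception if every attempt fails.
    """
    last_exc = None
    for attempt in range(attempts):
        try:
            return func()
        except transient as exc:
            last_exc = exc
            if attempt + 1 < attempts:
                time.sleep(delay)  # brief pause before the next attempt
    raise last_exc
```

Assuming psutil is installed, usage could look like `call_with_retry(lambda: psutil.net_connections(kind="all"))`. Catching OSError is deliberately broad here and may hide unrelated failures, so narrowing it to the specific error code seen in practice is probably wiser.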

@giampaolo
Owner

Thank you.

@sylvainduchesne
Contributor Author

Thanks for merging :)
Any chance we can get a release? We are still on the 4.x codebase.

@giampaolo
Owner

New release is out

@sylvainduchesne
Contributor Author

Thanks a lot!

nlevitt added a commit to nlevitt/psutil that referenced this issue Apr 9, 2019
* origin/master: (182 commits)
  giampaolo#1394 / windows / process exe(): convert errno 0 into ERROR_ACCESS_DENIED; errno 0 occurs when the Python process runs in 'Virtual Secure Mode'
  pre-release
  fix win num_handles() test
  update readme
  fix giampaolo#1111: use a lock to make Process.oneshot() thread safe
  pdate HISTORY
  giampaolo#1373: different approach to oneshot() cache (pass Process instances around - which is faster)
  use PROCESS_QUERY_LIMITED_INFORMATION also for username()
  Linux: refactor _parse_stat_file() and return a dict instead of a list (+ maintainability)
  fix giampaolo#1357: do not expose Process' memory_maps() and io_counters() methods if not supported by the kernel
  giampaolo#1376 Windows: check if variable is NULL before free()ing it
  enforce lack of support for Win XP
  fix giampaolo#1370: improper usage of CloseHandle() may lead to override the original error code resulting in raising a wrong exception
  update HISTORY
  (Windows) use PROCESS_QUERY_LIMITED_INFORMATION access rights (giampaolo#1376)
  update HISTORY
  revert 5398c48; let's do it in a separate branch
  giampaolo#1111 make Process.oneshot() thread-safe
  sort HISTORY
  give CREDITS to @EccoTheFlintstone for giampaolo#1368
  fix ionice set not working on windows x64 due to LENGTH_MISMATCH  (giampaolo#1368)
  make flake8 happy
  give CREDITS to @amanusk for giampaolo#1369 / giampaolo#1352 and update doc
  Add CPU frequency support for FreeBSD (giampaolo#1369)
  giampaolo#1359: add test case for cpu_count(logical=False) against lscpu utility
  disable false positive mem test on travis + osx
  fix PEP8 style mistakes
  give credits to @koenkooi for giampaolo#1360
  Fix giampaolo#1354 [Linux] disk_io_counters() fails on Linux kernel 4.18+ (giampaolo#1360)
  giampaolo#1350: give credits to @amanusk
  FreeBSD adding temperature sensors (WIP) (giampaolo#1350)
  pre release
  sensors_temperatures() / linux: convert defaultdict to dict
  fix giampaolo#1004: Process.io_counters() may raise ValueError
  fix giampaolo#1307: [Linux] disk_partitions() does not honour PROCFS_PATH
  refactor hasattr() checks as global constants
  giampaolo#1197 / linux / cpu_freq(): parse /proc/cpuinfo in case /sys/devices/system/cpu fs is not available
  fix giampaolo#1277 / osx / virtual_memory: 'available' and 'used' memory were not calculated properly
  travis / osx: set py 3.6
  travis: disable pypy; se py 3.7 on osx
  skip test on PYPY + Travis
  fix travis
  fix giampaolo#715: do not print exception on import time in case cpu_times() fails.
  fix different travis failures
  give CREDITS for giampaolo#1320 to @truthbk
  [aix] improve compilation on AIX, better support for gcc/g++ + fix cpu metrics (giampaolo#1320)
  give credits to @alxchk for giampaolo#1346 (sunOS)
  Fix giampaolo#1346 (giampaolo#1347)
  giampaolo#1284, giampaolo#1345 - give credits to @amanusk
  Add parsing for /sys/class/thermal (giampaolo#1345)
  Fix decoding error in tests
  catch UnicodeEncodeError on print()
  use memory tolerance in occasionally failing test
  Fix random 0xC0000001 errors when querying for Connections (giampaolo#1335)
  Correct capitalization of PyPI (giampaolo#1337)
  giampaolo#1341: move open() utilities/wrappers in _common.py
  Refactored ps() function in test_posix (giampaolo#1341)
  fix giampaolo#1343: document Process.as_dict() attrs values
  giampaolo#1332 - update HISTORY
  make psutil_debug() aware of PSUTIL_DEBUG (giampaolo#1332)
  also include PYPY (or try to :P)
  travis: add python 3.7 build
  add download badge
  remove failing test assertions
  remove failing test
  make test more robust
  pre release
  pre release
  set version to 5.4.7
  OSX / SMC / sensors: revert giampaolo#1284 (giampaolo#1325)
  setup.py: add py 3.7
  fix giampaolo#1323: [Linux] sensors_temperatures() may fail with ValueError
  fix failing linux tests
  giampaolo#1321 add unit tests
  giampaolo#1321: refactoring
  make disk_io_counters more robust (giampaolo#1324)
  fix typo
  Fix DeprecationWarning: invalid escape sequence (giampaolo#1318)
  remove old test
  update is_storage_device() docstring
  fix giampaolo#1305 / disk_io_counters() / Linux: assume SECTOR_SIZE is a fixed 512
  giampaolo#1313 remove test which no longer makes sense
  disk_io_counters() - linux: mimic iostat behavior (giampaolo#1313)
  fix wrong reference link in doc
  disambiguate TESTFN for parallel testing
  fix giampaolo#1309: add STATUS_PARKED constant and fix STATUS_IDLE (both on linux)
  give CREDITS to @sylvainduchesne for giampaolo#1294
  retain GIL when querying connections table (giampaolo#1306)
  Update index.rst (giampaolo#1308)
  fix giampaolo#1279: catch and skip ENODEV in net_if_stat()
  appveyor: retire 3.5, add 3.7
  revert file renaming of macos files; get them back to 'osx' prefix
  winmake: add upload-wheels cmd
  Rename OSX to macOS (giampaolo#1298)
  apveyor: reset py 3.4 and remove 3.7 (not available yet)
  try to fix occasional children() failure on Win: https://ci.appveyor.com/project/giampaolo/psutil/build/job/je3qyldbb86ff66h
  appveyor: remove py 3.4 and add 3.7
  giampaolo#1284: give credits to @amanusk + some minor adjustments
  little refactoring
  Osx temps (giampaolo#1284)
  ...
@paatrofimov

@sylvainduchesne @giampaolo
Have you found out why -1073741823 happens? I get the same error code from GetExtendedTcpTable when establishing TCP connections and reading the TCP table in parallel.

@giampaolo
Owner

Are you using the latest 5.6.3? This issue was supposed to be fixed.

@paatrofimov

Well, my question is more related to winapi. I actually do not use psping. The problem is that this issue is almost the only one that mentions error code -1073741823, so I wonder if you know what causes it to occur.
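
A note for anyone searching for this code: -1073741823 is the signed 32-bit rendering of 0xC0000001, the hex value shown in the WindowsError earlier in this thread. That value matches the NTSTATUS code STATUS_UNSUCCESSFUL, a generic failure status, which may be why it says so little about the cause. The conversion can be checked with a few lines of Python; the helper names below are hypothetical.

```python
def to_unsigned32(code):
    """Reinterpret a (possibly negative) 32-bit error code as unsigned."""
    return code & 0xFFFFFFFF


def to_signed32(code):
    """Reinterpret a 32-bit error code as a signed integer."""
    code &= 0xFFFFFFFF
    return code - 0x100000000 if code & 0x80000000 else code


def ntstatus_severity(code):
    """Top two bits of an NTSTATUS value.

    0 = success, 1 = informational, 2 = warning, 3 = error.
    """
    return (to_unsigned32(code) >> 30) & 0x3


if __name__ == "__main__":
    code = -1073741823
    print(hex(to_unsigned32(code)))  # 0xc0000001
    print(ntstatus_severity(code))   # 3
```

Searching for the hex form (0xC0000001) tends to be more productive than searching for the negative decimal form.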

@paatrofimov

I got a reply here
