Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os needs fix to detect CPUs or fall back to 1 #19022

Closed
trusktr opened this issue Feb 27, 2018 · 24 comments
Closed

os needs fix to detect CPUs or fall back to 1 #19022

trusktr opened this issue Feb 27, 2018 · 24 comments
Labels
feature request Issues that request new features to be added to Node.js. os Issues and PRs related to the os subsystem.

Comments

@trusktr
Copy link
Contributor

trusktr commented Feb 27, 2018

NPM fails when os.cpus() returns undefined. Maybe os.cpus() should output something better with sane fallback default values?

Just an idea. Not sure if it's preferable to expect other programs to check for undefined.

@devsnek
Copy link
Member

devsnek commented Feb 27, 2018

imo the sanest output for not being able to detect cpus is undefined. a computer with 30cpus could still fail to operate with os.cpus()

EDIT:
after thinking about this for a few min in my mind it might even be better to throw if os.cpus actually is unable to get cpu info. an empty array is another option but thats pretty misleading (not indicative of an error)

@bnoordhuis bnoordhuis added os Issues and PRs related to the os subsystem. feature request Issues that request new features to be added to Node.js. labels Feb 27, 2018
@bnoordhuis
Copy link
Member

Yeah, I could see us raising an exception but not fabricating data. The error message will mostly be a generic 'operation failed' error, though.

The linked issue is about Android. The reason it fails there is because /proc/stat isn't readable any more in recent releases. If anyone knows of a fix, please submit it to libuv; src/unix/linux-core.c is the relevant file.

@benjamingr
Copy link
Member

I think this needs to be fixed on NPM and not Node, if we can't detect the # of CPUs I'd rather not fabricate it. We should probably add docs.

@bnoordhuis is there an open issue? Can't we just read /sys/devices/system/cpu/ on Android? Alternatively, the NDK has a android_getCpuCount() method.

@bnoordhuis
Copy link
Member

No issue. Android is self-serve.

@benjamingr
Copy link
Member

There are two possible fixes (reading it from sys/devices/system/cpu) or using an NDK API and calling android_getCpuCount() directly - however I don't think I'm at capacity to build & test an Android specific fix.

@gireeshpunathil
Copy link
Member

how about a new api os.cpucount() ?

#include <stdio.h>
#include <unistd.h>

int main() {
  fprintf(stderr, "# of CPUs: %ld\n", sysconf( _SC_NPROCESSORS_ONLN));
}

seem to be fairly portable across UNIX, and does not have the overhead of file system access.

@bnoordhuis
Copy link
Member

How do you think sysconf( _SC_NPROCESSORS_ONLN) is implemented? :-)

@gireeshpunathil
Copy link
Member

aix:
17825890: 39059491: sysconfig(8, 0x2FF22A90, 192) = 0

linux:

open("/sys/devices/system/cpu/online", O_RDONLY|O_CLOEXEC) = 3
read(3, "0-7\n", 8192)                  = 4
close(3)                                = 0

ok - so you meant to say at least in Linux, it is all one and the same! thanks.

@davisjam
Copy link
Contributor

If Node throws for os.cpus(), would a variety of other APIs also need to throw?

I think documenting the "Couldn't detect -> undefined" behavior seems OK.

@gireeshpunathil
Copy link
Member

I guess the issue is with the overloaded usage of the API - the original and intended usage for obtaining CPU model, speed etc. and the derived usage of obtaining the CPU count via. the length of the resultant array. While the first one if complex and may not have support from all the platforms, second one is well covered, straight-forward and error free.

From an end user's point, os.cpus().length being interpreted as the cpu count makes sense, but its value being undefined does not, as there is at least one CPU on which this API was invoked!

So I suggest we have another API for cpucount alone that is reliable and deterministic.

@gibfahn
Copy link
Member

gibfahn commented Mar 19, 2018

Looking at the linked issue (rvagg/node-worker-farm@0b2349c) the fact that you have to get the length of the object is a bit of a pain.

I'd imagine a natural API to look like:

const cpus = os.cpus() || 1

Instead of which you have to do:

const cpus = (require('os').cpus() || { length: 1 }).length

which isn't exactly obvious. Given that getting the number of cpus is probably the largest use-case for this API, I think I'd be +1 on a separate api for it.

@vielhuber
Copy link

Some update on this? It would be good that cpus().length returns 1 if /proc/stat is not readable.

I opened some other related issues, because those libraries don't do a proper check if cpus() is undefined and fetch their length (which causes errors on limited shells):

vuejs/vue-cli#2110
webpack-contrib/uglifyjs-webpack-plugin#337

@joshuambg

This comment has been minimized.

@gireeshpunathil
Copy link
Member

/cc @nodejs/libuv - please review and advise. If you are good with the proposal in
#19022 (comment) I can try implementing that. On the other hand, working around os.cpus() seems possible, but hacky: its return value becomes overloaded with Object or Integer.

@ronkorving
Copy link
Contributor

ronkorving commented Aug 9, 2018

Why not throw? The problem is literally "we are unable to detect". There is no value for that except "error", which I thought exceptions were for :) Doesn't that make it way more manageable than trying to fit it into a return value or creating a dedicated API "because edge-case"? If some software needs to be able to run on Android (like NPM), it can catch the exception and deal with it the way they think is appropriate.

@gireeshpunathil
Copy link
Member

there are 2 things that users use this API for:

  • get a long list of CPUs and their attributes: direct, documented function (the os.cpus() method returns an array of objects containing information about each logical CPU core.)
  • get the count of CPUs: indirect, implied function

throwing is great, but it breaks the second usage categories. how do we help them? especially when we look at their intent - get the count to do some simple math in deciding work sharing etc.?

what do you mean by edge-case? the API's promise is to work in a platform independent manner. So the question is whether it works or not.

If we extend your argument (catch the exception and deal with it the way they think is appropriate) to all platforms and all APIs then behaviors become stochastic.

@ronkorving
Copy link
Contributor

@gireeshpunathil
The only way to help those people without having those people change any code is by returning 1, which I believe most people are not in favor of. Introducing a new API requires code change, just like throwing would.

The edge-case I'm talking about is cpus() returning undefined. That doesn't happen for the vast majority of users.

@gireeshpunathil
Copy link
Member

agree that code change is inevitable if this needs to be fixed - either in the user land or in node / libuv.

my point is that:

when an API is consumed heavily for its secondary use case (derive cpu count), while the API does not fulfill its primary use case (long list of CPUs) in edge-cases, it is worthwhile to implement the secondary use case as a separate API.

@ronkorving
Copy link
Contributor

@gireeshpunathil Wait, can the secondary use-case be reliably implemented on the problem cases (Android)? If so, I didn't get that from the earlier communication and I stand corrected. Then a separate API would make a lot more sense to me 👍 If it were to merely fallback to 1 in those cases, I don't see a purpose for its existence though.

@gireeshpunathil
Copy link
Member

@ronkorving - thanks

If so, I didn't get that from the earlier communication and I stand corrected.

agree, it is not evident whether this is feasible or not, and the discussion so far did not converge into a PoC work (unfortunately I don't have a system to build and test). The potential routes for implementation from the discussion are:

  • android_getCpuCount()
  • sysconf( _SC_NPROCESSORS_ONLN)
  • /proc/stat
  • /sys/devices/system/cpu/

It could be possible that one method depends on another internally, but unless someone tests the feasibility we cannot say in either way.

@trenttobler
Copy link

Having had to recently go through and try to work around this myself on a system on which this was working and broke only after a recent OS update (android, running termux -- very frustrating when you think you might have made changes that broke working code, rather than it being an OS update from a week or two earlier).

What about a simple solution? Allow the user to define a CPU COUNT environment variable (name TBD). This, at least, allows an easy workaround - the user can decide if 1 is appropriate, or can invest a little more effort to identify the hardware and provide the CPU count manually? For this specialized "abuse" of the os.cpus() function - to just get a cpu count, it feels like this would be a good compromise?

Trott added a commit to Trott/io.js that referenced this issue Nov 17, 2018
Document that `os.cpus()` can return `undefined` if information about
cores is not available. This can happen particularly on unsupported
platforms like Android.

Fixes: nodejs#19022
@Trott
Copy link
Member

Trott commented Nov 17, 2018

It seems unlikely that consensus on an API change or addition is going to happen any time soon. #24408 is a documentation fix.

@Trott
Copy link
Member

Trott commented Nov 17, 2018

Documentation fix closed because I believe the call to os.cpus() results in an error within os.cpus() itself where it tries to retrieve the length of undefined. I don't think there's a case where os.cpus() itself would return undefined. Rather, os.cpus() calls process.binding('os').getCPUs() and that returns undefined. os.cpus() subsequently tries to get the length of the value returned by getCPUs()...and that's where the error is thrown. As I wrote in the now-closed pull request, I think this is a case where we shrug and say, "Sorry, Android is unsupported. Patches are welcome, and we'd love to support it if someone can help us get it set up in our CI infrastructure."

I suppose we could change the code up to make the error a bit more friendly in this particular case, but we don't have a way to test such a change in our CI infrastructure because we don't have anything that would trigger the error. So we'd have to resort to monkey-patching process.binding('os').getCPUs() which isn't passing the smell test for me....

@Trott
Copy link
Member

Trott commented Nov 17, 2018

Ah! The relevant code was changed 3 days ago on the master branch! #24264 51cea61

Up until then, os.cpus() could indeed return undefined. But now it can't. One more reason not to document this for platforms that we can't test on in CI. Behavior can change and we wouldn't know it and then the docs would be wrong!

Given all this, I'm inclined to close this. We can't guarantee behavior we can't test. Feel free to re-open or comment if closing this seems like an egregiously wrong decision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Issues that request new features to be added to Node.js. os Issues and PRs related to the os subsystem.
Projects
None yet
Development

Successfully merging a pull request may close this issue.