Users sometimes not found #18
The second is from a container that previously worked, so it needs fixing.
The Dockerfile:
My 2 cents, in case it's not obvious:
The user is indeed not created, which is a problem for switching here. We do some lookups to determine which user (and group) to switch to, because we always receive a string and cannot tell whether it is a username or a user ID (rse-sagemaker-shim/sagemaker_shim/models.py, lines 292 to 302 in a9852c5).
We then set the user info from this result (rse-sagemaker-shim/sagemaker_shim/models.py, lines 318 to 345 in a9852c5) before passing it to the code at rse-sagemaker-shim/sagemaker_shim/models.py, lines 460 to 467 in a9852c5.
Here the
Docs for the disambiguation issue: https://www.gnu.org/software/coreutils/manual/coreutils.html#Disambiguating-names-and-IDs
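A minimal sketch of that disambiguation rule in Python, roughly following the coreutils convention: a leading `+` forces a numeric ID, otherwise the string is tried as a username first and only then as a numeric UID. `resolve_user` and `UserSpec` are hypothetical names, not the shim's actual `_get_user_info`; the key difference from the failing behaviour is that an all-digit string with no passwd entry is accepted as a UID instead of raising.

```python
import pwd
from typing import NamedTuple, Optional


class UserSpec(NamedTuple):
    uid: int
    gid: Optional[int]  # None when /etc/passwd has no entry for the UID


def _primary_gid(uid: int) -> Optional[int]:
    # Primary group from the passwd database, if the UID is known there.
    try:
        return pwd.getpwuid(uid).pw_gid
    except KeyError:
        return None


def resolve_user(spec: str) -> UserSpec:
    """Resolve a user string as a name or a numeric ID (hypothetical helper)."""
    if spec.startswith("+"):
        # "+1000" always means the numeric UID 1000, never a username.
        digits = spec[1:]
        if not digits.isdecimal():
            raise RuntimeError(f"Invalid numeric user ID: {spec}")
        uid = int(digits)
        return UserSpec(uid, _primary_gid(uid))

    # Try the string as a username first.
    try:
        entry = pwd.getpwnam(spec)
    except KeyError:
        entry = None
    if entry is not None:
        return UserSpec(entry.pw_uid, entry.pw_gid)

    # Fall back to a numeric UID, even if the image never created that user.
    if spec.isdecimal():
        uid = int(spec)
        return UserSpec(uid, _primary_gid(uid))

    raise RuntimeError(f"User {spec} not found")
```

With this ordering, an image that never created UID 1000 would resolve `"1000"` to UID 1000 with an unknown primary group, leaving the caller to decide which group to use.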
Not sure yet what can be done about the associated groups, though... still researching.
To explain my incorrect conclusion in the previous post: newer Linux tools seem to discourage usernames that start with a digit by raising an error, though technically such names still appear to be allowed!
Main question for me - does
As long as you are root, this seems to work as expected. You cannot do it as a "lower" (unprivileged) user, though, since only a privileged process may switch to an arbitrary user ID.
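Python's `subprocess` module (3.9+) exposes this through the `user`, `group` and `extra_groups` arguments of `Popen` and `check_call`. Integer values are used directly rather than looked up by name, so a UID with no entry in `/etc/passwd` works as long as the parent runs as root. A small sketch of a hypothetical helper, `popen_user_kwargs`, that builds those arguments and drops inherited supplementary groups by default:

```python
from typing import Any, Dict, Iterable, Optional


def popen_user_kwargs(
    uid: int, gid: int, extra_groups: Optional[Iterable[int]] = None
) -> Dict[str, Any]:
    """Build user/group keyword arguments for subprocess.Popen (hypothetical helper).

    Integer IDs are passed to the child's setreuid/setregid calls as-is,
    so no passwd or group entry is required for them.
    """
    return {
        "user": uid,
        "group": gid,
        # An empty list clears the parent's supplementary groups (e.g. root's);
        # passing None to Popen would leave them inherited instead.
        "extra_groups": [] if extra_groups is None else list(extra_groups),
    }
```

Running as root, `subprocess.check_call(["id"], **popen_user_kwargs(21322, 32321))` would then run `id` as UID 21322 and GID 32321 with no supplementary groups; run as an unprivileged user, the same call fails with a permission error.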
Great! Then we just need to fix our logic, no problem.
Okay, I think I have the fix; please take a look.
Is that correct? There, it looks like two groups are being used,
Hey James! Yes, it seems you replicated what I did. However, as you pointed out, we should probably still drop the extra groups. This brings me to the following headache when thinking about the state space here: if we want to do it "perfectly", we should probably look up the system group configuration to see which groups should be assigned to the user with the given UID (if it exists) and set those as extra groups. Honestly, though, I would only do this if we run into follow-up issues. I do not expect the average deep-learning developer to dig that deep into the inner workings of Linux to get their project working locally, so I suspect that all the code we encounter in the wild will only need `subprocess.check_call(["id"], user=21322, group=32321, extra_groups=[])` ... for now.
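If the "perfect" variant is ever needed, the group lookup could be sketched with the standard library's `pwd.getpwuid` and `os.getgrouplist` (Unix only). `supplementary_groups` is a hypothetical helper, and falling back to no extra groups for an unknown UID is an assumption that matches the `extra_groups=[]` default above.

```python
import os
import pwd
from typing import List


def supplementary_groups(uid: int, gid: int) -> List[int]:
    """Return the extra group IDs the system config assigns to `uid` (hypothetical helper).

    Returns no extra groups when the UID has no passwd entry, which is the
    usual case for IDs coming from outside the image.
    """
    try:
        name = pwd.getpwuid(uid).pw_name
    except KeyError:
        return []
    # os.getgrouplist includes the primary group itself; keep only the extras.
    return sorted({g for g in os.getgrouplist(name, gid) if g != gid})
```

The result can be passed straight to `extra_groups`, so a known user keeps the groups the image assigns it while an unknown UID runs with none.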
Adds fixes for the SageMaker Training backend:
- Metrics collection
- Handle the event being called twice
- Timeouts
- Sets minimum time limits, as SageMaker Training cannot handle shorter limits

Upgrades the SageMaker shim to fix the user determination issue (DIAGNijmegen/rse-sagemaker-shim#18).
Two occurrences:
```
File "sagemaker_shim/models.py", line 302, in _get_user_info
RuntimeError: User 1000 not found
RuntimeError: User 1001 not found
```
The first one is from @pkcakeout; I have requested the Dockerfile.