System cannot find the path specified #58

Open
BStoller opened this issue Apr 20, 2023 · 8 comments
Comments

@BStoller

Describe the bug
I'm having an issue when trying to start up a LangChain LLM. After setting up the cluster:

import runhouse as rh

gpu = rh.cluster('test', instance_type='T4:1', use_spot=False)

I attempt to create the LLM that will run my inferences:

from langchain.llms import SelfHostedHuggingFaceLLM

llm = SelfHostedHuggingFaceLLM(model_id='dolly-v2-2-8b', hardware=gpu, model_reqs=['pip:./', 'transformers', 'torch'])

My code appears to fail while creating or finding a file. I'm hoping you can help.

INFO | 2023-04-20 11:38:47,871 | Setting up Function on cluster.
INFO | 2023-04-20 11:38:47,884 | Upping the cluster test
I 04-20 11:38:53 optimizer.py:617] == Optimizer ==
I 04-20 11:38:53 optimizer.py:628] Target: minimizing cost
I 04-20 11:38:53 optimizer.py:640] Estimated cost: $0.5 / hour
I 04-20 11:38:53 optimizer.py:640] 
I 04-20 11:38:53 optimizer.py:712] Considered resources (1 node):
I 04-20 11:38:53 optimizer.py:760] ---------------------------------------------------------------------------------------------------
I 04-20 11:38:53 optimizer.py:760]  CLOUD   INSTANCE               vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE   COST ($)   CHOSEN   
I 04-20 11:38:53 optimizer.py:760] ---------------------------------------------------------------------------------------------------
I 04-20 11:38:53 optimizer.py:760]  Azure   Standard_NC4as_T4_v3   4       28        T4:1           eastus        0.53          ✔     
I 04-20 11:38:53 optimizer.py:760] ---------------------------------------------------------------------------------------------------
I 04-20 11:38:53 optimizer.py:760] 
I 04-20 11:38:53 optimizer.py:775] Multiple Azure instances satisfy T4:1. The cheapest Azure(Standard_NC4as_T4_v3, {'T4': 1}) is considered among:
I 04-20 11:38:53 optimizer.py:775] ['Standard_NC4as_T4_v3', 'Standard_NC8as_T4_v3', 'Standard_NC16as_T4_v3'].
I 04-20 11:38:53 optimizer.py:775] 
I 04-20 11:38:53 optimizer.py:781] To list more details, run 'sky show-gpus T4'.
I 04-20 11:38:53 cloud_vm_ray_backend.py:3327] Creating a new cluster: "test" [1x Azure(Standard_NC4as_T4_v3, {'T4': 1})].
I 04-20 11:38:53 cloud_vm_ray_backend.py:3327] Tip: to reuse an existing cluster, specify --cluster (-c). Run `sky status` to see existing clusters.
I 04-20 11:38:58 cloud_vm_ray_backend.py:1156] To view detailed progress: tail -n100 -f C:\Users\stollbak/sky_logs\sky-2023-04-20-11-38-53-125409\provision.log
---------------------------------------------------------------------------
ScannerError                              Traceback (most recent call last)
File c:\Python310\lib\site-packages\sky\execution.py:266, in _execute(entrypoint, dryrun, down, stream_logs, handle, backend, retry_until_up, optimize_target, stages, cluster_name, detach_setup, detach_run, idle_minutes_to_autostop, no_setup, _is_launched_by_spot_controller)
    265     if handle is None:
--> 266         handle = backend.provision(task,
    267                                    task.best_resources,
    268                                    dryrun=dryrun,
    269                                    stream_logs=stream_logs,
    270                                    cluster_name=cluster_name,
    271                                    retry_until_up=retry_until_up)
    273 if dryrun:

File c:\Python310\lib\site-packages\sky\utils\common_utils.py:241, in make_decorator.<locals>._record(*args, **kwargs)
    240 with cls(full_name, **ctx_kwargs):
--> 241     return f(*args, **kwargs)

File c:\Python310\lib\site-packages\sky\utils\common_utils.py:220, in make_decorator.<locals>._wrapper.<locals>._record(*args, **kwargs)
    219 with cls(name_or_fn, **ctx_kwargs):
--> 220     return f(*args, **kwargs)

File c:\Python310\lib\site-packages\sky\backends\backend.py:56, in Backend.provision(self, task, to_provision, dryrun, stream_logs, cluster_name, retry_until_up)
     55 usage_lib.messages.usage.update_actual_task(task)
---> 56 return self._provision(task, to_provision, dryrun, stream_logs,
     57                        cluster_name, retry_until_up)

File c:\Python310\lib\site-packages\sky\backends\cloud_vm_ray_backend.py:2220, in CloudVmRayBackend._provision(self, task, to_provision, dryrun, stream_logs, cluster_name, retry_until_up)
   2217 provisioner = RetryingVmProvisioner(
   2218     self.log_dir, self._dag, self._optimize_target,
   2219     self._requested_features, local_wheel_path, wheel_hash)
-> 2220 config_dict = provisioner.provision_with_retries(
   2221     task, to_provision_config, dryrun, stream_logs)
   2222 break

File c:\Python310\lib\site-packages\sky\utils\common_utils.py:241, in make_decorator.<locals>._record(*args, **kwargs)
    240 with cls(full_name, **ctx_kwargs):
--> 241     return f(*args, **kwargs)

File c:\Python310\lib\site-packages\sky\backends\cloud_vm_ray_backend.py:1718, in RetryingVmProvisioner.provision_with_retries(self, task, to_provision_config, dryrun, stream_logs)
   1715 to_provision.cloud.check_features_are_supported(
   1716     self._requested_features)
-> 1718 config_dict = self._retry_zones(
   1719     to_provision,
   1720     num_nodes,
   1721     requested_resources=task.resources,
   1722     dryrun=dryrun,
   1723     stream_logs=stream_logs,
   1724     cluster_name=cluster_name,
   1725     cloud_user_identity=cloud_user,
   1726     prev_cluster_status=prev_cluster_status)
   1727 if dryrun:

File c:\Python310\lib\site-packages\sky\backends\cloud_vm_ray_backend.py:1203, in RetryingVmProvisioner._retry_zones(self, to_provision, num_nodes, requested_resources, dryrun, stream_logs, cluster_name, cloud_user_identity, prev_cluster_status)
   1202 try:
-> 1203     config_dict = backend_utils.write_cluster_config(
   1204         to_provision,
...
   1450     self._close_pipe_fds(p2cread, p2cwrite,
   1451                          c2pread, c2pwrite,
   1452                          errread, errwrite)

FileNotFoundError: [WinError 3] The system cannot find the path specified.

Versions

Python Platform: Windows-10-10.0.19044-SP0
Python Version: 3.10.2 (tags/v3.10.2:a58ebcc, Jan 17 2022, 14:12:15) [MSC v.1929 64 bit (AMD64)]

Relevant packages:
awscli==1.27.115
azure-cli==2.31.0
azure-cli-core==2.31.0
azure-cli-telemetry==1.0.6
azure-core==1.26.4
boto3==1.26.115
fsspec==2023.4.0
pyarrow==11.0.0
pycryptodome==3.12.0
rich==13.3.4
runhouse==0.0.5
skypilot==0.2.5
sshfs==2023.4.1
sshtunnel==0.4.0
typer==0.7.0
wheel==0.40.0

Checking credentials to enable clouds for SkyPilot.
  AWS: disabled
    Reason: AWS CLI is not installed properly. Run the following commands:
      $ pip install skypilot[aws]    Credentials may also need to be set. Run the following commands:
      $ pip install boto3
      $ aws configure
    For more info: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html
  Azure: enabled
  GCP: disabled
    Reason: GCP tools are not installed or credentials are not set. Run the following commands:
      $ pip install google-api-python-client
      $ conda install -c conda-forge google-cloud-sdk -y
      $ gcloud init
      $ gcloud auth application-default login
    For more info: https://skypilot.readthedocs.io/en/latest/getting-started/installation.html
  Lambda: disabled
    Reason: Failed to access Lambda Cloud with credentials. To configure credentials, go to:
      https://cloud.lambdalabs.com/api-keys
    to generate API key and add the line
      api_key = [YOUR API KEY]
    to ~/.lambda_cloud/lambda_keys

SkyPilot will use only the enabled clouds to run tasks. To change this, configure cloud credentials, and run sky check.
If any problems remain, please file an issue at https://github.com/skypilot-org/skypilot/issues/new
Clusters
No existing clusters.

Managed spot jobs
No in progress jobs. (See: sky spot -h)

Additional context
Add any other context about the problem here.

@dongreenberg
Contributor

Cc @concretevitamin @Michaelvll, this looks like it might be an issue in the SkyPilot launch flow on Azure. Any ideas?

@concretevitamin

It may be related to Windows paths. I'm not sure we have tested on Windows. Cc @romilbhardwaj

@romilbhardwaj

We haven't tested running natively on Windows. However, I can confirm that SkyPilot works in a Python environment inside Windows Subsystem for Linux (WSL). Perhaps Windows users can use WSL as a workaround?

@BStoller
Author

BStoller commented Apr 21, 2023

@romilbhardwaj I looked into the code some more and found that one of the issues appears to come from the generated temp YAML file that specifies the properties of the machine.

file_mounts: {
  "~/.sky/sky_ray.yml": "C:\Users\stollbak\.sky\generated\test.yml.tmp",
  "~/.sky/wheels/5f10c8a76630d0f617f0312055a347bf": "C:\Users\stollbak\AppData\Local\Temp\5f10c8a76630d0f617f0312055a347bf",
  "~/.azure/azureProfile.json": "~/.azure/azureProfile.json",
  "~/.azure/clouds.config": "~/.azure/clouds.config",
  "~/.azure/config": "~/.azure/config",
  "~/.azure/msal_token_cache.json": "~/.azure/msal_token_cache.json",
}

It looks like the backslashes are causing problems because they are treated as escape characters inside the double-quoted strings. Where is this file generated in the library? I can take a look and see about a fix.
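
To illustrate the escaping problem described above: YAML processes backslash escapes in double-quoted scalars (for example, `\U` introduces an 8-digit Unicode escape), but treats backslashes literally in single-quoted scalars. The following is only a minimal sketch of one possible way to quote such a path. The helper name `yaml_single_quoted` is hypothetical and is not part of SkyPilot.

```python
def yaml_single_quoted(value: str) -> str:
    """Return `value` as a YAML single-quoted scalar (hypothetical helper).

    Inside single quotes YAML keeps backslashes as-is; the only escape
    is a doubled single quote.
    """
    return "'" + value.replace("'", "''") + "'"


# Path copied from the file_mounts snippet in this comment.
local_path = r"C:\Users\stollbak\.sky\generated\test.yml.tmp"

# Assumed shape of one templated entry; the real template may differ.
entry = f'"~/.sky/sky_ray.yml": {yaml_single_quoted(local_path)},'
print(entry)
```

Whether the consumers of this file accept a backslash path once it parses correctly is a separate question that this sketch does not address.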

@romilbhardwaj

romilbhardwaj commented Apr 21, 2023

Hi @BStoller, the tmp yml file is generated here.

This method fills in a template YAML. The offending paths in your example above are called sky_ray_yaml_local_path and sky_local_path in our template.

  • To fix sky_ray_yaml_local_path (i.e., C:\Users\stollbak\.sky\generated\test.yml.tmp in your example), you may want to modify _get_yaml_path_from_cluster_name here.

  • To fix sky_local_path (i.e., C:\Users\stollbak\AppData\Local\Temp\5f10c8a76630d0f617f0312055a347bf in your example), you may want to modify the temp_wheel_dir.absolute() returned here.
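
As a rough sketch of what such a modification could look like, both local paths could be rewritten with forward slashes before they are filled into the template, so that no backslash ends up inside a double-quoted YAML string. The helper `to_forward_slashes` is hypothetical; where exactly it would be applied in SkyPilot, and whether the downstream tools accept the resulting paths, are assumptions that would need testing on Windows.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(path: str) -> str:
    """Rewrite a Windows-style path using forward slashes (hypothetical helper)."""
    # PureWindowsPath only manipulates the string, so this runs on any OS.
    return PureWindowsPath(path).as_posix()


# The two offending paths from the example above.
yaml_path = to_forward_slashes(r"C:\Users\stollbak\.sky\generated\test.yml.tmp")
wheel_path = to_forward_slashes(
    r"C:\Users\stollbak\AppData\Local\Temp\5f10c8a76630d0f617f0312055a347bf"
)
print(yaml_path)
print(wheel_path)
```

Paths that already use forward slashes, such as `~/.azure/config`, pass through unchanged.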

We're very open to contributions and I look forward to your PR in the SkyPilot repo :)

@dongreenberg
Contributor

dongreenberg commented Apr 21, 2023

@BStoller, I'm happy to help write up a change if you'd test it on your Windows box. I actually have a few other changes I was thinking of contributing to SkyPilot shortly.

@BStoller
Author

@dongreenberg No problem, I can test the changes. Let me know if you need any more information.

@aria1th
Contributor

aria1th commented May 11, 2023

This is a problem with the use of os.devnull (in SkyPilot) on Windows. On native Windows, os.devnull is the bare device name 'nul', with no directory part, whereas on Linux and WSL it is '/dev/null'. Thus the result is different.

import os
log_path = os.devnull
log_dir = os.path.expanduser(os.path.dirname(log_path))

This yields '' in some Windows environments, but works on Linux or its subsystems, where it yields '/dev'.

It is a headache on those Windows systems; Windows can't do that.
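
The difference can be reproduced on any OS by calling the platform-specific path modules directly. This is a minimal sketch: the guard `log_dir_or_none` is a hypothetical helper, not SkyPilot code, and the thread does not confirm that this is the exact source of the WinError 3 in the traceback.

```python
import ntpath
import posixpath
from typing import Optional

# os.devnull is 'nul' on native Windows and '/dev/null' on Linux / WSL.
windows_dir = ntpath.dirname("nul")
posix_dir = posixpath.dirname("/dev/null")
print(repr(windows_dir), repr(posix_dir))


def log_dir_or_none(log_path: str, pathmod=posixpath) -> Optional[str]:
    """Return the directory part of log_path, or None if it has none.

    Hypothetical guard: a caller could skip directory creation when the
    result is None instead of passing an empty path to the OS.
    """
    directory = pathmod.expanduser(pathmod.dirname(log_path))
    return directory or None
```

With a guard like this, a caller would only create the log directory when `log_dir_or_none(...)` returns a non-empty path.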
