
Cannot create conda environment #45

Open
kailust opened this issue Mar 15, 2023 · 17 comments

@kailust

kailust commented Mar 15, 2023

Describe the bug
Followed the instructions but could not get

conda env create -f environment.yml

to work because of

ResolvePackageNotFound: 
  - cudatoolkit=11.6.0
  - faiss-gpu=1.7.2
  - nccl=2.12.12.1
  - cupy=10.4.0

To Reproduce
Steps to reproduce the behavior:
Install miniconda
Run
conda env create -f environment.yml

Expected behavior
An environment called OpenChatKit should be created, but the command fails.


Desktop: Mac


@jkyndir

jkyndir commented Mar 15, 2023

I encountered a similar issue:

ResolvePackageNotFound:
  - nccl=2.12.12.1

Any help would be greatly appreciated.

I'm on a Windows machine. Is this package not supported on Windows?

@wallon-ai

Same issue. I'm on Ubuntu.

@jkyndir

jkyndir commented Mar 15, 2023

Same issue. I'm on Ubuntu.

Ugh, this sucks.

I wish the requirements and instructions were clearer.

@JiliHili

I ran into a similar issue:

ResolvePackageNotFound:
  - nccl=2.12.12.1

Any help would be greatly appreciated.

I'm on a Windows machine. Is this package not supported on Windows?

#19 (comment)

@jayshah96

Same issue on Mac.

@yibeilaopo

I removed the nccl line from environment.yml first, did the installation, and then ran:

conda install -c conda-forge nccl
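The workaround above can be sketched in Python; strip_package is a hypothetical helper (a naive line filter, not a real YAML parser) that drops a pinned dependency from environment.yml before running conda env create:

```python
def strip_package(yaml_text: str, name: str) -> str:
    """Drop any '- name=...' dependency line from an environment.yml string.

    Naive line filter rather than a YAML parser; good enough for the
    flat dependency list used in this repo's environment.yml.
    """
    kept = [line for line in yaml_text.splitlines()
            if not line.strip().startswith(f"- {name}=")]
    return "\n".join(kept)
```

After the environment is created, conda install -c conda-forge nccl (on Linux) pulls the package from a channel that actually carries it.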

@samchen8008

Any update? I am still having this issue.
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:

CUDA                                           Command
v10.2 (x86_64)                                 pip install cupy-cuda102
v10.2 (aarch64 - JetPack 4)                    pip install cupy-cuda102 -f https://pip.cupy.dev/aarch64
v11.0 (x86_64)                                 pip install cupy-cuda110
v11.1 (x86_64)                                 pip install cupy-cuda111
v11.2 ~ 11.8 (x86_64)                          pip install cupy-cuda11x
v11.2 ~ 11.8 (aarch64 - JetPack 5 / Arm SBSA)  pip install cupy-cuda11x -f https://pip.cupy.dev/aarch64
v12.x (x86_64)                                 pip install cupy-cuda12x
v12.x (aarch64 - JetPack 5 / Arm SBSA)         pip install cupy-cuda12x -f https://pip.cupy.dev/aarch64
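The table above can be encoded as a small lookup; cupy_package is a hypothetical helper (my naming, not part of cupy) that maps a CUDA toolkit version to the wheel name to pip install, assuming the x86_64 column:

```python
def cupy_package(cuda_version: str) -> str:
    # Map a CUDA toolkit version to the matching cupy wheel
    # (per the x86_64 rows of the table above).
    major, minor = (int(x) for x in cuda_version.split(".")[:2])
    if (major, minor) == (10, 2):
        return "cupy-cuda102"
    if major == 11:
        if minor == 0:
            return "cupy-cuda110"
        if minor == 1:
            return "cupy-cuda111"
        if 2 <= minor <= 8:
            return "cupy-cuda11x"
    if major == 12:
        return "cupy-cuda12x"
    raise ValueError(f"no cupy wheel listed for CUDA {cuda_version}")
```

For the CUDA 11.6 pin in this repo's environment.yml, this resolves to cupy-cuda11x.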

@juntezhang

I am getting the same issue with Mamba.

conda install mamba -n base -c conda-forge
mamba env create -f environment.yml -n OpenChatKit-Test

Getting the following errors:

Could not solve for environment specs
Encountered problems while solving:
  - nothing provides requested cudatoolkit 11.6.0**
  - nothing provides requested cupy 10.4.0**
  - nothing provides requested faiss-gpu 1.7.2**
  - nothing provides requested nccl 2.12.12.1**
  - nothing provides cuda 11.6.* needed by pytorch-cuda-11.6-h867d48c_0

The environment can't be solved, aborting the operation

Running on macOS 12.16.3

It would be nice if the README could list the prerequisites for setting up the environment.

@csris
Contributor

csris commented Mar 18, 2023

It would be nice if the README could list the prerequisites for setting up the environment.

I'll update the README. I believe these packages are only available on Linux. Windows users might be able to use WSL (see issue #19), but I don't think this will run on macOS.

@Nemunas

Nemunas commented May 11, 2023

This helped on Ubuntu:
conda config --set channel_priority false

@orangetin
Member

For anyone trying to run inference on a Mac (fyi the training scripts will not work):

This environment.yml worked for me. Since Macs don’t have a CUDA device, you’re going to have to use CPU packages. There is a way to leverage GPU acceleration with MPS but I haven’t tried that yet.

For inference, you’d have to modify the Python script to use CPU. I’m going to put up a PR soon, but for now, reference this bot.py.

Changes:

  • Remove nccl (Linux-only). Note: you won’t be able to use the training scripts, since they depend on nccl. (Maybe LoRA will still work?)
  • Add Rust to the conda dependency list, because it is needed to build the pip wheel for transformers on Mac.
  • Specify versions for numpy and pillow.
  • Change faiss-gpu to faiss-cpu.
  • Remove pytorch-cuda, cupy, and cudatoolkit (CUDA dependencies).

environment.yml for Mac:

name: OpenChatKit
channels:
  - pytorch
  - nvidia
  - conda-forge
  - defaults
dependencies:
  - faiss-cpu=1.7.4
  - fastparquet=0.5.0
  - pip=22.3.1
  - pyarrow=8.0.0
  - python=3.10.9
  - python-snappy=0.6.1
  - pytorch=1.13.1
  - snappy=1.1.9
  - torchaudio=0.13.1
  - torchvision=0.14.1
  - rust=1.69.0
  - pip:
      - accelerate==0.17.1
      - datasets==2.10.1
      - loguru==0.6.0
      - netifaces==0.11.0
      - transformers==4.27.4
      - wandb==0.13.10
      - zstandard==0.20.0
      - numpy==1.24.3
      - pillow==9.5.0

@EdgBuc

EdgBuc commented Jun 1, 2023

Managed to run it, thanks to @orangetin.
But whenever I enter a command, it ends with:

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

Mac M1.

@orangetin
Member

Managed to run it, thanks to @orangetin. But whenever I enter a command, it ends with:

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

Mac M1.

@EdgBuc make sure the dtype is not float16. Set it to either bfloat16 or float32. (CPU does not support float16). Reference this line.
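A minimal sketch of that fix, assuming the target device name is known; safe_dtype_name is a hypothetical helper (the concrete choice between bfloat16 and float32 is a judgment call):

```python
def safe_dtype_name(device: str) -> str:
    # CPU (and MPS) kernels such as LayerNorm are not implemented for
    # float16, which is exactly what triggers the
    # "LayerNormKernelImpl not implemented for 'Half'" error.
    if device == "cuda":
        return "float16"   # half precision is fine on CUDA GPUs
    return "bfloat16"      # CPU/MPS: use bfloat16 (or float32) instead

# In bot.py this could then be resolved lazily, e.g.:
#   import torch
#   torch_dtype = getattr(torch, safe_dtype_name(device))
```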

@EdgBuc

EdgBuc commented Jun 2, 2023

Indeed, removing the if/else and leaving only torch_dtype = torch.bfloat16 did the trick.
BTW, it is super slow (about one word per minute) when answering :)

@orangetin
Member

Indeed, removing the if/else and leaving only torch_dtype = torch.bfloat16 did the trick. BTW, it is super slow (about one word per minute) when answering :)

You don't need to modify the if/else statement if you pass --no-gpu and -r as args. See this for more info.

Yeah, that is expected if you're running this on CPU. However, on Silicon, you should be able to set the device to mps and it should use acceleration. That is, change cpu in this line to mps and it should do the trick :)
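Device selection can also be automated instead of hard-coding cpu or mps; a sketch with a hypothetical pick_device helper, assuming a PyTorch build recent enough (>= 1.12) to expose the MPS availability check:

```python
def pick_device() -> str:
    # Prefer CUDA, then Apple's MPS backend, then plain CPU.
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```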

@EdgBuc

EdgBuc commented Jun 2, 2023

Indeed, removing the if/else and leaving only torch_dtype = torch.bfloat16 did the trick. BTW, it is super slow (about one word per minute) when answering :)

You don't need to modify the if/else statement if you pass --no-gpu and -r as args. See this for more info.

Yeah, that is expected if you're running this on CPU. However, on Silicon, you should be able to set the device to mps and it should use acceleration. That is, change cpu in this line to mps and it should do the trick :)

Well, when I tried to change it to mps, I got:

The operator 'aten::repeat_interleave.self_int' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1670525498485/work/aten/src/ATen/mps/MPSFallback.mm:11.)
  input_ids = input_ids.repeat_interleave(expand_size, dim=0)

and then:

RuntimeError: Currently topk on mps works only for k<=16

@orangetin
Member

I can't reproduce this error on an M2 (following the instructions I provided). This seems like a PyTorch error.

The only modification to the code was to change cpu to mps like I described earlier. Here's the command I ran:
python3 inference/bot.py --model togethercomputer/RedPajama-INCITE-Base-3B-v1 --no-gpu -r 16

I'm getting better speeds but still not great. Read this blog post if you want faster CPU inference.
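The -r 16 cap can also be enforced in code; clamp_top_k is a hypothetical helper reflecting the MPS limitation above (topk only supports k <= 16 on that backend in this PyTorch build):

```python
def clamp_top_k(top_k: int, device: str, mps_limit: int = 16) -> int:
    # Older PyTorch MPS builds only implement topk for k <= 16, so cap
    # the sampling parameter when generating on Apple Silicon.
    if device == "mps":
        return min(top_k, mps_limit)
    return top_k
```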
