Book 3 (advanced loading + hub) #263
Merged
Commits
8246416 3rd phase. (Narsil)
45642a8 Fixing examples. (Narsil)
a44471a Adding more details on how to load things. (Narsil)
a70b95f Marking unwritten chapters as Draft (disables the link). (Narsil)
1b705a4 Remove duplicate. (Narsil)
c11e78b Odd rebase artifact. (Narsil)
ae68635 Add small error management. (Narsil)
166f4d1 `s/candle/candle_core/g` (Narsil)
1b2b32e Remove dead page.t (Narsil)
dba3147 Typos and format and CD only when PR lands. (Narsil)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# Advanced Cuda usage |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# Porting a custom kernel |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# Writing a custom kernel |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,51 @@ | ||
# Error management | ||
|
||
You might have seen a lot of `.unwrap()` or `?` in the code base. | ||
If you're unfamiliar with Rust, check out the [Rust book](https://doc.rust-lang.org/book/ch09-02-recoverable-errors-with-result.html) | ||
for more information. | ||
|
||
What's important to know, though, is that if you want to know *where* a particular operation failed, | ||
you can set `RUST_BACKTRACE=1` to get the location where the model actually failed. | ||
|
||
Let's look at some failing code: | ||
|
||
```rust,ignore | ||
let x = Tensor::zeros((1, 784), DType::F32, &device)?; | ||
let y = Tensor::zeros((1, 784), DType::F32, &device)?; | ||
let z = x.matmul(&y)?; | ||
``` | ||
|
||
This will print at runtime: | ||
|
||
```bash | ||
Error: ShapeMismatchBinaryOp { lhs: [1, 784], rhs: [1, 784], op: "matmul" } | ||
``` | ||
|
||
|
||
After adding `RUST_BACKTRACE=1`: | ||
|
||
|
||
```bash | ||
Error: WithBacktrace { inner: ShapeMismatchBinaryOp { lhs: [1, 784], rhs: [1, 784], op: "matmul" }, backtrace: Backtrace [{ fn: "candle::error::Error::bt", file: "/home/nicolas/.cargo/git/checkouts/candle-5bb8ef7e0626d693/f291065/candle-core/src/error.rs", line: 200 }, { fn: "candle::tensor::Tensor::matmul", file: "/home/nicolas/.cargo/git/checkouts/candle-5bb8ef7e0626d693/f291065/candle-core/src/tensor.rs", line: 816 }, { fn: "myapp::main", file: "./src/main.rs", line: 29 }, { fn: "core::ops::function::FnOnce::call_once", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/ops/function.rs", line: 250 }, { fn: "std::sys_common::backtrace::__rust_begin_short_backtrace", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/sys_common/backtrace.rs", line: 135 }, { fn: "std::rt::lang_start::{{closure}}", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/rt.rs", line: 166 }, { fn: "core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/ops/function.rs", line: 284 }, { fn: "std::panicking::try::do_call", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs", line: 500 }, { fn: "std::panicking::try", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs", line: 464 }, { fn: "std::panic::catch_unwind", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panic.rs", line: 142 }, { fn: "std::rt::lang_start_internal::{{closure}}", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/rt.rs", line: 148 }, { fn: "std::panicking::try::do_call", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs", line: 500 }, { fn: "std::panicking::try", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs", line: 464 }, { fn: "std::panic::catch_unwind", file: 
"/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panic.rs", line: 142 }, { fn: "std::rt::lang_start_internal", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/rt.rs", line: 148 }, { fn: "std::rt::lang_start", file: "/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/rt.rs", line: 165 }, { fn: "main" }, { fn: "__libc_start_main" }, { fn: "_start" }] } | ||
``` | ||
|
||
Not super pretty at the moment, but we can see that the error occurred at `{ fn: "myapp::main", file: "./src/main.rs", line: 29 }`. | ||
|
||
|
||
Another thing to note is that, since Rust is compiled, it is not necessarily as easy to recover proper stacktraces, | ||
especially in release builds. We're using [`anyhow`](https://docs.rs/anyhow/latest/anyhow/) for that. | ||
The library is still young, so please [report](https://github.com/LaurentMazare/candle/issues) any issues with detecting where an error is coming from. | ||
|
||
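To make the idea of an error that carries its own backtrace more concrete, here is a minimal sketch using only the standard library. It is not candle's actual implementation: `ShapeError` and `check_matmul` are hypothetical names invented for this illustration. It relies on `std::backtrace::Backtrace::capture()`, which only records frames when `RUST_BACKTRACE` (or `RUST_LIB_BACKTRACE`) is set.

```rust
use std::backtrace::{Backtrace, BacktraceStatus};
use std::fmt;

/// Hypothetical error type for this sketch (not candle's real `Error`).
#[derive(Debug)]
struct ShapeError {
    lhs: Vec<usize>,
    rhs: Vec<usize>,
    // Captured when the error is created. It stays empty unless
    // `RUST_BACKTRACE` or `RUST_LIB_BACKTRACE` is set.
    backtrace: Backtrace,
}

impl fmt::Display for ShapeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "shape mismatch in matmul: lhs {:?}, rhs {:?}", self.lhs, self.rhs)
    }
}

impl std::error::Error for ShapeError {}

/// Checks that two 2-D shapes can be multiplied and returns the output shape.
fn check_matmul(lhs: &[usize], rhs: &[usize]) -> Result<Vec<usize>, ShapeError> {
    if lhs.len() != 2 || rhs.len() != 2 || lhs[1] != rhs[0] {
        return Err(ShapeError {
            lhs: lhs.to_vec(),
            rhs: rhs.to_vec(),
            backtrace: Backtrace::capture(),
        });
    }
    Ok(vec![lhs[0], rhs[1]])
}

fn main() {
    // Same shapes as the failing example: (1, 784) x (1, 784) is invalid.
    match check_matmul(&[1, 784], &[1, 784]) {
        Ok(shape) => println!("output shape: {shape:?}"),
        Err(err) => {
            eprintln!("Error: {err}");
            // Only print the backtrace if one was actually captured.
            if err.backtrace.status() == BacktraceStatus::Captured {
                eprintln!("{}", err.backtrace);
            }
        }
    }
}
```

Running this once without and once with `RUST_BACKTRACE=1` shows the same difference as above: the message alone in the first case, and the message plus the call stack in the second.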
## Cuda error management | ||
|
||
When running a model on CUDA, you might get a stacktrace that does not really represent the error. | ||
The reason is that CUDA is async by nature, and therefore the error might be caught while you were sending totally different kernels. | ||
|
||
One way to avoid this is to set `CUDA_LAUNCH_BLOCKING=1` as an environment variable. This forces every kernel to be launched sequentially. | ||
You might still see the error surface in other kernels, however: the faulty kernel might exit without an error but corrupt a pointer, in which case the error only appears when the `CudaSlice` is dropped. | ||
|
||
|
||
If this occurs, you can use [`compute-sanitizer`](https://docs.nvidia.com/compute-sanitizer/ComputeSanitizer/index.html). | ||
This tool is like `valgrind` but for CUDA. It will help locate the errors in the kernels. | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,7 @@ | ||
# Running a model | ||
|
||
|
||
In order to run an existing model, you will need to download and use existing weights. | ||
Most models are already available on https://huggingface.co/ in [`safetensors`](https://github.com/huggingface/safetensors) format. | ||
|
||
Let's get started by running an old model: `bert-base-uncased`. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,104 @@ | ||
# Using the hub | ||
|
||
Install the [`hf-hub`](https://github.com/huggingface/hf-hub) crate: | ||
|
||
```bash | ||
cargo add hf-hub | ||
``` | ||
|
||
Then let's start by downloading the [model file](https://huggingface.co/bert-base-uncased/tree/main). | ||
|
||
|
||
```rust | ||
# extern crate candle_core; | ||
# extern crate hf_hub; | ||
use hf_hub::api::sync::Api; | ||
use candle_core::Device; | ||
|
||
let api = Api::new().unwrap(); | ||
let repo = api.model("bert-base-uncased".to_string()); | ||
|
||
let weights = repo.get("model.safetensors").unwrap(); | ||
|
||
let weights = candle_core::safetensors::load(weights, &Device::Cpu).unwrap(); | ||
``` | ||
|
||
We now have access to all the [tensors](https://huggingface.co/bert-base-uncased?show_tensors=true) within the file. | ||
|
||
You can check the names of all the tensors [here](https://huggingface.co/bert-base-uncased?show_tensors=true). | ||
|
||
|
||
## Using async | ||
|
||
`hf-hub` comes with an async API. | ||
|
||
```bash | ||
cargo add hf-hub --features tokio | ||
``` | ||
|
||
```rust,ignore | ||
# This is tested directly in examples crate because it needs external dependencies unfortunately: | ||
# See [this](https://github.com/rust-lang/mdBook/issues/706) | ||
{{#include ../../../candle-examples/src/lib.rs:book_hub_1}} | ||
``` | ||
|
||
|
||
## Using in a real model | ||
|
||
Now that we have our weights, we can use them in our bert architecture: | ||
|
||
```rust | ||
# extern crate candle_core; | ||
# extern crate candle_nn; | ||
# extern crate hf_hub; | ||
# use hf_hub::api::sync::Api; | ||
# | ||
# let api = Api::new().unwrap(); | ||
# let repo = api.model("bert-base-uncased".to_string()); | ||
# | ||
# let weights = repo.get("model.safetensors").unwrap(); | ||
use candle_core::{Device, Tensor, DType}; | ||
use candle_nn::Linear; | ||
|
||
let weights = candle_core::safetensors::load(weights, &Device::Cpu).unwrap(); | ||
|
||
let weight = weights.get("bert.encoder.layer.0.attention.self.query.weight").unwrap(); | ||
let bias = weights.get("bert.encoder.layer.0.attention.self.query.bias").unwrap(); | ||
|
||
let linear = Linear::new(weight.clone(), Some(bias.clone())); | ||
|
||
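// The query projection weight is 768 x 768, so the last input dimension must be 768. | ||
let input_ids = Tensor::zeros((3, 768), DType::F32, &Device::Cpu).unwrap(); | ||
// Note: in newer candle releases `forward` comes from the `candle_nn::Module` trait, which must then be in scope. | ||
let output = linear.forward(&input_ids).unwrap(); | ||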
``` | ||
|
||
For a full reference, you can check out the full [bert](https://github.com/LaurentMazare/candle/tree/main/candle-examples/examples/bert) example. | ||
|
||
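To see why the input's last dimension has to match the weight, it can help to spell out what a linear layer computes: `y = x * W^T + b`, with the weight stored as `(out_features, in_features)`. The following is a standard-library-only sketch of that arithmetic for a single input row. `linear_forward` is a hypothetical helper written for this illustration, not a candle function, and the tiny 2 x 3 weight stands in for the 768 x 768 query projection.

```rust
/// Applies `y = x * W^T + b` to a single input row.
/// `weight` is row-major with shape (out_features, in_features).
fn linear_forward(x: &[f32], weight: &[f32], bias: &[f32]) -> Vec<f32> {
    let in_features = x.len();
    let out_features = bias.len();
    // This is the check that fails when the input width is wrong.
    assert_eq!(
        weight.len(),
        out_features * in_features,
        "weight shape does not match input and bias sizes"
    );
    (0..out_features)
        .map(|o| {
            // Row `o` of the weight holds the coefficients for output `o`.
            let row = &weight[o * in_features..(o + 1) * in_features];
            let dot: f32 = row.iter().zip(x).map(|(w, xi)| w * xi).sum();
            dot + bias[o]
        })
        .collect()
}

fn main() {
    // 2 output features, 3 input features.
    let weight = [1.0, 0.0, 0.0, 0.0, 1.0, 1.0];
    let bias = [0.5, -1.0];
    let x = [2.0, 3.0, 4.0];
    let y = linear_forward(&x, &weight, &bias);
    println!("{y:?}"); // prints [2.5, 6.0]
}
```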
## Memory mapping | ||
|
||
For more efficient loading, instead of reading the whole file, you could use [`memmap2`](https://docs.rs/memmap2/latest/memmap2/). | ||
|
||
**Note**: Be careful with memory mapping. It seems to cause issues on [Windows, WSL](https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/5893) | ||
and will definitely be slower on network-mounted disks, because it will issue more read calls. | ||
|
||
```rust,ignore | ||
{{#include ../../../candle-examples/src/lib.rs:book_hub_2}} | ||
``` | ||
|
||
**Note**: This operation is **unsafe**. [See the safety notice](https://docs.rs/memmap2/latest/memmap2/struct.Mmap.html#safety). | ||
In practice model files should never be modified, and the mmaps should be mostly READONLY anyway, so the caveat most likely does not apply, but always keep it in mind. | ||
|
||
|
||
## Tensor Parallel Sharding | ||
|
||
When running a model with Tensor Parallelism across multiple GPUs in order to get good latency, you can load only the part of each tensor that a given GPU needs. | ||
|
||
For that you need to use [`safetensors`](https://crates.io/crates/safetensors) directly. | ||
|
||
```bash | ||
cargo add safetensors | ||
``` | ||
|
||
|
||
```rust,ignore | ||
{{#include ../../../candle-examples/src/lib.rs:book_hub_3}} | ||
``` |
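
The index arithmetic behind sharding can be sketched without any dependencies. The example below is an illustration only, under the assumption that the sharded dimension divides evenly by the number of GPUs; `shard_range` and `shard_rows` are hypothetical helpers, not part of `safetensors` or candle. With `safetensors`, the same range would be applied to a slice read from the file, so the rest of the tensor never has to be loaded; here an in-memory buffer stands in for the file.

```rust
use std::ops::Range;

/// Returns the half-open index range along one dimension owned by `rank`.
/// Assumes `dim_size` is divisible by `world_size`.
fn shard_range(dim_size: usize, rank: usize, world_size: usize) -> Range<usize> {
    assert!(world_size > 0 && rank < world_size, "invalid rank or world size");
    assert_eq!(dim_size % world_size, 0, "dimension must divide evenly");
    let block = dim_size / world_size;
    rank * block..(rank + 1) * block
}

/// Copies the rows owned by `rank` out of a row-major `rows x cols` matrix.
fn shard_rows(data: &[f32], rows: usize, cols: usize, rank: usize, world_size: usize) -> Vec<f32> {
    assert_eq!(data.len(), rows * cols, "buffer does not match the given shape");
    let r = shard_range(rows, rank, world_size);
    // Rows are contiguous in row-major layout, so one slice is enough.
    data[r.start * cols..r.end * cols].to_vec()
}

fn main() {
    // A 4 x 2 matrix stored row-major, split across 2 ranks.
    let data: Vec<f32> = (0..8).map(|v| v as f32).collect();
    for rank in 0..2 {
        let shard = shard_rows(&data, 4, 2, rank, 2);
        println!("rank {rank}: {shard:?}");
    }
}
```

Splitting along the first dimension keeps each shard contiguous; splitting along the last dimension would instead require copying a strided block from every row.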
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,3 @@ | ||
# Serialization | ||
|
||
Once you have a r | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# Serialization |
Missing some content here?
Possibly. :)
I'll remove that file
Should be good now