Add rust runtime #1597
Conversation
```rust
    if let Some(q) = SGX_QUEUES.lock().unwrap().pop_front() {
        ThreadPool::run_worker(q);
    }
}
```
this function interacts with the C++ runtime dylib
```rust
pub(super) strides: Option<Vec<usize>>,
pub(super) byte_offset: isize,
pub(super) numel: usize,
pub(super) dshape: Vec<i64>,
```
This is the `dshape` mentioned in the PR description. When the target arch is 32-bit, `usize` is 32 bits. Since the TVM module wants a 64-bit shape array, there needs to be an owned copy stored somewhere (i.e. in `dshape`).
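To illustrate why an owned copy is needed, here is a minimal sketch (`to_dshape` is a hypothetical helper, not the PR's actual code): widening the runtime's `usize` shape into an owned `Vec<i64>` gives a stable buffer whose pointer can be handed across the FFI boundary as `*const i64`, regardless of the width of `usize`.

```rust
// Hypothetical helper (not the PR's actual code): produce the owned 64-bit
// shape copy that `dshape` stores, so `dshape.as_ptr()` can be passed to the
// TVM C API as `*const i64` even on a 32-bit target.
fn to_dshape(shape: &[usize]) -> Vec<i64> {
    shape.iter().map(|&d| d as i64).collect()
}
```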
src/runtime/rust/.rustfmt.toml
Outdated
```toml
report_todo = "Never"
report_fixme = "Never"
ignore = []
verbose_diff = false
```
This is the default rustfmt.toml produced by `rustfmt --dump-default-config`.

Let us directly put it under tvm/rust.
@tqchen absolutely! First of all, I'd like to thank @nhynes for getting this started. Over the past three months, I've followed his code and learned a lot from his contributions. Well done, Nick! Here are some questions I need clarification on:
+1 for docs and comments. I would recommend having one example for users to get started, and let us make sure all the public APIs are documented and commented; that is what we need reviews for :) I also created an issue to follow the discussion: #1601
@tqchen yes, it was confusing to me that this PR adds more than, and differs from, the existing "runtime frontend" support I had in mind, which I thought was the intent of the v0.5 roadmap.
Nothing at all. That's what we're working on: a unified Rust crate-of-crates which supports both frontend and "backend" Rust. Currently, having a Rust backend is useful for the emerging technologies of Wasm and SGX. These are effectively CPU targets, so that's where the effort has gone.

The tests could easily be made into docs. I'll wait for @jroesch's reviews before diving into docs.
rust/src/runtime/array.rs
Outdated
```rust
    }
}

fn tensor_from_array_storage<'a, 's, T, D: ndarray::Dimension>(
```
is it possible to make these static methods?
rust/src/runtime/graph.rs
Outdated
```rust
    DLDataTypeCode_kDLFloat, DLDataTypeCode_kDLInt, DLDataTypeCode_kDLUInt, DLTensor,
};

const NDARRAY_MAGIC: u64 = 0xDD5E40F096B4A13F; // Magic number for NDArray file
```
These need a deeper explanation.
Just did a quick pass before bed. The code looks good but could use lots of docs explaining it; I think it would be very hard for someone who didn't know the existing runtime to understand. I'll take another stab tomorrow, or after you do docs; just let me know which would be more useful.

Thanks for your review @jroesch! In the interest of making efficient use of time, I'll make a docs pass before you do further review.
src/runtime/rust/Cargo.toml
Outdated
```toml
authors = ["Nick Hynes <nhynes@berkeley.edu>"]

[features]
par-launch-alloc = []
```
Has #1226 been resolved already? If not, I don't think it's a good idea to include this as a feature; maybe keep it in a separate branch until it's resolved!
```rust
const DEFAULT_ALIGN_BYTES: usize = 4;

#[derive(PartialEq, Eq)]
pub struct Allocation {
```
Would it be better to impl `Alloc` for it? Or at least add the stabilized `#[global_allocator]` attribute?
This isn't actually an allocator; it's just a way to get bytes out of the global allocator in a less-unstable way.
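A minimal sketch of that idea on stable Rust, assuming a simplified shape rather than the PR's actual `Allocation` struct: request a block via `Layout` from the global allocator and release it on `Drop`.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

// Simplified sketch, not the PR's actual struct: own a block of zeroed bytes
// obtained from the global allocator and release it when dropped.
struct Allocation {
    layout: Layout,
    ptr: *mut u8,
}

impl Allocation {
    fn new(size: usize, align: usize) -> Self {
        assert!(size > 0, "zero-sized layouts are not allocatable");
        let layout = Layout::from_size_align(size, align).expect("invalid layout");
        // SAFETY: `layout` has nonzero size and a power-of-two alignment.
        let ptr = unsafe { alloc_zeroed(layout) };
        assert!(!ptr.is_null(), "allocation failed");
        Allocation { layout, ptr }
    }
}

impl Drop for Allocation {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated with exactly this `layout`.
        unsafe { dealloc(self.ptr, self.layout) }
    }
}
```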
```rust
pub(super) shape: Vec<usize>,
pub(super) strides: Option<Vec<usize>>,
pub(super) byte_offset: isize,
pub(super) numel: usize,
```
It'd be better to rename `numel` to `size`, which is now the standard term in the ndarray notions of both NumPy and bluss's Rust ndarray.
```rust
    }
}

impl<'a, 't> TryFrom<&'a Tensor<'t>> for ndarray::ArrayD<f32> {
```
Just wondering about other numeric types: `f64` or ints?
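One common way to cover `f64` and integer types without hand-writing each conversion is a macro that stamps out one impl per element type. The sketch below uses stand-in types (a raw byte-buffer `Tensor`, not the PR's), just to show the shape of the approach.

```rust
use std::convert::TryInto;

// Stand-in for the real tensor: a raw byte buffer plus the dtype's byte width.
struct Tensor {
    data: Vec<u8>,
    dtype_size: usize,
}

// Stamp out one typed accessor per element type; adding f64/int support is
// then a one-line macro invocation rather than a copy-pasted impl.
macro_rules! impl_to_vec {
    ($name:ident, $ty:ty) => {
        impl Tensor {
            fn $name(&self) -> Option<Vec<$ty>> {
                if self.dtype_size != std::mem::size_of::<$ty>() {
                    return None; // dtype mismatch
                }
                Some(
                    self.data
                        .chunks_exact(std::mem::size_of::<$ty>())
                        .map(|c| <$ty>::from_ne_bytes(c.try_into().unwrap()))
                        .collect(),
                )
            }
        }
    };
}

impl_to_vec!(to_f32, f32);
impl_to_vec!(to_f64, f64);
impl_to_vec!(to_i32, i32);
```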
```rust
    fn get_function<S: AsRef<str>>(&self, name: S) -> Option<PackedFunc>;
}

pub struct SystemLibModule {}
```
An empty struct with no `{}`, maybe (i.e. the unit-struct form, `pub struct SystemLibModule;`)?
```rust
pub type PackedFunc = Box<Fn(&[TVMArgValue]) -> TVMRetValue>;

#[macro_export]
macro_rules! call_packed {
```
Nice and clever :)
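For readers unfamiliar with the pattern, here is a rough sketch of what a `call_packed!`-style macro buys (stand-in `ArgValue` enum, not the PR's `TVMArgValue`/`TVMRetValue`): each argument is converted into the type-erased value via `Into` before the boxed closure is invoked, so call sites read like ordinary function calls.

```rust
// Stand-in argument type; the real runtime uses TVMArgValue/TVMRetValue.
#[derive(Debug, Clone, PartialEq)]
enum ArgValue {
    Int(i64),
    Float(f64),
}

impl From<i64> for ArgValue {
    fn from(v: i64) -> Self { ArgValue::Int(v) }
}
impl From<f64> for ArgValue {
    fn from(v: f64) -> Self { ArgValue::Float(v) }
}

// Same shape as the PR's PackedFunc: a boxed closure over type-erased args.
type PackedFunc = Box<dyn Fn(&[ArgValue]) -> ArgValue>;

// A call_packed!-style macro: wrap each argument with .into(), then invoke.
macro_rules! call_packed {
    ($f:expr $(, $arg:expr)*) => {
        ($f)(&[$( $arg.into() ),*])
    };
}
```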
```rust
impl_prim_ret_value!(u32, 1);
impl_prim_ret_value!(f32, 2);
impl_boxed_ret_value!(String, 11);
```
Other primitives, perhaps?
```rust
}

impl<'a> Tensor<'a> {
    pub fn shape(&self) -> Vec<usize> {
```
These methods (and the like) could be marked `#[inline]`.
I generally don't use `#[inline]` because 1) I'm not smarter than the compiler and 2) the end user can always use link-time optimization (ref).
Also keep in mind that a fast binary is not always as important as a small one: in Wasm, for instance, small binaries download faster; similarly in SGX, smaller binaries afford more enclave memory for application use. In any case, LTO is unconditionally more useful than inlining.
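For reference, LTO is a per-profile switch in Cargo rather than a per-function annotation. A sketch of the relevant `Cargo.toml` fragment:

```toml
# Enable link-time optimization for release builds; with LTO the linker can
# inline across crate boundaries without per-function #[inline] attributes.
[profile.release]
lto = true
opt-level = "z"  # optional: optimize for size, which matters for Wasm and SGX
```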
Partially agree! While it's debatable, my reference is bluss's ndarray and the similar libs he's written with careful inlining. I'm not familiar with Wasm, though.
I'm not sure that it's a debate: LTO is unconditionally better. The ndarray crate is relatively old; perhaps bluss was using best practices from a time when the Rust linker was less sophisticated.
References:
- https://stackoverflow.com/questions/37639276/when-should-inline-be-used-in-rust
- https://stackoverflow.com/questions/1759300/when-should-i-write-the-keyword-inline-for-a-function-method
- https://stackoverflow.com/questions/7046547/link-time-optimization-and-inline
- https://stackoverflow.com/questions/5948703/difference-in-inlining-functions-by-compiler-or-linker
Thanks for the links! I'll look into them. The major source of inlining (and of anything, in fact) is std, for example. If in doubt in general, I think we should skip it for now.
```rust
    }
}

impl DataType {
    fn itemsize(&self) -> usize {
```
The same goes for `#[inline(always)]`. (Not repeating this for the similar methods.)
@nhynes thanks! A couple of points/questions:
I think the Rust version of the runtime has its own merit, mainly for Wasm support (Rust's Wasm support seems better than C++'s), so it is good to enable this option for users who need it.
@tqchen let me rephrase my question. Wouldn't it be clear from the context of the library that Rust will provide runtime support? In relation to #1601, I'm thinking of three types of API support: 1) compiler backend: C++ (and Python for prototyping); 2) runtime frontend: Python, Java, JS, etc.; 3) runtime backend (this PR) and frontend (my work): Rust. So Rust is not going to be in 1), right?
What you're working on isn't a runtime, though. It'd still go in the

Not until Rust has good Python FFI support. That's the main benefit of C[++].
@tqchen the last batch of changes addressed some of my concerns and comments above. Besides improving docs and some other smaller things (like
I haven't looped around to fixing that, but since the Rust runtime is mostly useful for Wasm (single-threaded) and SGX (which doesn't manage its own threads), it's not a blocker.
@nhynes Ok! So we shouldn't allow the related
Actually, I just fixed the bug. Turns out the issue was that the temporal workspaces only work correctly if they have the same alignment as the dtype they're supposed to hold. Go figure. |
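A hedged sketch of that constraint (`workspace_layout` is a hypothetical helper, not the PR's code): before allocating a temporal workspace, its alignment has to be raised to at least the alignment of the dtype it will hold, rather than always using the default minimum.

```rust
use std::alloc::Layout;

// Hypothetical helper illustrating the fix described above: a temporal
// workspace must be allocated with at least the alignment of the dtype it
// will hold, not just the default minimum.
fn workspace_layout(num_bytes: usize, dtype_align: usize) -> Layout {
    const DEFAULT_ALIGN_BYTES: usize = 4; // matches the constant in the PR
    let align = DEFAULT_ALIGN_BYTES.max(dtype_align);
    Layout::from_size_align(num_bytes, align).expect("invalid layout")
}
```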
Let us push to get this in. @nhynes please send a separate PR to upstream to update the Dockerfile.demo_cpu with the necessary Rust env. @nhynes I took a quick look; although I am not a Rust expert, it seems to me that the documentation is still sparse. Please at least document
After this is done and tests pass, we can merge it in and follow up with @ehsanmok on the Rust frontend.
The convention is to use the auto-generated docs, with manual docs only when necessary. Here's a link to the current docs; it's also populated with examples. I'll add a top-level module doc, add a dockerfile config, and address those code review comments.
Should I also update CI? Rust builds might be a bit flaky since this crate needs to use nightly, though.
Oh, yes, I meant to say ci_cpu. Using nightly is fine. However, note that we build the Docker image infrequently, which means we will be stuck on a certain version of nightly. Pinning a specific version or hash is preferred.
Please add Rust testcases to the Jenkinsfile here: https://github.com/dmlc/tvm/blob/master/Jenkinsfile#L141
This PR adds a Rust runtime which is useful when creating statically linked TVM modules--either native or Wasm. Indeed, Rust has the best Wasm support. It's also good for SGX.
The current code supports both TVM and NNVM modules. The API around `Tensor` conversions to `DLTensor` is a bit awkward since `DLTensor` isn't owned, but it shouldn't be too hard to iron out. There's one nit, though: the `dshape` field in the `Tensor` struct exists solely because TVM uses `u64` as the `shape_t` even on 32-bit platforms. I'll make a PR for this eventually...