
implement kornia-dnn with RTDETR detector #129

Open · wants to merge 6 commits into base: main

Conversation
edgarriba (Member) commented Sep 8, 2024:
  • creates kornia-dnn based on the ort crate

  • defines RTDETRDetectorBuilder and RTDETRDetector to create the ort session

  • creates a basic Detection struct message

  • adds an rtdetr example that runs RT-DETR on the webcam

  • TODO: explore CUDA inference


/// Path to the ONNX Runtime dynamic library.
pub ort_dylib_path: PathBuf,
/// Number of threads to use for inference.
pub num_threads: usize,
decahedron1 commented Sep 8, 2024:

Execution providers can be registered through the environment, so this is super minor, but WDYT about an execution_providers: Vec<ExecutionProviderDispatch> field & with_execution_providers to configure EPs specifically for the RTDETR session?
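A rough sketch of what that field and builder method could look like on RTDETRDetectorBuilder (only ExecutionProviderDispatch comes from ort; the surrounding field names are assumptions based on this PR):

```rust
use std::path::PathBuf;
use ort::ExecutionProviderDispatch;

pub struct RTDETRDetectorBuilder {
    pub model_path: PathBuf,
    pub num_threads: usize,
    /// EPs registered for this session only; empty means ort's default (CPU).
    pub execution_providers: Vec<ExecutionProviderDispatch>,
}

impl RTDETRDetectorBuilder {
    /// Configure execution providers specifically for the RT-DETR session.
    pub fn with_execution_providers(
        mut self,
        eps: impl IntoIterator<Item = ExecutionProviderDispatch>,
    ) -> Self {
        self.execution_providers = eps.into_iter().collect();
        self
    }
}
```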

edgarriba (Member, Author) replied:

I wanted to make ort transparent to the user and avoid having them explicitly pass an ort::ExecutionProvider, using something custom instead:

enum ExecutionProvider {
    Cpu,
    Cuda,
    TensorRT,
}

let execution_providers = match execution_provider {
    ExecutionProvider::Cpu => vec![CPUExecutionProvider::default().build()],
    ExecutionProvider::Cuda => vec![CUDAExecutionProvider::default().build()],
    ExecutionProvider::TensorRT => vec![TensorRTExecutionProvider::default().build()],
};

let session = Session::builder()?
    .with_optimization_level(GraphOptimizationLevel::Level3)?
    .with_intra_threads(num_threads)?
    .with_execution_providers(execution_providers)?
    .commit_from_file(model_path)?;

I'm still experimenting with the execution providers. What is the point of defining multiple in a Vec, just to have a fallback provider?

I've played a bit with it and noticed that CUDA/TensorRT take a few seconds to run the first frames.

A couple of questions:

  • Is there any way to leave this to the constructor of the session, before I fetch the session? (It might otherwise suffer some queuing issues with cameras.)
  • For TensorRT, does that mean it is compiling the model at runtime? Could we somehow pass a precompiled model?
  • As for commit_from_file: the idea is that we will have a bunch of operators/models in our Kornia HF hub, which can use commit_from_url, but we also want to let the user give a local ONNX file. Any tips here?

decahedron1 commented Sep 8, 2024:

What about re-exporting EPs, like pub use ort::{CPUExecutionProvider, CUDAExecutionProvider, TensorRTExecutionProvider}, so users can still configure each EP's options instead of being limited to the defaults? That should still keep things neat.
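For example, the re-export plus a user-side call might look like this (RTDETRDetectorBuilder::new and with_execution_providers are hypothetical kornia-dnn API here; with_device_id is a CUDA EP option in ort):

```rust
// kornia-dnn: re-export the EP builders so users don't depend on ort directly.
pub use ort::{CPUExecutionProvider, CUDAExecutionProvider, TensorRTExecutionProvider};

// user code: configure a specific EP instead of relying on the defaults.
let detector = RTDETRDetectorBuilder::new(model_path)
    .with_execution_providers([
        kornia_dnn::CUDAExecutionProvider::default()
            .with_device_id(0)
            .build(),
    ])
    .build()?;
```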

> What is the point of defining multiple in a Vec, just to have a fallback provider?

Yes.

> Is there any way to leave this to the constructor of the session, before I fetch the session? (It might otherwise suffer some queuing issues with cameras.)

You could run it on one dummy frame inside the constructor; that should get the graph warmed up.
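A minimal sketch of that warmup, assuming the RT-DETR session takes a 1×3×640×640 float input named "input" (the input name matches the snippet later in this PR; the shape is an assumption):

```rust
// Inside RTDETRDetector::new, after building the session: run one dummy
// frame so CUDA kernel selection / TensorRT compilation happens before
// the first real camera frame arrives.
let dummy = ort::Tensor::from_array((
    [1usize, 3, 640, 640],
    vec![0.0f32; 3 * 640 * 640],
))?;
let _ = session.run(ort::inputs!["input" => dummy]?)?;
```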

> For TensorRT, does that mean it is compiling the model at runtime? Could we somehow pass a precompiled model?

Yes, and for CUDA it is determining the optimal cuDNN convolution kernels. By default it performs an exhaustive search, which gets the best performance at the cost of significant warmup time; this can be configured with CUDAExecutionProvider::with_conv_algorithm_search.

TensorRT graphs can theoretically be cached with TensorRTExecutionProvider::with_engine_cache, but some users in the pyke Discord have reported that ONNX Runtime sometimes doesn't respect this option, and session creation can still take a few seconds despite a cached engine. Your mileage may vary; I personally haven't been able to reproduce the issue, but it's something to keep in mind.
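A hedged sketch of enabling that cache (the cache directory is an assumption; the option names follow ort's TensorRT EP builder):

```rust
// Cache compiled TensorRT engines on disk so later sessions can skip
// recompilation. Whether ONNX Runtime honors this is the caveat above.
let trt = TensorRTExecutionProvider::default()
    .with_engine_cache(true)
    .with_engine_cache_path("./trt-engine-cache")
    .build();

let session = Session::builder()?
    .with_execution_providers([trt])?
    .commit_from_file(model_path)?;
```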

> As for commit_from_file: the idea is that we will have a bunch of operators/models in our Kornia HF hub, which can use commit_from_url, but we also want to let the user give a local ONNX file. Any tips here?

How about something like this? (very roughly)

pub trait ModelSource {
    fn commit_session(&self, builder: SessionBuilder) -> ort::Result<Session>;
}

pub trait SessionBuilderExt {
    fn commit_from_source<S: ModelSource>(self, source: S) -> ort::Result<Session>;
}

impl SessionBuilderExt for SessionBuilder {
    fn commit_from_source<S: ModelSource>(self, source: S) -> ort::Result<Session> {
        source.commit_session(self)
    }
}

// Implemented per path-like type rather than as a blanket
// `impl<P: AsRef<Path>> ModelSource for P`, which would conflict with the
// impls in `models` below under Rust's coherence rules.
impl ModelSource for &Path {
    fn commit_session(&self, builder: SessionBuilder) -> ort::Result<Session> {
        builder.commit_from_file(self)
    }
}

impl ModelSource for &str {
    fn commit_session(&self, builder: SessionBuilder) -> ort::Result<Session> {
        builder.commit_from_file(Path::new(self))
    }
}

pub mod models {
    use super::ModelSource;
    use ort::{Session, SessionBuilder};

    pub struct ExampleRTDETR;

    impl ModelSource for ExampleRTDETR {
        fn commit_session(&self, builder: SessionBuilder) -> ort::Result<Session> {
            builder.commit_from_url("https://kornia.rs/model/rtdetr.onnx")
        }
    }
}

// rtdetr.rs
pub struct RTDETRDetectorBuilder {
    pub source: Box<dyn ModelSource>,
}

impl RTDETRDetectorBuilder {
    pub fn with_source<S: ModelSource + 'static>(source: S) -> Self {
        Self { source: Box::new(source) }
    }
}

impl RTDETRDetector {
    pub fn new(...) -> Result<Self> {
        let session = Session::builder()?
            ...
            .commit_from_source(source)?;
        Ok(Self { session })
    }
}

// user code
let rtdetr = RTDETRDetectorBuilder::with_source(kornia_dnn::models::ExampleRTDETR).build()?;
// or from a local file
let rtdetr = RTDETRDetectorBuilder::with_source("./my-local-rtdetr.onnx").build()?;

edgarriba (Member, Author) replied:

> I wanted to make ort transparent to the user and avoid having them explicitly pass an ort::ExecutionProvider, using something custom instead.

Actually, one more idea: we could set this automatically based on feature flags?
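That could look roughly like this (the crate feature names `cuda` and `tensorrt` are assumptions for illustration):

```rust
/// Pick execution providers from compile-time features; ort tries them in
/// order and falls back down the list, so CPU is always the last resort.
fn default_execution_providers() -> Vec<ort::ExecutionProviderDispatch> {
    let mut eps = Vec::new();
    #[cfg(feature = "tensorrt")]
    eps.push(ort::TensorRTExecutionProvider::default().build());
    #[cfg(feature = "cuda")]
    eps.push(ort::CUDAExecutionProvider::default().build());
    eps.push(ort::CPUExecutionProvider::default().build());
    eps
}
```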

decahedron1 left a review comment:

Outside of the load-dynamic issue, everything looks great! 🔥

let ort_tensor = ort::Tensor::from_array((image_nchw.shape, image_nchw.into_vec()))?;

// run the model
let outputs = self.session.run(ort::inputs!["input" => ort_tensor]?)?;
edgarriba (Member, Author) replied:

@decahedron1 how could we pre-allocate the output tensor here?

decahedron1 replied:

This seems like the perfect use case for OutputSelector!
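A very rough sketch of that idea (OutputSelector and RunOptions are ort API; the output name "output" and its shape are assumptions for this model):

```rust
use ort::{OutputSelector, RunOptions, Tensor};

// Allocate the output buffer once, outside the per-frame loop, and ask ort
// to write results into it instead of allocating a fresh tensor each run.
let prealloc = Tensor::<f32>::new(&session.allocator(), [1usize, 300, 6])?;
let options = RunOptions::new()?.with_outputs(
    OutputSelector::no_default()
        .with("output")
        .preallocate("output", prealloc),
);
let outputs = session.run_with_options(ort::inputs!["input" => ort_tensor]?, &options)?;
```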
