Can't load speedspeech onnx file #1263
Comments
Okay on further look this disappears if I remove |
Latest code that fails with:
Still having a play around, I will say the level of ONNX support is impressive, all the other rust solutions I've tried have failed much much earlier and with less/no actionable logs!
use super::*;
use anyhow::Context;
use tract_onnx::prelude::*;
use tract_onnx::tract_hir::infer::InferenceOp;
use ndarray::Array2;
use std::path::Path;

pub struct SpeedyTract {
    model: SimplePlan<InferenceFact, Box<dyn InferenceOp>, Graph<InferenceFact, Box<dyn InferenceOp>>>,
    phoneme_ids: Vec<Unit>,
}

impl SpeedyTract {
    #[must_use]
    pub fn load(path: impl AsRef<Path>) -> anyhow::Result<Self> {
        let model = tract_onnx::onnx()
            .model_for_path(path)
            .context("loading ONNX file")?
            // https://github.com/sonos/tract/issues/1263
            // .into_optimized()
            // .context("optimising graph")?
            .into_runnable()
            .context("converting to runnable model")?;
        Ok(Self {
            model,
            phoneme_ids: generate_id_list(),
        })
    }

    pub fn infer(&self, units: &[Unit]) -> anyhow::Result<Array2<f32>> {
        let phonemes = units
            .iter()
            .map(|x| best_match_for_unit(x, &self.phoneme_ids))
            .collect::<Vec<_>>(); // This is a Vec<i64>
        let tensor = Tensor::from_shape(&[1, units.len()], &phonemes)?;
        let plen = Tensor::from(units.len() as i64);
        let result = self.model.run(tvec!(tensor.into(), plen.into()))?;
        tracing::info!("Result: {:?}", result);
        todo!()
    }
}
Thanks for the kind words, but Sequences (and Maps) are not supported, and are very much not on the roadmap. Is there any chance your model could be refactored without sequences?
I don't think so tbh, it's a TTS model and as such works on variable input lengths. I wouldn't mind looking into implementing sequences if a PR would be accepted, but naturally I'm new to the code and internals, so that might not be feasible without at least being pointed in the general direction.
Sequences in tract would be a massively epic overhaul. tract "variables" are Tensors of known fixed rank and "symbolic dimensions". Changing this is huge and would probably have a long-term impact on code complexity, maintainability and performance. So don't start hacking on tensor sequences: you would most likely drown in it, or I would probably have to reject the PR. Let's look at other options first.
You may be aware of it, but tract's main application is actually voice processing, and we manage to do everything we need without tensor sequences, including dealing with variable lengths and/or "infinite" inputs. Recurrent networks are the traditional way, but tract's state management, network pulsification and symbolic dimension management gave us the flexibility we needed. The only kind of generalization I think tensor sequences could bring to the table would be to represent a time-based sequence of tensors having a varying dimension on a non-time axis. This is super exotic. I have never been shown such a design yet.
OK, so what can we do? I had a quick look at the network. Most of it looks fine, then there is a Loop that takes an empty sequence as input, pushes stuff into it, and then the sequence is made into a tensor again. Well, bad news, the Loop is not supported either. tract only has support for Scan (supporting Loop is actually an ongoing, relatively low-priority background task).
Scan does a bit of what the Loop plus Sequence seems to do here: it builds an output tensor by concatenating chunks of data together. The main difference between the two is that Scan performs a fixed number of iterations determined by the time dimension of its input (which can be symbolic, it only has to be determined when the Scan starts). The Loop, on the other hand, can stop iterating based on a runtime condition computed as part of the loop body itself. That is not something that can be done with Scan.
How familiar are you with this model design? Am I making sense here?
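To make the Scan versus Loop distinction above concrete, here is a minimal sketch in plain Rust. It does not use tract or ONNX at all: `scan_concat` and `loop_concat` are hypothetical helper names made up for illustration, and flat `Vec<f32>` values stand in for tensors. The point is only where the iteration count comes from.

```rust
/// Scan-style iteration: the number of steps is fixed by the length of the
/// input's "time" axis before the first step runs. The body cannot stop early.
fn scan_concat<F>(inputs: &[f32], mut body: F) -> Vec<f32>
where
    F: FnMut(f32) -> Vec<f32>,
{
    let mut out = Vec::new();
    for &x in inputs {
        // Each step produces a chunk that is concatenated onto the output.
        out.extend(body(x));
    }
    out
}

/// Loop-style iteration: the body returns a chunk plus a "keep going" flag,
/// so the number of steps is only known once the loop has actually run.
fn loop_concat<F>(max_iters: usize, mut body: F) -> Vec<f32>
where
    F: FnMut(usize) -> (Vec<f32>, bool),
{
    let mut out = Vec::new();
    for i in 0..max_iters {
        let (chunk, keep_going) = body(i);
        out.extend(chunk);
        if !keep_going {
            // Runtime condition computed by the body itself ends the loop.
            break;
        }
    }
    out
}
```

Calling `scan_concat(&[1.0, 2.0, 3.0], |x| vec![x, x * 10.0])` always runs three steps, whereas `loop_concat(100, |i| (vec![i as f32], i < 2))` stops as soon as the closure returns `false`. A model whose loop could be rewritten in the first shape would presumably be expressible with Scan; one that truly needs the second shape would not.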
Yeah, that all makes sense, thanks. I'm more aware of the model design from the details in the paper; I'm not sure how well that maps to the actual implementation. From the phonemes going in, it generates a duration in frames for each phoneme, and then for each phoneme plus duration it generates the necessary spectrogram frames. I was going to look at using torch tracing to generate a model with a longer input context than I need, but I'm a bit concerned that the loop is for the phoneme durations and is therefore dynamic, so it might not work as I hope if I pick the wrong dummy input for tracing.
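As a rough illustration of the duration step described above, here is a small plain-Rust sketch. It is not taken from the SpeedySpeech code: `expand_by_duration` is a hypothetical name, and the per-phoneme encodings are just `Vec<f32>` placeholders. It assumes the expansion amounts to repeating each phoneme's encoding once per predicted frame.

```rust
/// Repeat each phoneme encoding `durations[i]` times, in order.
/// The output length is the sum of the durations, so it depends on the
/// predicted values and not only on the number of input phonemes.
fn expand_by_duration(encodings: &[Vec<f32>], durations: &[usize]) -> Vec<Vec<f32>> {
    assert_eq!(
        encodings.len(),
        durations.len(),
        "one duration is expected per phoneme"
    );
    encodings
        .iter()
        .zip(durations)
        .flat_map(|(enc, &d)| std::iter::repeat(enc.clone()).take(d))
        .collect()
}
```

With three phonemes and durations `[2, 0, 3]` this yields five frames, and a different prediction for the same three phonemes yields a different frame count. That data-dependent length is the part that a fixed dummy input might not capture.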
Uploaded a zip of the model
speedyspeech.zip
The code in question (as an aside wondering how to tell what the generics should be):