The length you pass must be a power of 2, so I'd suggest zero-padding the cardiograms to a length of 8192. Since the sequence is very short, you can also use a smaller model (~70M params), something like this:

```python
import torch
from audio_diffusion_pytorch import AudioDiffusionModel

model = AudioDiffusionModel(
    in_channels=12,  # one channel per ECG lead
    patch_size=4,
    kernel_sizes_init=[1, 3, 7],
    multipliers=[1, 2, 4, 4, 4],
    factors=[4, 2, 2, 2],
    num_blocks=[2, 2, 2, 2],
    attentions=[False, True, True, True],
)

# Train model with cardiogram sources
x = torch.randn(1, 12, 8192)  # [batch, leads, padded length]
loss = model(x)
loss.backward()  # Do this many times

# Sample 2 cardiograms given start noise
noise = torch.randn(2, 12, 8192)
sampled = model.sample(
    noise=noise,
    num_steps=10,  # Suggested range: 2-50
)  # [2, 12, 8192]
```
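The zero-padding step mentioned above could look roughly like the sketch below. It assumes each recording is a tensor of shape `[batch, 12, 5000]` and that the padding goes on the right of the time axis. The helpers `next_power_of_two` and `pad_to_length` are hypothetical names made up for this illustration; they are not part of `audio_diffusion_pytorch`.

```python
import torch
import torch.nn.functional as F


def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def pad_to_length(x: torch.Tensor, target_len: int) -> torch.Tensor:
    """Right-pad the last (time) axis of x with zeros up to target_len."""
    pad = target_len - x.shape[-1]
    if pad < 0:
        raise ValueError("input is longer than target_len")
    # (0, pad) means: no padding on the left, `pad` zeros on the right.
    return F.pad(x, (0, pad))


# Stand-in for a batch of real 12-lead ECGs: 10 s at 500 Hz = 5000 samples.
ecg = torch.randn(1, 12, 5000)
orig_len = ecg.shape[-1]

target_len = next_power_of_two(orig_len)  # 8192 for 5000 samples
padded = pad_to_length(ecg, target_len)   # shape [1, 12, 8192]

# After sampling, the same slice recovers the original 10 s window.
cropped = padded[..., :orig_len]          # shape [1, 12, 5000]
```

Right-padding is just one choice here; the point is only that the model sees a power-of-2 length, and that generated samples can be cropped back to 5000 samples afterwards.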
-
Hello, thank you for the great repo for audio diffusion!
I just wanted to ask if you had any experience with multi-channel audio diffusion?
This might be out of scope for this repo, but I am currently trying to train a diffusion model on 12-lead electrocardiograms sampled at 500 Hz for 10 seconds (shape [12, 5000]). However, I am having difficulty training the model, so I was wondering if you might have any insight, from your intuition or experience, into which parameters I should try.
Thank you!