Speed up time stretching ~3x on Linux with FFTW3. #349

Merged
merged 32 commits into from
Jul 2, 2024
b6ca53c
Add FFTW3 for faster time stretching.
psobot Jun 30, 2024
249aee7
Fix build on Ubuntu.
psobot Jul 1, 2024
ac8aa34
Lint.
psobot Jul 1, 2024
eaaf32b
Add FMA4 support.
psobot Jul 1, 2024
36f240a
Target Broadwell CPUs instead of just fma4.
psobot Jul 1, 2024
bfdc200
Add FMA4 support again.
psobot Jul 1, 2024
b6bd4e3
Enable optional AVX512 instructions.
psobot Jul 1, 2024
7559010
Add missing swizzle enum definitions.
psobot Jul 1, 2024
1b8d39b
Don't built KCVI extensions with GCC or Clang.
psobot Jul 1, 2024
8bbb669
Add missing declares.
psobot Jul 1, 2024
cf09a52
Remove duplicate assert.c.
psobot Jul 1, 2024
8cdcb05
Properly ignore assert.c on ARM.
psobot Jul 1, 2024
68dd85c
Disable AVX512.
psobot Jul 1, 2024
66bdb89
Don't compile any avx512 files.
psobot Jul 1, 2024
55e61c0
No more of these FMA4 instructions, please.
psobot Jul 2, 2024
39d112b
Nope, no AVX_128_FMA either.
psobot Jul 2, 2024
c14ebf1
No -mavx maybe?
psobot Jul 2, 2024
8dd9037
-march=native
psobot Jul 2, 2024
e9b0e1d
Tell RubberBand that we're using threads.
psobot Jul 2, 2024
3963605
Tell RubberBand that we're already configured.
psobot Jul 2, 2024
72acb11
Silly; -DNO_THREADING=0 doesn't work, you need to not define NO_THREA…
psobot Jul 2, 2024
af473a3
Use Pthreads.
psobot Jul 2, 2024
c3902da
Use PThreads, but correctly this time.
psobot Jul 2, 2024
802e458
Wrap RubberBandStretcher constructor with a mutex.
psobot Jul 2, 2024
87a320a
It's... memalign?
psobot Jul 2, 2024
7d240f2
Also disable generic SIMD.
psobot Jul 2, 2024
42ac29f
Fix Windows build.
psobot Jul 2, 2024
3b660a3
Closer...
psobot Jul 2, 2024
9037070
So close, come on MSVC, you can do it!
psobot Jul 2, 2024
0b53a1e
Use built-in FFT on Windows.
psobot Jul 2, 2024
296d51c
Disable pthreads on Windows.
psobot Jul 2, 2024
df6ff43
Just use AVX; it's all we need!
psobot Jul 2, 2024
The diff you're trying to view is too large. We only load the first 3000 changed files.
1 change: 1 addition & 0 deletions .github/workflows/all.yml
@@ -47,6 +47,7 @@ jobs:
 uses: jidicula/clang-format-action@v4.13.0
 with:
 clang-format-version: 14
+exclude-regex: 'vendors/'
 fallback-style: LLVM

 # Build the native module with ccache enabled so we can share object files between builds:
2 changes: 1 addition & 1 deletion README.md
@@ -311,7 +311,7 @@ To cite via BibTeX:

 - The core audio processing code is pulled from [JUCE 6](https://juce.com/), which is [dual-licensed under a commercial license and the GPLv3](https://juce.com/juce-6-licence).
 - The [VST3 SDK](https://github.com/steinbergmedia/vst3sdk), bundled with JUCE, is owned by [Steinberg® Media Technologies GmbH](https://www.steinberg.net/en/home.html) and licensed under the GPLv3.
-- The `PitchShift` plugin uses [the Rubber Band Library](https://github.com/breakfastquay/rubberband), which is [dual-licensed under a commercial license](https://breakfastquay.com/technology/license.html) and the GPLv2 (or newer).
+- The `PitchShift` plugin and `time_stretch` functions use [the Rubber Band Library](https://github.com/breakfastquay/rubberband), which is [dual-licensed under a commercial license](https://breakfastquay.com/technology/license.html) and the GPLv2 (or newer). [FFTW](https://www.fftw.org/) is also included to speed up Rubber Band, and [is licensed under the GPLv2 (or newer)](https://www.fftw.org/doc/License-and-Copyright.html).
 - The `MP3Compressor` plugin uses [libmp3lame from the LAME project](https://lame.sourceforge.io/), which is [licensed under the LGPLv2](https://github.com/lameproject/lame/blob/master/README) and [upgraded to the GPLv3 for inclusion in this project (as permitted by the LGPLv2)](https://www.gnu.org/licenses/gpl-faq.html#AllCompatibility).
 - The `GSMFullRateCompressor` plugin uses [libgsm](http://quut.com/gsm/), which is [licensed under the ISC license](https://github.com/timothytylee/libgsm/blob/master/COPYRIGHT) and [compatible with the GPLv3](https://www.gnu.org/licenses/license-list.en.html#ISC).

9 changes: 4 additions & 5 deletions pedalboard/TimeStretch.h
@@ -226,12 +226,14 @@ timeStretch(const juce::AudioBuffer<float> input, double sampleRate,
 sampleRate, input.getNumChannels(), options, 1.0 / initialStretchFactor,
 pow(2.0, (initialPitchShiftInSemitones / 12.0)));

-rubberBandStretcher.setExpectedInputDuration(input.getNumSamples());
-
 const float **inputChannelPointers =
 (const float **)alloca(sizeof(float *) * input.getNumChannels());

+size_t maximumBlockSize = rubberBandStretcher.getProcessSizeLimit();
 if (!(options & RubberBandStretcher::OptionProcessRealTime)) {
+rubberBandStretcher.setExpectedInputDuration(input.getNumSamples());
+rubberBandStretcher.setMaxProcessSize(maximumBlockSize);
+
 for (size_t i = 0; i < input.getNumSamples();
 i += STUDY_BLOCK_SAMPLE_SIZE) {
 size_t numSamples =
@@ -242,7 +244,6 @@ timeStretch(const juce::AudioBuffer<float> input, double sampleRate,
 bool isLast = i + numSamples >= input.getNumSamples();
 rubberBandStretcher.study(inputChannelPointers, numSamples, isLast);
 }
-rubberBandStretcher.setMaxProcessSize(input.getNumSamples());
 }

 juce::AudioBuffer<float> output(input.getNumChannels(),
@@ -254,8 +255,6 @@ timeStretch(const juce::AudioBuffer<float> input, double sampleRate,
 /* keepExistingContent */ false, /* clearExtraSpace */ false,
 /* avoidReallocating */ true);

-size_t maximumBlockSize = rubberBandStretcher.getProcessSizeLimit();
-
 float **outputChannelPointers =
 (float **)alloca(sizeof(float *) * output.getNumChannels());

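The study pass in `pedalboard/TimeStretch.h` walks the input in fixed-size blocks and flags the final block with `isLast`. The block arithmetic can be sketched in Python as below. This is an illustration only, not the project's code: `iter_blocks` and its parameters are hypothetical names, `block_size` stands in for `STUDY_BLOCK_SAMPLE_SIZE`, and the sketch assumes the elided `numSamples` expression clamps to the samples that remain.

```python
from typing import Iterator, Tuple


def iter_blocks(total_samples: int, block_size: int) -> Iterator[Tuple[int, int, bool]]:
    """Yield (offset, num_samples, is_last) for each block of the input.

    Hypothetical helper: mirrors the shape of the C++ study loop only.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for offset in range(0, total_samples, block_size):
        # Clamp the final block so we never read past the end of the input.
        num_samples = min(block_size, total_samples - offset)
        # Same test as the C++ code: i + numSamples >= input.getNumSamples()
        is_last = offset + num_samples >= total_samples
        yield offset, num_samples, is_last


blocks = list(iter_blocks(total_samples=10, block_size=4))
```

With 10 samples and a block size of 4, only the third block is short, and only that block carries the `is_last` flag.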
2 changes: 1 addition & 1 deletion pedalboard/_pedalboard.py
@@ -97,7 +97,7 @@ def __repr__(self) -> str:
 def strip_common_float_suffixes(
 s: Union[float, str, bool], strip_si_prefixes: bool = True
 ) -> Union[float, str, bool]:
-if not isinstance(s, str) or (hasattr(s, "type") and s.type != str): # type: ignore
+if not isinstance(s, str) or (hasattr(s, "type") and s.type is not str): # type: ignore
 return s

 s = s.strip()
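The change from `!=` to `is not` (and from `==` to `is` in the tests) compares type objects by identity rather than by equality, presumably to satisfy a linter rule such as pycodestyle's E721. A small sketch of the difference follows. `Parameter`, `Loose`, and `Weird` are made-up classes for illustration and are not part of pedalboard.

```python
class Parameter:
    """Hypothetical stand-in for a plugin parameter with a `type` attribute."""

    def __init__(self, type_):
        self.type = type_


params = {"gain": Parameter(float), "bypass": Parameter(bool), "mode": Parameter(str)}

# Identity comparison selects exactly the parameters whose type object is `float`.
float_names = [name for name, p in params.items() if p.type is float]


class Loose(type):
    """Metaclass whose equality check always succeeds (contrived on purpose)."""

    def __eq__(cls, other):
        return True

    __hash__ = type.__hash__


class Weird(metaclass=Loose):
    pass


# `==` can be overridden by a metaclass; `is` cannot.
equal_by_eq = Weird == float
equal_by_is = Weird is float
```

For built-in types such as `float`, `bool`, and `str`, both forms give the same answer, so the change should not alter behavior.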
123 changes: 122 additions & 1 deletion setup.py
@@ -100,11 +100,130 @@
 ALL_CPPFLAGS.extend(
 [
 "-DUSE_BQRESAMPLER=1",
-"-DNO_THREADING=1",
 "-D_HAS_STD_BYTE=0",
 "-DNOMINMAX",
+"-DALREADY_CONFIGURED",
 ]
 )
+
+
+def ignore_files_matching(files, *matches):
+matches = set(matches)
+for match in matches:
+new_files = []
+for file in files:
+if match in str(file):
+# print(f"Skipping compilation of: {file}")
+pass
+else:
+new_files.append(file)
+files = new_files
+return files
+
+
+# Platform-specific FFT speedup flags:
+if platform.system() == "Windows":
+ALL_CPPFLAGS.append("-DUSE_BUILTIN_FFT")
+ALL_CPPFLAGS.append("-DNO_THREADING")
+elif platform.system() == "Darwin":
+# No need for any threading code on MacOS;
+# vDSP does all of this for us and these code paths are redundant.
+ALL_CPPFLAGS.append("-DNO_THREADING")
+elif platform.system() == "Linux":
+# Use FFTW3 for FFTs on Linux, which should speed up Rubberband by 3-4x:
+ALL_CPPFLAGS.extend(
+[
+"-DHAVE_FFTW3=1",
+"-DLACK_SINCOS=1",
+"-DFFTW_DOUBLE_ONLY=1",
+"-DUSE_PTHREADS",
+]
+)
+ALL_INCLUDES += ["vendors/fftw3/api/", "vendors/fftw3/"]
+fftw_paths = list(Path("vendors/fftw3/").glob("**/*.c"))
+fftw_paths = ignore_files_matching(
+fftw_paths,
+# Don't bother compiling in Altivec or VSX (PowerPC) support;
+# it's 2024, not 2004 (although RIP my G5 cheese grater)
+"altivec",
+"vsx",
+# We're not using FFTW in multi-threaded mode:
+"mpi",
+"threads",
+# No need for tests, tools, or support code:
+"tests",
+"tools",
+"/support",
+"common/",
+"libbench",
+# Ignore SSE, AVX2, AVX128, and AVX512 SIMD code;
+# For Rubber Band's usage, just AVX gives us the
+# largest speedup without bloating the binary
+"sse2",
+"avx2",
+"avx512",
+"kcvi",
+"avx-128-fma",
+"generic-simd",
+)
+
+# On ARM, ignore the X86-specific SIMD code:
+if "arm" in platform.processor() or "aarch64" in platform.processor():
+fftw_paths = ignore_files_matching(fftw_paths, "avx", "/sse")
+ALL_CFLAGS.append("-DHAVE_NEON=1")
+else:
+# And on x86, ignore the ARM-specific SIMD code (and KCVI; not GCC or Clang compatible).
+fftw_paths = ignore_files_matching(fftw_paths, "neon")
+ALL_CFLAGS.append("-march=native")
+# Enable SIMD instructions:
+ALL_CFLAGS.extend(
+[
+# "-DHAVE_SSE2",
+"-DHAVE_AVX", # Testing shows this is all we need!
+# "-DHAVE_AVX_128_FMA", # AMD only
+# "-DHAVE_AVX2",
+# "-DHAVE_AVX512", # No measurable speed difference
+# "-DHAVE_GENERIC_SIMD128", # Crashes!
+# "-DHAVE_GENERIC_SIMD256", # Also crashes!
+]
+)
+
+ALL_SOURCE_PATHS += fftw_paths
+
+ALL_CFLAGS.extend(
+[
+"-DHAVE_UINTPTR_T",
+'-DPACKAGE="FFTW"',
+'-DVERSION="0"',
+'-DPACKAGE_VERSION="00000"',
+'-DFFTW_CC="clang"',
+"-includestring.h",
+"-includestdint.h",
+"-includevendors/fftw3/dft/codelet-dft.h",
+"-includevendors/fftw3/rdft/codelet-rdft.h",
+"-DHAVE_INTTYPES_H",
+"-DHAVE_STDINT_H",
+"-DHAVE_STDLIB_H",
+"-DHAVE_STRING_H",
+"-DHAVE_TIME_H",
+"-DHAVE_UNISTD_H",
+"-DHAVE_DECL_DRAND48",
+"-DHAVE_DECL_SRAND48",
+"-DHAVE_DECL_COSL",
+"-DHAVE_DECL_SINL",
+"-DHAVE_DECL_POSIX_MEMALIGN",
+"-DHAVE_DRAND48",
+"-DHAVE_SRAND48",
+"-DHAVE_POSIX_MEMALIGN",
+"-DHAVE_ISNAN",
+"-DHAVE_SNPRINTF",
+"-DHAVE_STRCHR",
+"-DHAVE_SYSCTL",
+]
+)
+if platform.system() == "Linux":
+ALL_CFLAGS.append("-DHAVE_GETTIMEOFDAY")
+
 ALL_SOURCE_PATHS += list(Path("vendors/rubberband/single").glob("*.cpp"))

 ALL_SOURCE_PATHS += list(Path("vendors").glob("*.c"))
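`ignore_files_matching` in `setup.py` drops every path whose string form contains any of the given substrings. The sketch below shows the same filtering in compact form. It is a rewrite for illustration rather than the code in the PR, and the sample paths are made up.

```python
from pathlib import PurePosixPath


def ignore_files_matching(files, *matches):
    """Keep only the files whose path contains none of the given substrings."""
    return [f for f in files if not any(m in str(f) for m in matches)]


# Illustrative paths only; the real list comes from globbing vendors/fftw3/.
paths = [
    PurePosixPath("vendors/fftw3/dft/simd/avx/n1fv_2.c"),
    PurePosixPath("vendors/fftw3/dft/simd/avx2/n1fv_2.c"),
    PurePosixPath("vendors/fftw3/kernel/plan.c"),
    PurePosixPath("vendors/fftw3/tests/bench.c"),
]

# "avx2" does not match ".../avx/...", so the plain AVX codelets survive.
kept_x86 = ignore_files_matching(paths, "avx2", "tests")

# "avx" is a substring of "avx2" as well, so both SIMD directories are dropped.
kept_arm = ignore_files_matching(paths, "avx", "tests")
```

Because matching is by substring, a short pattern such as `"avx"` also removes `avx2`, `avx512`, and `avx-128-fma` paths, which is why the ARM branch can pass `"avx"` alone.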
@@ -142,13 +261,15 @@
 ALL_LINK_ARGS.append("-flto=thin")
 ALL_LINK_ARGS.append("-fvisibility=hidden")
 ALL_CPPFLAGS.append("-DJUCE_MODULE_AVAILABLE_juce_audio_devices=1")
+ALL_CFLAGS += ["-Wno-comment"]
 elif platform.system() == "Linux":
 ALL_CPPFLAGS.append("-DLINUX=1")
 # We use GCC on Linux, which doesn't take a value for the -flto flag:
 if not DEBUG and not os.getenv("DISABLE_LTO"):
 ALL_CPPFLAGS.append("-flto")
 ALL_LINK_ARGS.append("-flto")
 ALL_LINK_ARGS.append("-fvisibility=hidden")
+ALL_CFLAGS += ["-Wno-comment"]
 elif platform.system() == "Windows":
 ALL_CPPFLAGS.append("-DWINDOWS=1")
 ALL_CPPFLAGS.append("-DJUCE_MODULE_AVAILABLE_juce_audio_devices=1")
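In `setup.py`, the Windows and macOS branches define `NO_THREADING`, while the Linux branch leaves it undefined and defines `USE_PTHREADS` instead. One of the commit messages notes that `-DNO_THREADING=0` does not turn threading back on. That is consistent with the macro being tested with `#ifdef`, where any definition counts regardless of its value, though the source does not show the check itself. The sketch below models that behavior in Python. `fft_flags_for` and `is_defined` are hypothetical helpers, not part of the build script.

```python
from typing import List


def fft_flags_for(system: str) -> List[str]:
    """Return the FFT/threading defines chosen for a platform (mirrors the diff)."""
    if system == "Windows":
        return ["-DUSE_BUILTIN_FFT", "-DNO_THREADING"]
    if system == "Darwin":
        return ["-DNO_THREADING"]
    if system == "Linux":
        return [
            "-DHAVE_FFTW3=1",
            "-DLACK_SINCOS=1",
            "-DFFTW_DOUBLE_ONLY=1",
            "-DUSE_PTHREADS",
        ]
    return []


def is_defined(flags: List[str], name: str) -> bool:
    """Model `#ifdef name`: true if the macro is defined with any value at all."""
    prefix = "-D" + name
    return any(f == prefix or f.startswith(prefix + "=") for f in flags)


windows_no_threading = is_defined(fft_flags_for("Windows"), "NO_THREADING")
linux_no_threading = is_defined(fft_flags_for("Linux"), "NO_THREADING")
zero_still_defined = is_defined(["-DNO_THREADING=0"], "NO_THREADING")
```

Under this model, the only way to enable the threaded code path is to omit the define entirely, which is what the Linux branch does.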
24 changes: 12 additions & 12 deletions tests/test_external_plugins.py
@@ -288,7 +288,7 @@ def test_preset_parameters(plugin_filename: str, plugin_preset: str):
 # plugin with default params.
 plugin = load_test_plugin(plugin_filename)

-default_params = {k: v.raw_value for k, v in plugin.parameters.items() if v.type == float}
+default_params = {k: v.raw_value for k, v in plugin.parameters.items() if v.type is float}

 # load preset file
 plugin.load_preset(plugin_preset)
@@ -309,7 +309,7 @@ def test_initial_parameters(plugin_filename: str):
 # or "gain" to 0, which slows down the re-initialization of a plugin.
 k: (v.max_value if k == "gain" else v.min_value)
 for k, v in get_parameters(plugin_filename).items()
-if v.type == float
+if v.type is float
 }

 # Reload the plugin, but set the initial parameters in the load call.
@@ -330,7 +330,7 @@ def test_initial_parameters(plugin_filename: str):
 [
 (path, parameter)
 for path in AVAILABLE_PLUGINS_IN_TEST_ENVIRONMENT
-for parameter in [k for k, v in get_parameters(path).items() if v.type == float]
+for parameter in [k for k, v in get_parameters(path).items() if v.type is float]
 ],
 5,
 ),
@@ -557,7 +557,7 @@ def test_attributes_proxy(plugin_filename: str):
 [
 (path, parameter)
 for path in AVAILABLE_EFFECT_PLUGINS_IN_TEST_ENVIRONMENT
-for parameter in [k for k, v in get_parameters(path).items() if v.type == bool]
+for parameter in [k for k, v in get_parameters(path).items() if v.type is bool]
 ],
 5,
 ),
@@ -585,7 +585,7 @@ def test_bool_parameters(plugin_filename: str, parameter_name: str):
 [
 (path, parameter)
 for path in AVAILABLE_EFFECT_PLUGINS_IN_TEST_ENVIRONMENT
-for parameter in [k for k, v in get_parameters(path).items() if v.type == bool]
+for parameter in [k for k, v in get_parameters(path).items() if v.type is bool]
 ],
 5,
 ),
@@ -602,7 +602,7 @@ def test_bool_parameter_valdation(plugin_filename: str, parameter_name: str):
 [
 (path, parameter)
 for path in AVAILABLE_EFFECT_PLUGINS_IN_TEST_ENVIRONMENT
-for parameter in [k for k, v in get_parameters(path).items() if v.type == float]
+for parameter in [k for k, v in get_parameters(path).items() if v.type is float]
 ],
 5,
 ),
@@ -644,7 +644,7 @@ def test_float_parameters(plugin_filename: str, parameter_name: str):
 [
 (path, parameter)
 for path in AVAILABLE_EFFECT_PLUGINS_IN_TEST_ENVIRONMENT
-for parameter in [k for k, v in get_parameters(path).items() if v.type == float]
+for parameter in [k for k, v in get_parameters(path).items() if v.type is float]
 ],
 5,
 ),
@@ -681,7 +681,7 @@ def test_float_parameter_valdation(plugin_filename: str, parameter_name: str):
 [
 (path, parameter)
 for path in AVAILABLE_EFFECT_PLUGINS_IN_TEST_ENVIRONMENT
-for parameter in [k for k, v in get_parameters(path).items() if v.type == str]
+for parameter in [k for k, v in get_parameters(path).items() if v.type is str]
 ],
 5,
 ),
@@ -709,7 +709,7 @@ def test_str_parameters(plugin_filename: str, parameter_name: str):
 [
 (path, parameter)
 for path in AVAILABLE_EFFECT_PLUGINS_IN_TEST_ENVIRONMENT
-for parameter in [k for k, v in get_parameters(path).items() if v.type == str]
+for parameter in [k for k, v in get_parameters(path).items() if v.type is str]
 ],
 5,
 ),
@@ -735,7 +735,7 @@ def test_plugin_parameters_persist_between_calls(plugin_filename: str):
 for name, parameter in plugin.parameters.items():
 if name == "program":
 continue
-if parameter.type == float:
+if parameter.type is float:
 low, high, step = parameter.range
 if not step:
 step = 0.1
@@ -747,9 +747,9 @@ def test_plugin_parameters_persist_between_calls(plugin_filename: str):
 x * step for x in list(range(int(low / step), int(high / step), 1)) + [high / step]
 ]
 random_value = random.choice(values)
-elif parameter.type == bool:
+elif parameter.type is bool:
 random_value = bool(random.random())
-elif parameter.type == str:
+elif parameter.type is str:
 if parameter.valid_values:
 random_value = random.choice(parameter.valid_values)
 else:
18 changes: 18 additions & 0 deletions vendors/fftw3/AUTHORS
@@ -0,0 +1,18 @@
+Authors of FFTW (reachable at fftw@fftw.org):
+
+Matteo Frigo <athena@fftw.org>
+Steven G. Johnson <stevenj@alum.mit.edu>
+
+Stefan Kral <skral@fftw.org> wrote genfft-k7/*.ml*, which was
+added in fftw-3.0 and removed in fftw-3.2.
+
+Romain Dolbeau contributed support for AVX512 and KCvi.
+
+Erik Lindahl contributed support for AVX2 and Power8 VSX.
+
+Support for the Cell Broadband Engine was graciously donated by the
+IBM Austin Research Lab, which was added in fftw-3.2 and removed in
+fftw-3.3.
+
+Support for MIPS64 paired-single SIMD instructions was graciously
+donated by CodeSourcery, Inc.