
Decrease limit for KDF execution time in test #781

Merged (2 commits, Feb 23, 2021)

Conversation

@maxammann (Contributor) commented Feb 20, 2021

Executing `emmake make test_wasm` failed for me locally with emsdk 2.0.13.
This PR is meant more as a small discussion of whether this test really makes sense, given that it is platform-dependent.

Checklist

  • Change is covered by automated tests
  • Benchmark results are attached (if applicable)
  • The coding guidelines are followed
  • Public API has proper documentation
  • Example projects and code samples are up-to-date (in case of API changes)
  • Changelog is updated (in case of notable or breaking changes)

@vixentael added the W-WasmThemis 🌐 (Wrapper: WasmThemis, JavaScript API, npm packages) and tests (Themis test suite) labels on Feb 20, 2021
@vixentael (Contributor) left a comment


That's interesting. This test verifies that the KDF we use inside Themis for passphrase-based encryption/decryption is slow enough. A KDF's security correlates directly with its number of rounds, and therefore with processing performance, which is platform-dependent.

If this test fails for you, it means that the KDF is computed "too fast", i.e. we could add rounds and make it slower and therefore more secure.

(Funny aside: sometimes this test fails with a timeout error on GitHub Actions because the KDF took longer than the test was willing to wait for results.)

I don't think we'll change the KDF's number of rounds anytime soon.

But out of curiosity: can you describe your laptop? Is it a new Apple MacBook M1?

@ilammy (Collaborator) commented Feb 21, 2021

Hm...

No other platform seems to do timing tests on KDF usage. But indeed, it is platform-dependent. I don't really remember a hard motivation for adding this test specifically. I'm not against removing it altogether if it causes issues. Flaky tests are one of the worst kinds of tests.

My vote is to drop the test altogether, with the next preference being to merge this PR if removing the test is unacceptable.


One of the reasons I can envision for a test like this is to serve as an indicator of when the KDF iteration count should be increased. Now that I've actually run git blame, here's what it says:

> Also, ensure that passphrase API is slow enough, just in case it gets swapped with master key API, or some technical progress will make WebAssembly to run really fast.

So... uh... technical progress seems to have happened.

Taking benchmarks on CI is arguably a methodologically wrong approach. CI runners have unstable performance, so any absolute numbers you derive by executing stuff there are mostly moot. If anything, this benchmark should probably ensure that the KDF path is slower than the non-KDF path by a factor of N, or something. A relative benchmark like that would be more robust.
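For illustration, a relative check along those lines might be sketched like this (hypothetical: the factor N, the measureMs helper, and the test wiring are made up; the SecureCellSeal.withKey / withPassphrase calls assume the current wasm-themis API):

```js
const assert = require('assert');
const themis = require('wasm-themis');

// Rough wall-clock average over a few runs; not a proper statistical benchmark.
function measureMs(fn, runs = 5) {
    const start = Date.now();
    for (let i = 0; i < runs; i++) {
        fn();
    }
    return (Date.now() - start) / runs;
}

describe('Secure Cell timing (relative)', function () {
    before(() => themis.initialized); // wait for the Wasm module to load

    it('passphrase KDF is much slower than the master key path', function () {
        const message = new Uint8Array(32).fill(42);
        const keyCell = themis.SecureCellSeal.withKey(new themis.SymmetricKey());
        const passCell = themis.SecureCellSeal.withPassphrase('open sesame');

        const keyTime = measureMs(() => keyCell.encrypt(message));
        const passTime = measureMs(() => passCell.encrypt(message));

        // N is a tunable safety margin, not a measured constant.
        const N = 100;
        assert.ok(passTime > N * keyTime,
            `KDF path (${passTime} ms) should be ~${N}x slower than master-key path (${keyTime} ms)`);
    });
});
```

The point is that both measurements run on the same machine in the same process, so the hardware factor mostly cancels out.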

But again, the test does not seem to serve a useful purpose. Themis does not have a policy on bumping the KDF iteration count, so I don't really know what we're measuring here. For now, the iteration count is managed on an arbitrary basis. And if we're updating this parameter basically whenever we feel like it, then I guess we shouldn't lie to ourselves by trying to make it look like a scientific decision with this test.

If we are going to be serious about it, we should have proper benchmarks for the Wasm runtime and base the decision to bump the KDF iteration count on those. There are benchmarks for native code; I guess they could be compiled to Wasm with some elbow grease. However, that's certainly out of scope for this PR, along with defining a policy for KDF parameters.


P.S.

> can you describe your laptop?

Sure, it's silvery and rectangular.

Also, I call preconceived notions and desktop user bias on you! /s

@maxammann (Contributor, Author) commented

> Flaky tests are one of the worst kinds of tests.

That's why I made the effort to open this issue :) I share your view!

> But out of curiosity: can you describe your laptop?

Just a not-so-old ThinkPad T490, clocking up to 4.6 GHz. So it is faster than a CI runner should be.

> If we are going to be serious about it, we should have proper benchmarks for the Wasm runtime and base the decision to bump the KDF iteration count on those. There are benchmarks for native code; I guess they could be compiled to Wasm with some elbow grease. However, that's certainly out of scope for this PR, along with defining a policy for KDF parameters.

I agree that this is the wrong place for timing tests; tests which involve timing generally should not exist. Is there already infrastructure for benchmarking? I think benchmarking should be integrated into the whole develop-test-release cycle if it is to have a meaningful impact.

> My vote is to drop the test altogether, with the next preference being to merge this PR if removing the test is unacceptable.

If we do not have a replacement for measuring KDF duration, then I would say we keep this test as a reminder that we need a replacement in the future. If we remove it, we will not have the possibility to use git blame on a failing test ;)

@ilammy (Collaborator) commented Feb 21, 2021

> Is there already infrastructure for benchmarking?

There are some tools for benchmarking Themis Core code in benches/themis, but they are rather incomplete and have their own issues.

  • Benchmarks do not cover high-level wrappers at all.
    • Wrappers are thought to add some constant overhead on FFI. Hence, “not interesting”.
    • However, we have no idea how much overhead that is and how it compares to the time spent on parsing, crypto, etc.
      • Which might be significant. I don't remember the numbers that @Lagovas showed for GoThemis, but doing FFI was maybe an order of magnitude slower than doing the same stuff in Go natively. Who knows how expensive FFI calls are elsewhere.
  • Benchmarks do not cover all cryptosystems.
    • Basically, only Secure Cell in Seal mode for now.
    • Though, it does cover KDF vs. non-KDF variants – which was the main interest at the time.
  • Benchmark harness uses Rust.
    • I did not feel like reinventing the wheel with statistics when doing microbenchmarks.
    • Pro: nice reports and nice stats for free.
    • Con: not having the slightest idea of how to actually port and run that on, say, WebAssembly, Android, and iPhones. Those platforms also run Themis Core code and it would be interesting to see some benchmarks there.
  • Benchmark harness did not have much love poured into it after initial development.

So it exists, but not in a form that can be immediately used for WebAssembly. My current mental model is:

  1. Uh...
  2. Use wasm-pack maybe?
  3. ...
  4. BENCHMARKS

Not something that one can immediately act upon :)

And then there are various environments where you can run WebAssembly: Node, browsers, WASI...

@Lagovas (Collaborator) commented Feb 23, 2021

We should not remove the test that checks the KDF is slow enough, because that is expected behavior. But we can parameterize our tests and allow overriding the values for different test environments. We can use values from an environment variable or config, plus a default value (for our environment and for backward compatibility).
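For illustration, a minimal sketch of such an override in the JavaScript tests, assuming a hypothetical THEMIS_KDF_TIME_LIMIT_MS environment variable with the current limit as the default:

```js
const assert = require('assert');
const themis = require('wasm-themis');

// Hypothetical override: defaults to the limit expected on our CI.
const KDF_TIME_LIMIT_MS = Number(process.env.THEMIS_KDF_TIME_LIMIT_MS) || 200;

it('passphrase KDF is slow enough', async function () {
    await themis.initialized; // wait for the Wasm module to load
    const cell = themis.SecureCellSeal.withPassphrase('open sesame');

    const start = Date.now();
    cell.encrypt(new Uint8Array(32));
    const elapsed = Date.now() - start;

    assert.ok(elapsed >= KDF_TIME_LIMIT_MS,
        `KDF took ${elapsed} ms, expected at least ${KDF_TIME_LIMIT_MS} ms`);
});
```

CI would keep the default, while a faster machine could lower the limit for a local run without editing the test.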

@maxammann (Contributor, Author) commented

> We should not remove the test that checks the KDF is slow enough, because that is expected behavior. But we can parameterize our tests and allow overriding the values for different test environments. We can use values from an environment variable or config, plus a default value (for our environment and for backward compatibility).

The problem is that it is quite impossible to choose a value, as this is hardware-dependent. Timing in tests is always a bad idea. I just randomly halved the value of 200 ms to get the test working.

We can use a test runner for benchmarking, though. For example, if you run npm run test, then all tests except the timing one are run, while npm run benchmark would run that timing test. If the latter fails, it is not a programming issue: tests can verify software, but the software is still correct even if the KDF rounds are too low.

What do you think about splitting the current tests into two test suites?
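For instance, the split could be as simple as tagging the timing tests and filtering by tag (a sketch: the @benchmark tag and script names are hypothetical, though mocha's --grep/--invert flags are real):

```js
// package.json scripts (hypothetical):
//   "test":      "mocha --grep @benchmark --invert",
//   "benchmark": "mocha --grep @benchmark"

// Correctness tests: always run, including on CI.
describe('SecureCellSeal', function () {
    it('decrypts what it encrypted', function () {
        // functional assertions go here
    });
});

// Timing tests: tagged, so `npm run test` skips them.
describe('SecureCellSeal @benchmark', function () {
    it('passphrase KDF is slow enough', function () {
        // the timing assertion goes here
    });
});
```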

@vixentael (Contributor) commented

> But the software is still correct even if the KDF rounds are too low.

correct but insecure :)

I agree with @Lagovas. I think we should parameterize this test and use the default value for tests on CI, while preserving the ability to override the default for testing in other environments.

When this test starts failing everywhere, we should think about increasing the KDF rounds count across Themis. Right now we use PBKDF2 with 200,000 rounds, while NIST recommends 10,000 .. 100,000. For some tests (like fuzzing) we decreased the number of rounds to 10.

Also, a note: right now we're discussing this test under the assumption that it takes too little time because of the KDF rounds count, while the real reason might be different.

@ilammy (Collaborator) commented Feb 23, 2021

I'd like to agree with @maxammann on this:

> The problem is that it is quite impossible to choose a value, as this is hardware-dependent.

The absolute time it takes to process data depends on hardware and workload.

Give me a time limit and I can tweak CPU allocation to the process so that it exceeds the limit. Give me the code and I can game the time limit so that the computation stays within that limit.

Talking about benchmarks without taking the environment into consideration is kind of meaningless. The only reason for the 200 ms being there is that it's the magic number which makes CI pass in whatever environment is there. If we take that as a baseline (i.e., Themis is fine if CI is green), then there is no need to change this number.

For one, changing that number (the lower time bound for the test, adjusted in this PR) does not make Secure Cell more or less secure. If anything, we should start questioning whether there is a need to change a different number, the iteration count, because apparently Secure Cell is not slow enough for at least one person's machine out there.

But for that, we first need to define how much is "enough". If the attacker computes PBKDF2 on an ASIC or GPU, then even our current choice of iteration count might turn out to be woefully inadequate (as is probably the choice of PBKDF2 in the first place, if you include ASICs in the threat model). But bumping the default iteration count to 10,000,000 would probably make users more angry than grateful.

If we're going to have this discussion, then I think it should be moved somewhere else. GitHub has this new Discussions thingie. Maybe give it a try?

@ilammy (Collaborator) commented Feb 23, 2021

Now, addressing the feedback...

@maxammann,

> What do you think about splitting the current tests into two test suites?

Which of those would be running on CI? I guess only the “tests” (the ones which verify correctness) would run on CI, as a part of every PR, while the “benchmarks” might be run by default when a developer runs the test suite locally, or could be completely opt-in. (Proper benchmarks can take a non-negligible amount of time to collect data.)

@vixentael and @Lagovas,

> I think we should parameterize this test

If you ask me, it's already parameterized enough: there's a number in the tests that you can adjust if the test fails for you. Exporting it into an environment variable does not really make things easier for a developer running those tests locally. It's not something you'd need to tweak between test runs.

@vixentael,

> When this test starts failing everywhere,

I'd like to point out that WasmThemis is probably the only platform where such a test exists. I don't remember anything like it elsewhere.

@ilammy (Collaborator) left a comment


Regarding the essence of this PR,

  • I find that CI runs fine with this change applied.
  • I find that CI runs fine without this change applied.
  • I find that for CI this change does not bring much benefit.
  • I find that for CI this change does not cause much harm.
  • I find that this change does not entail much maintenance burden.
  • I find that this change makes life better for at least one person by making the test suite fail less often for them.

So my disposition here is to merge.

@ilammy
Copy link
Collaborator

ilammy commented Feb 23, 2021

As for the other questions raised here, I'd like to discuss them elsewhere, not in some random PR.

You're all welcome to raise more 👍

@vixentael vixentael merged commit f234eaf into cossacklabs:master Feb 23, 2021