-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide clear_internal_memory() callback, document the grave performance impact of not using the callbacks #231
Comments
FWIW, for these tests I patched the phc-winner-argon2 tree itself as follows: diff --git a/src/core.c b/src/core.c
index 8781852..9c3a27c 100644
--- a/src/core.c
+++ b/src/core.c
@@ -99,7 +99,18 @@ int allocate_memory(const argon2_context *context, uint8_t **memory,
if (context->allocate_cbk) {
(context->allocate_cbk)(memory, memory_size);
} else {
+#undef MALFRE
+#ifdef MALFRE
*memory = malloc(memory_size);
+#else
+ static __thread void *m = NULL;
+ static __thread size_t ms = 0;
+ if (!m || memory_size > ms) {
+ printf("Allocating %zu bytes\n", memory_size);
+ m = realloc(m, ms = memory_size);
+ }
+ *memory = m;
+#endif
}
if (*memory == NULL) {
@@ -116,11 +127,16 @@ void free_memory(const argon2_context *context, uint8_t *memory,
if (context->free_cbk) {
(context->free_cbk)(memory, memory_size);
} else {
+#ifdef MALFRE
free(memory);
+#endif
}
}
void NOT_OPTIMIZED secure_wipe_memory(void *v, size_t n) {
+#if 1
+ return;
+#endif
#if defined(_MSC_VER) && VC_GE_2005(_MSC_VER)
SecureZeroMemory(v, n);
#elif defined memset_s
diff --git a/src/run.c b/src/run.c
index b21b9d7..c8bb5f8 100644
--- a/src/run.c
+++ b/src/run.c
@@ -131,11 +131,14 @@ static void run(uint32_t outlen, char *pwd, size_t pwdlen, char *salt, uint32_t
fatal("could not allocate memory for hash");
}
+int i;
+for (i = 0; i < 1000; i++) {
result = argon2_hash(t_cost, m_cost, threads, pwd, pwdlen, salt, saltlen,
out, outlen, encoded, encodedlen, type,
version);
if (result != ARGON2_OK)
fatal(argon2_error_message(result));
+}
stop_time = clock();
(making code edits between the tests), and I used this script: #!/bin/sh
M=47
MEM=32768
killall argon2
for n in `seq 0 $M`; do
echo password | taskset -c $n ./argon2 saltsalt -d -t 1 -k $MEM | fgrep seconds &
done
wait
echo
for n in `seq 0 $M`; do
echo password | taskset -c $n ./argon2 saltsalt -d -t 1 -k $MEM | fgrep seconds &
done
wait
echo
for n in `seq 0 $M`; do
echo password | taskset -c $n ./argon2 saltsalt -d -t 1 -k $MEM | fgrep seconds &
done
wait Because of issue #204, I ran the script under |
The only uses of |
Here are some benchmarks of Argon2d 1.3 t=1 p=1 as implemented in phc-winner-argon2 as of today for the password hashing use case (not KDF) on 2x E5-2650v4 (24 cores, 48 threads) with 8x DDR4-2400 (a server kindly provided by Packet) running CentOS 7, built with that distro's default gcc, using AVX2 (I verified with
objdump -d
), in cumulative hashes per second after warm-up (CPUs at max turbo for all cores in use):Per
/proc/meminfo
, the memory was allocated asAnonHugePages
, so I don't expect further speedup from explicit use of huge pages on this Linux kernel (but there might be on CentOS 6, which per my testing wouldn't allocate huge pages transparently). 24 threads were consistently slightly slower across all of these, e.g. 980 hashes/s for the last one.Here are some more, for 2 MiB (high request rate capacity, targeting L3 cache):
This shows that memory (de)allocation and cleansing has a combined performance cost of over 2x at least on this platform for 32 MiB, but apparently glibc is being smart at 2 MiB allocations so only the cleansing costs us at that size.
The inclusion of such overhead in default builds and its potential extent should be documented prominently (perhaps including right in the
argon2
program output, not only in some documentation file), and a way for applications to override not only the memory (de)allocation functions, but alsoclear_internal_memory()
should be provided.BTW, while the code mostly uses the
clear_internal_memory()
wrapper function, in two places it callssecure_wipe_memory()
directly - intentional (why?) or a bug?As a sanity check, current yescrypt t=0 p=1 (built for plain SSE2, as that still works best even on Broadwell-EP with yescrypt's default 128-bit S-box lookups, which I like for them being so numerous) computes 780 hashes/s for 32 MiB, 48 threads at default settings, 815 at pwxform rounds reduced from 6 to 3. Since it has another 1/3 of a pass over the memory (on top of Argon2's t=1), this would translate to
780*4/3 = 1040
to815*4/3 = 1087
hashes/s for just the memory filling phase at same memory access speed, which is similar to the 1070 we see for Argon2d with the artificial slowdowns removed. That's at the same 1 KiB block size that Argon2 uses. Higher memory bandwidth is usable on this system at 2 KiB (925*4/3 = 1233
) and 4 KiB (1005*4/3 = 1340
,1005*32*2*4/3*2^20/10^9 = 90 GB/s
), but these are not an option for Argon2 (block size not configurable).The text was updated successfully, but these errors were encountered: