-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Time for shuffle function #51
Comments
Interesting. this sounds like it is 1000 times too slow :D |
Looking at the code, I'm wondering about three points:
Luckily, this shuffling should only be done only once at startup over a bootstraplist, and as such should not be much of a pain in runtime. Can you confirm this for me, @LaurenSpiegel ? |
@DavidPineauScality, yes this shuffle function is only used when S3 starts up and instantiates a bucketClient and an sproxydclient. It is not used in the metadata repo (they have their own shuffle function in dbapi). Note that sproxydclient has the same function in its repo rather than calling from Arsenal so that will have to be updated to point to the super new Arsenal shuffle when it is updated. I will create an issue in sproxydclient linking to this issue. |
Let's see... function shuffle(array) {
for (let i = array.length - 1; i > 0; i--) {
const randIndex = randomRange(0, i);
const randIndexVal = array[randIndex];
array[randIndex] = array[i];
array[i] = randIndexVal;
}
return array;
}; Let's assume our input array is of length 8. Here goes: function randomRange(min, max) {
if (max < min) {
throw new Error("Invalid range");
}
if (min === max) {
return min;
} else if (min < max) {
const range = (max - min);
const bits = bitsNeeded(range);
// decide how many bytes we need to draw from nextBytes: drawing less
// bytes means being more efficient
const bytes = bitsToBytes(bits);
// we use a mask as an optimization: it increases the chances for the
// candidate to be in range
const mask = createMaskOnes(bits);
let candidate;
do {
candidate = parseInt(nextBytes(bytes).toString('hex'), 16) & mask;
} while (candidate > range);
return (candidate + min);
}
} This is called with function bitsNeeded(number) {
if (number < 0) {
throw new Error("Input must be greater than or equal to zero");
} else if (number === 0) {
return 1;
} else {
return Math.floor(Math.log2(number)) + 1;
}
} OK, 8 = 2 ** 3, so function bitsToBytes(numBits) {
if (numBits < 0) {
throw new Error("Input must be greater than or equal to zero");
}
return Math.ceil(numBits / 8);
} so function createMaskOnes(numBits) {
if (numBits < 0)
throw new Error("Input must be greater than or equal to zero");
return Math.pow(2, numBits) - 1;
} and Consider what's going on here. The Or What you could consider doing here is implement a very simple (and fast) PRNG (minstd/Lehmer could do perfectly well to shuffle something, see https://en.wikipedia.org/wiki/Lehmer_random_number_generator), seed it with a 'cryptographic' random number (or even a plain timestamp), and simply modulo the PRNG result every time so the generated value fits the range you're aiming for. Also, there must be a better/more efficient way to turn a couple of bytes into the number they represent besides turning them into a hex-encoded string and parsing it base-16? Edit: Of course 8 is a bit of a 'worst case' (and so is every power of 2) causing a near-50% miss rate (or 50% chance the loop spins). Any |
On some runs, the shuffle function can take 1-2 ms to complete with an array of just 5 elements.
The text was updated successfully, but these errors were encountered: