It turned out the scheduler is surprisingly inefficient at loading very large lists. After doing some math, it's clear it needs to be redesigned to handle lists of that size in one go:
On a 64-bit system, even just collecting the pointers to all newlines takes a huge amount of memory:
`14_000_000_000 * 8 = 112_000_000_000  # 104.3 GiB`
Three important bits we need to keep in mind for feature parity with the current system:

- due to threading, we need to be able to process the list at multiple positions at once
- to measure progress, we need to know how many credentials we've processed, but also how many we have to process in total (a counter sketch follows below)
- jobs can fail and need to be rescheduled
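
For the progress requirement, here is a minimal sketch of what a shared counter could look like; the `Progress` type and its methods are illustrative, not something that exists in the current code:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical shared progress tracker: workers bump `done`,
/// the status output reads both values.
struct Progress {
    total: u64,      // known once the initial newline count has finished
    done: AtomicU64, // incremented after every attempt, successful or not
}

impl Progress {
    fn new(total: u64) -> Arc<Self> {
        Arc::new(Self { total, done: AtomicU64::new(0) })
    }

    fn finish_one(&self) {
        self.done.fetch_add(1, Ordering::Relaxed);
    }

    fn snapshot(&self) -> (u64, u64) {
        (self.done.load(Ordering::Relaxed), self.total)
    }
}
```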
To support lists that large, we'd have to change the scheduler design:
### generator thread
1. Open the list of credentials
2. Scan the whole file and count newlines
3. Seek back to 0
4. Start the worker threads
5. Fill a size-limited mpsc queue with credentials, then block at `send`
6. Every time a worker receives from the queue, `send` unblocks and the next line can be loaded and inserted into the queue
Memory-wise, this would be one of the most lightweight solutions.
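
A minimal sketch of that flow, assuming std's bounded `sync_channel` as the size-limited queue and an `Arc<Mutex<Receiver>>` so multiple workers can pull from it (function and variable names are made up for illustration):

```rust
use std::fs::File;
use std::io::{BufRead, BufReader, Read, Seek, SeekFrom};
use std::sync::{mpsc::sync_channel, Arc, Mutex};
use std::thread;

fn run(path: &str, workers: usize) -> std::io::Result<()> {
    let mut reader = BufReader::new(File::open(path)?);

    // Pass 1: scan the whole file and count lines (the total for the progress display).
    let total = reader.by_ref().lines().count();
    // Seek back to 0; BufReader discards its internal buffer on seek.
    reader.seek(SeekFrom::Start(0))?;
    eprintln!("{} credentials to test", total);

    // Size-limited queue: send() blocks once `workers * 2` lines are in flight.
    let (tx, rx) = sync_channel::<String>(workers * 2);
    let rx = Arc::new(Mutex::new(rx));

    // Start the worker threads before filling the queue.
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || loop {
                // The MutexGuard is a temporary, released as soon as recv() returns.
                let msg = rx.lock().unwrap().recv();
                match msg {
                    Ok(_credential) => { /* parse the credential and test it here */ }
                    Err(_) => break, // channel closed: the generator is done
                }
            })
        })
        .collect();

    // Generator loop: every recv() by a worker frees a slot and unblocks send().
    for line in reader.lines() {
        tx.send(line?).expect("all workers exited early");
    }
    drop(tx); // close the channel so the workers drain the queue and stop

    for handle in handles {
        handle.join().unwrap();
    }
    Ok(())
}
```

The counting pass above allocates a `String` per line just to keep the sketch short; a real implementation would count raw `\n` bytes with a reused buffer to avoid 14 billion allocations.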
### offset + limit
This could be applied to dict-style runs as well:
- Skip `offset` number of attempts
- Submit `limit` number of attempts
- Ignore everything else
This would also allow resuming aborted jobs (assuming the offset has been saved) and distributing a test across machines (especially for dict-style runs).
It would be quirky to use though.
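
As a rough sketch, assuming hypothetical `offset`/`limit` parameters, the windowing over a line-based list could boil down to two iterator adaptors:

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

// Hypothetical offset/limit window over a wordlist: skip `offset` attempts,
// submit `limit` attempts, ignore everything after that.
fn window(path: &str, offset: usize, limit: usize) -> io::Result<Vec<String>> {
    let reader = BufReader::new(File::open(path)?);
    reader
        .lines()
        .skip(offset) // attempts already done, or assigned to another node
        .take(limit)  // attempts this run is responsible for
        .collect()    // stops at the first read error
}
```

For dict-style runs the same offset/limit would index into the generated user/password combinations instead of a single file.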
### zero-copy + chunk assignment
To avoid the overhead of our own data structures, we could map the whole file into RAM and operate on slices. Since we need to process the list in parallel, we could split the file into chunks of a specific size; each worker processes its chunk independently, with no synchronization needed until it reaches the end of its chunk.
This still requires enough RAM to hold the whole file at once.
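
A sketch of the chunk assignment, assuming the file is memory-mapped with the external `memmap2` crate and each chunk boundary is pushed forward to the next newline so no credential is split between two workers (path and chunk count are placeholders):

```rust
use memmap2::Mmap; // external crate, one possible way to map the file
use std::fs::File;

// Split a memory-mapped wordlist into roughly `chunks` byte ranges,
// moving each boundary forward so every range ends right after a '\n' (or at EOF).
fn chunk_ranges(map: &Mmap, chunks: usize) -> Vec<(usize, usize)> {
    let len = map.len();
    let step = (len / chunks.max(1)).max(1);
    let mut ranges = Vec::with_capacity(chunks);
    let mut start = 0;
    while start < len {
        let mut end = (start + step).min(len);
        while end < len && map[end - 1] != b'\n' {
            end += 1;
        }
        ranges.push((start, end));
        start = end;
    }
    ranges
}

fn main() -> std::io::Result<()> {
    let file = File::open("credentials.txt")?; // placeholder path
    // Safety: the mapping is read-only and the file must not be truncated while mapped.
    let map = unsafe { Mmap::map(&file)? };
    for (start, end) in chunk_ranges(&map, 8) {
        let _slice: &[u8] = &map[start..end]; // a worker would iterate the lines in this slice
    }
    Ok(())
}
```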
### `Mutex<Cursor>`
We can simply scan the file in the main thread, count the credentials, seek back to 0 and then wrap the file handle in a mutex. Each worker then does the following:
1. lock the bufreader
2. read an entry
3. release the mutex
4. parse the credentials and test them
This would introduce the need for an error message to the msg loop, since reading from the file might fail in a non-recoverable way.
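
A sketch of the worker loop under this design; the `Msg` enum and its `ReadError` variant are hypothetical stand-ins for whatever the real msg loop carries:

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::sync::{mpsc::Sender, Arc, Mutex};

// Hypothetical message type: read failures get reported back to the main loop.
enum Msg {
    Tested(String),
    ReadError(std::io::Error),
}

fn worker(reader: Arc<Mutex<BufReader<File>>>, tx: Sender<Msg>) {
    loop {
        let mut line = String::new();
        // lock the bufreader, read exactly one entry, then release the mutex
        let read = {
            let mut guard = reader.lock().unwrap();
            guard.read_line(&mut line)
        };
        match read {
            Ok(0) => break, // EOF: nothing left to schedule
            Ok(_) => {
                // parse the credential outside the lock and test it
                let _ = tx.send(Msg::Tested(line.trim_end().to_string()));
            }
            Err(err) => {
                // non-recoverable read failure: tell the msg loop and stop
                let _ = tx.send(Msg::ReadError(err));
                break;
            }
        }
    }
}
```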
Note that there's also some overhead from the way the threadpool currently works: it allocates some memory for each job we want to run. While this isn't much per job, keep in mind that even a single byte per credential adds up to 14 GB.
In the end, I'm not sure if tests that large are realistic and how much effort should go into this.