
Changes to the background loading #425

Merged
paulfd merged 8 commits into sfztools:develop from the simpler-file-loading branch on Oct 7, 2020

Conversation

paulfd
Copy link
Member

@paulfd paulfd commented Sep 18, 2020

The goal was to reduce the number of background threads in big sessions (multiple instances) and to avoid loading the same file over and over on repeated notes (e.g. drum tracks).

  • All instances use a common thread pool
  • Each instance has a dispatching job that starts the background loader, and a garbage job that clears data that has been unused for some time
  • The "FilePromise" object has disappeared. Upon request, voices get a very thin reference-counting handle to the file data (sketched below). When the handle is released, the reference count is decreased and a "last used" timestamp is added to the file.
  • The background file loading job checks i) whether the file is already loaded, and ii) whether some other thread is already loading it. If either is true, it exits, assuming the background loading is already underway.
  • The garbage job has to be triggered regularly, e.g. by the synth. In the RT thread, the triggerGarbageCollection() method checks the list of previously loaded files to see whether any has not been used for a while. If so, the memory is marked to be discarded by a background thread; otherwise it waits for the next ping.

Only tested on Linux for now.
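
For illustration, the reference-counting handle could look roughly like this. This is a minimal sketch, not the actual sfizz code: the names FileData and FileDataHandle are made up for the example, and synchronization details are simplified.

```cpp
#include <atomic>
#include <chrono>
#include <vector>

struct FileData {
    std::vector<float> samples;                         // decoded audio
    std::atomic<int> refCount { 0 };                    // voices currently holding a handle
    std::chrono::steady_clock::time_point lastUsed {};  // stamped when the last handle goes away
};

class FileDataHandle {
public:
    explicit FileDataHandle(FileData* data) : data_(data)
    {
        if (data_)
            data_->refCount.fetch_add(1, std::memory_order_relaxed);
    }
    ~FileDataHandle()
    {
        // Last handle released: stamp the file so the garbage job can age it out.
        if (data_ && data_->refCount.fetch_sub(1, std::memory_order_acq_rel) == 1)
            data_->lastUsed = std::chrono::steady_clock::now();
    }
    FileDataHandle(const FileDataHandle&) = delete;
    FileDataHandle& operator=(const FileDataHandle&) = delete;
    const FileData& operator*() const { return *data_; }
private:
    FileData* data_ = nullptr;
};
```

The garbage job then just walks the known files and frees any whose reference count is zero and whose last-used timestamp is older than some threshold.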

@paulfd paulfd marked this pull request as draft September 18, 2020 16:21
@jpcima
Copy link
Collaborator

jpcima commented Sep 19, 2020

Some thoughts on this problem in general:

  • loading splits into two kinds of tasks:

    • input from disk (not including decoding)
    • processing the disk data: decoding and upsampling
  • given that disk loading is IO-bound, there is little or no benefit to multitasking this part;
    it may as well be a single thread which cycles over the file handles and delivers the required data chunks to processing threads.
    The IO thread passes non-decoded chunks to the CPU threads, and the CPU threads perform the decoding.

When IO and CPU work are fully decoupled, such that reading calls don't involve decoding (as opposed to a direct sf_read), disk IO can be driven at full throughput. The mechanism could possibly be implemented using the features of SF_VIRTUAL_IO.
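
To make the idea concrete, here is a rough sketch of what the SF_VIRTUAL_IO side could look like: the IO thread fills a byte buffer with the raw file contents, and a CPU thread decodes from that in-memory view, so the sf_readf_* calls never block on the disk. This is illustrative only (no error handling, and the integration with sfizz's FilePool is omitted).

```cpp
#include <sndfile.h>

#include <algorithm>
#include <cstdio>
#include <cstring>
#include <vector>

// In-memory view of a file's raw bytes, filled in advance by the IO thread.
struct MemoryView {
    const std::vector<unsigned char>* raw;
    sf_count_t pos = 0;
};

static sf_count_t memGetFilelen(void* user)
{
    auto* v = static_cast<MemoryView*>(user);
    return static_cast<sf_count_t>(v->raw->size());
}

static sf_count_t memSeek(sf_count_t offset, int whence, void* user)
{
    auto* v = static_cast<MemoryView*>(user);
    const sf_count_t size = static_cast<sf_count_t>(v->raw->size());
    switch (whence) {
    case SEEK_SET: v->pos = offset; break;
    case SEEK_CUR: v->pos += offset; break;
    case SEEK_END: v->pos = size + offset; break;
    }
    v->pos = std::min(std::max<sf_count_t>(v->pos, 0), size);
    return v->pos;
}

static sf_count_t memRead(void* ptr, sf_count_t count, void* user)
{
    auto* v = static_cast<MemoryView*>(user);
    const sf_count_t avail = static_cast<sf_count_t>(v->raw->size()) - v->pos;
    const sf_count_t n = std::max<sf_count_t>(0, std::min(count, avail));
    std::memcpy(ptr, v->raw->data() + v->pos, static_cast<size_t>(n));
    v->pos += n;
    return n;
}

static sf_count_t memTell(void* user)
{
    return static_cast<MemoryView*>(user)->pos;
}

// On the decoding thread, once the IO thread has delivered the bytes:
//   MemoryView view { &rawBytes };
//   SF_VIRTUAL_IO vio { memGetFilelen, memSeek, memRead, nullptr, memTell };
//   SF_INFO info {};
//   SNDFILE* snd = sf_open_virtual(&vio, SFM_READ, &info, &view);
//   ... sf_readf_float(snd, buffer, frames) ...
//   sf_close(snd);
```

The raw bytes themselves would be delivered by the single IO thread cycling over file handles, as described above.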

@paulfd
Copy link
Member Author

paulfd commented Sep 21, 2020

Yes, I agree; I'll look a bit into libsndfile's virtual IO to see whether there is a reasonable architecture. It might be hard to make something fully async though. What I mean is that if the virtual IO is a "blocking" one for the background threads, we might as well let the OS schedule them and their IO requests. The OS is the right place to make these decisions, since it has all the information required for proper scheduling.

@paulfd paulfd closed this Sep 21, 2020
@paulfd paulfd reopened this Sep 21, 2020
@paulfd
Copy link
Member Author

paulfd commented Sep 21, 2020

Thinking about it a bit more, I wonder whether the "better" problem to solve wouldn't be to have chunked reading in the thread pool too. The background loading could go from a single task that opens the file and then reads and decodes the data, to a succession of smaller tasks: open the file, read a chunk, read a chunk, read a chunk, ... potentially giving back the token after each subtask is done, thereby allowing all file loads to progress at a similar rate.
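
For instance, a single load could be expressed as a self-re-enqueueing chunk reader. This is a rough sketch under assumed names (the enqueue function and LoadState are hypothetical, not sfizz's actual API):

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>

// Stand-in for the shared pool's FIFO; in sfizz this would be the common thread pool.
static std::deque<std::function<void()>> taskQueue;
static void enqueue(std::function<void()> task) { taskQueue.push_back(std::move(task)); }

struct LoadState {
    // A real implementation would also hold the open sound file and the destination buffer.
    size_t framesRead = 0;
    size_t totalFrames = 0;
};

constexpr size_t chunkFrames = 65536;

static void readNextChunk(std::shared_ptr<LoadState> state)
{
    // Read (and decode) at most chunkFrames frames here...
    state->framesRead += std::min(chunkFrames, state->totalFrames - state->framesRead);

    if (state->framesRead < state->totalFrames) {
        // Give the token back: re-enqueue the continuation so other pending
        // loads get a turn before this file reads its next chunk.
        enqueue([state] { readNextChunk(state); });
    }
}
```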

@jpcima
Copy link
Collaborator

jpcima commented Sep 23, 2020

f41ae6d means that tasks don't get served while there are as many sound files as processors already enqueued;
if these are heavy tasks, smaller tasks wait until some of them complete.

@paulfd
Copy link
Member Author

paulfd commented Sep 23, 2020

You are right, although I think this was always the case. If we try to answer this question, we're basically writing a scheduler. Would it be smart(er) to let the OS do the scheduling and spawn a new thread for each file load?

@jpcima
Copy link
Collaborator

jpcima commented Sep 23, 2020

Creating threads is usually considered a heavy operation; this remains something to measure, but there is certainly a reason that thread pools and coroutines are commonly used patterns.

I think it's OK to spawn a fixed thread pool with a moderately high count (i.e. 32, 64, ...).

IMO, we should preferably use some sort of scheduling that divides file loading into smaller subtasks and enqueues them FIFO, in order of urgency (high frequency notes qualify as more urgent). It does not get in the way of the OS scheduler.
The problem is divided by splitting files into fragments of some block size to read.
This can optimize the problem for latency, but throughput would be worse overall, for caching reasons.

All this is intuition; in the end we need a real benchmark of several strategies, with measurements of throughput and latency.
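
One possible shape for such an urgency-ordered fragment queue, purely as an illustration of the idea (the urgency metric and the field names are assumptions, not anything sfizz implements):

```cpp
#include <cstdint>
#include <queue>
#include <vector>

struct ReadFragment {
    int fileId;          // which sound file the block belongs to
    uint64_t offset;     // byte offset of the block to read
    uint64_t sequence;   // monotonically increasing; keeps FIFO order among equals
    int urgency;         // higher means served first
};

struct ByUrgency {
    bool operator()(const ReadFragment& a, const ReadFragment& b) const
    {
        if (a.urgency != b.urgency)
            return a.urgency < b.urgency;   // lower urgency sinks
        return a.sequence > b.sequence;     // otherwise first-enqueued first
    }
};

using FragmentQueue =
    std::priority_queue<ReadFragment, std::vector<ReadFragment>, ByUrgency>;
```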

@jpcima
Copy link
Collaborator

jpcima commented Sep 23, 2020

note: can we have an innocent look at how many threads ARIA has?

@paulfd
Copy link
Member Author

paulfd commented Sep 23, 2020

Creating threads is usually considered a heavy operation; this remains something to measure, but there is certainly a reason that thread pools and coroutines are commonly used patterns.

Yes, but as you said, maybe we should measure. In the grand scheme of things it might actually be acceptable, and we have built-in tooling to track this sort of thing.

I think it's OK to spawn a fixed thread pool with a moderately high count (i.e. 32, 64, ...).

This could be a good middle ground. I'll test it on more constrained platforms though, so that it does not bring them to their knees.

IMO, we should preferably use some sort of scheduling that divides file loading into smaller subtasks and enqueues them FIFO, in order of urgency (high frequency notes qualify as more urgent). It does not get in the way of the OS scheduler.

Yes, this is what I had in mind, although I'm not sure what you mean by high frequency (in terms of pitch?) in this context.

The problem is divided by splitting files into fragments of some block size to read.
This can optimize the problem for latency, but throughput would be worse overall, for caching reasons.
All this is intuition; in the end we need a real benchmark of several strategies, with measurements of throughput and latency.

Do you think this should also be part of this PR? My main goal here was more to get rid of the multiple queues and the per-instance threads, as well as to avoid repeatedly loading the same files in quick succession. There is more to do to improve on this front, for sure.

@paulfd
Copy link
Member Author

paulfd commented Sep 23, 2020

In particular, updating sfizz-render to "simulate" live playing (raising the thread priority and disabling freewheeling) would be helpful in evaluating different alternatives.

@jpcima
Copy link
Collaborator

jpcima commented Sep 23, 2020

Do you think this should also be part of this PR?

No.

Yes, this is what I had in mind, although I'm not sure what you mean by high frequency (in terms of pitch?) in this context.

If we have a pair of samples A and B, and B is to play at twice the rate of A, a more optimal delivery order is going to be

  • A, B, B, A, B, B, etc...

@paulfd
Copy link
Member Author

paulfd commented Sep 23, 2020

I see, yes, for sure. For this PR I'll try it on Android and a raspi with different numbers of threads.

@paulfd
Copy link
Member Author

paulfd commented Sep 24, 2020

On Windows, with concurrency - 2 threads and reading from a spinning drive, it works "OK". There are underruns if the preload size is too low, but otherwise drums can hammer away :) On an SSD it works well with the default preload size.

Raspi next.

@paulfd
Copy link
Member Author

paulfd commented Sep 27, 2020

OK, it works well enough on the raspi 3 for both LV2 and VST3. The overall performance of sfizz there is not awesome, but the background loader works. The rest should get better over time.

@paulfd paulfd marked this pull request as ready for review September 27, 2020 16:57
@paulfd paulfd requested a review from jpcima September 27, 2020 16:57
paulfd added 7 commits October 7, 2020 00:37
Honestly a stupid queue would be fine at this point
@paulfd paulfd force-pushed the simpler-file-loading branch from a68d856 to a0ad3b9 Compare October 6, 2020 22:37
@paulfd paulfd merged commit 2fc6aab into sfztools:develop Oct 7, 2020
@paulfd paulfd deleted the simpler-file-loading branch October 7, 2020 11:57