Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cache] Fix concurrent use of a cache entry #3785

Merged
merged 1 commit into from
Aug 30, 2023
Merged

Conversation

dalgwen
Copy link
Contributor

@dalgwen dalgwen commented Aug 28, 2023

Closes #3507
Fix some rare case where the concurrent reads of a cache entry can fail.

Explanation:
Lines 286 and 288 are executed nearly at the same time. "Nearly" was sufficient to cause error when setting maxToRead and inducing the thread to think that the wanted data cannot be read.

Closes openhab#3507
Fix some rare case where the concurrent readings of a cache entry can fail.

Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
@dalgwen dalgwen requested a review from a team as a code owner August 28, 2023 13:55
@wborn wborn added the bug An unexpected problem or unintended behavior of the Core label Aug 28, 2023
@wborn
Copy link
Member

wborn commented Aug 28, 2023

Thanks for having a look!

Wouldn't there also concurrency issues when multiple InputStreams call read(int start, int sizeToRead) with random start and sizeToRead values?

There are also issues when currentSize is being updated by one thread and another thread already completed reading and set completed to true:

    protected Long getTotalSize() {
        if (completed) { // we already know the total size of the sound
            return currentSize;

@dalgwen
Copy link
Contributor Author

dalgwen commented Aug 28, 2023

Wouldn't there also concurrency issues when multiple InputStreams call read(int start, int sizeToRead) with random start and sizeToRead values?

I don't think so ? Multiple threads can call this "read" method concurrently :
If the data is already present in the FileChannel (i.e. if previous read of the source inputstream have already gone this far), then we fetch the data from the FileChannel. Concurrent reads are allowed.
If not, only one of the calling thread at a time can read the source input stream and complete the FileChannel, thanks to the synchronized block.
start or sizeToRead values doesn't matter, I do not see a case where they could break the flow (except maybe if they are negative, but, well, if someone calls this method with a negative value, he deserves the crash !). Only the start + size matter, as it determines how far the stream have to be to honor the request.

There are also issues when currentSize is being updated by one thread and another thread already completed reading and set completed to true:

I think currentSize cannot be updated concurrently with completed. They are exclusive (one is in IF / the other is in ELSE) and in a synchronized block with "! completed" as a condition to enter the block.

But concurrent operations are tricky, I may have missed something...

@wborn
Copy link
Member

wborn commented Aug 29, 2023

I now see that threads won't go into the while loop once completed is true. But I still don't understand your explanation:

Lines 286 and 288 are executed nearly at the same time. "Nearly" was sufficient to cause error when setting maxToRead and inducing the thread to think that the wanted data cannot be read.

Why would this matter if the statements are executed sequentially? Is there something asynchronous going on in the FileChannel?

@dalgwen
Copy link
Contributor Author

dalgwen commented Aug 30, 2023

OK, I was not very clear, sorry.

Imagine 2 threads that want the same cache entry. Neither of them had read any data, and its a cache miss, so the cache entry is empty.

  • Thread 1 starts and checks line 271 if the FileChannel has the data it wants. The FileChannel is length 0 and so it enters the synchronized block to get data from the input stream. It gets data, goes to line 286 and writes to the FileChannel the data he just get. It is now line 287, trace logging a message. Now that the FileChannel has the data it wants, it will go on and read it. But we pause it here because enter thread 2.

  • Thread 2 also needs data. Line 271, it checks if the data it wants is available. It uses the FileChannel size for this purpose. The FileChannel has just been updated by Thread 1, so yeah, it seems to have the data it needs. So it doesn't enter the synchronized block and goes directlly line 299. It computes its maxToRead value (which is the amount of data he will get now). Before the correction, it uses currentSize to compute this value. BUT thread 1 has not updated the currentSize value, because it is still logging line 287 and the update of currentSize is line 288. So thread 2 has a wrong currentSize (still 0) and allocate a size 0 ByteBuffer. So the code after that will fill it with... no byte at all.

By the way, for maxToRead, using a min between sizeToRead and FileChannel.size is needed because we can imagine a file smaller than the sizeToRead asked by the caller. Allocating the exact data size needed is mandatory to fill it and returns exactly the right amount of data read from the FileChannel, without reallocating another buffer inbetween.

So, using currentSize to compute the size of the buffer we will fill was wrong, in some rare instances there was some concurrency issues. Using FileChannel.size fixes that.

This system is complex but I'm a little bit proud of it: with it, we can cache AND serve data to several thread requesting it concurrently, without waiting for the end of the source input stream.
Use case : saying something to several audio sink in the house, at the same time, with no start latency, all that with one call to the TTS service.

Copy link
Member

@wborn wborn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation! I now understand why it happens and did not think about a thread being suspended while it is running in the synchronized block. 🙃 It's certainly worth it to have no delays. In the end the unit test also helped with finding a real concurrency issue. 🙂

@wborn wborn merged commit 3ddbdb2 into openhab:main Aug 30, 2023
@wborn wborn added this to the 4.1 milestone Aug 30, 2023
@dalgwen dalgwen deleted the 3507 branch September 17, 2023 12:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An unexpected problem or unintended behavior of the Core
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LRUMediaCacheEntryTest unstable
2 participants