PHPORM-99 Add optimized cache and lock drivers #2877

Merged
merged 1 commit into from
Apr 22, 2024

Member

@GromNaN GromNaN commented Apr 16, 2024

Fix for PHPORM-99

In theory, we can use DatabaseStore and DatabaseLock. But several issues show that their implementation keeps changing to rely on new query builder features, which makes them unstable for MongoDB users. Fix #2718, fix #2609

By introducing dedicated drivers, we can optimize the implementation to use the mongodb library directly instead of the subset of features provided by the Laravel query builder.

Usage:

# config/cache.php
return [

    'stores' => [

        'mongodb' => [
            'driver' => 'mongodb',
            'connection' => 'mongodb',
            'collection' => 'cache',
            'lock_connection' => 'mongodb',
            'lock_collection' => 'cache_locks',
            'lock_lottery' => [2, 100], // prune expired locks on about 2 of every 100 acquisitions
            'lock_timeout' => 86400,
        ],
    ],

];

Cache:

// Store any value into the cache. The value is serialized in MongoDB
Cache::set('foo', [1, 2, 3]);

// Read the value
dump(Cache::get('foo'));

// Clear the cache
Cache::flush();

Lock:

// Get a unique lock. It's very important to keep this object in memory
// so that the lock can be released.
$lock = Cache::lock('foo');
$lock->block(10); // Wait up to 10 seconds to acquire the lock; throws an exception if it is still held

// Any time-consuming task
sleep(5);

// Release the lock
$lock->release();

Checklist

  • Add tests and ensure they pass
  • Add an entry to the CHANGELOG.md file
  • Update documentation for new features

@GromNaN GromNaN force-pushed the PHPORM-99 branch 3 times, most recently from 46747a9 to 55c5327 Compare April 16, 2024 23:07
@GromNaN GromNaN marked this pull request as ready for review April 16, 2024 23:07
@GromNaN GromNaN requested a review from a team as a code owner April 16, 2024 23:07
@GromNaN GromNaN requested review from alcaeus and jmikola April 16, 2024 23:07
$this->collection->deleteMany(['expiration' => ['$lte' => $this->currentTime()]]);
}

return $result->owner === $this->owner;
Member Author

Here I'm using findOneAndUpdate and a comparison of the value to return true when the lock is acquired twice by the same instance during the same second. If I use updateOne and check $result->getUpsertedCount() > 0 || $result->getModifiedCount() > 0;, the result is false.

Member

"when the lock is acquired twice by the same instance during the same second"

What is the significance of "same instance during the same second"?

If updateOne reports that no document was upserted or modified, wouldn't that imply that the lock could not be acquired (i.e. has another owner and has not expired)?

I was testing this locally:

$coll = $client->selectCollection('test', 'coll');
$coll->drop();

$key = 'foo';
$owner = 'bar';
$currentTime = time();
$expiresAt = $currentTime + 10;

$isExpiredOrAlreadyOwned = ['$or' => [
    ['$lte' => ['$expiration', $currentTime]],
    ['$eq' => ['$owner', $owner]],
]];

$result = $coll->updateOne(
    ['key' => $key],
    [
        ['$set' => [
            'owner' => ['$cond' => [$isExpiredOrAlreadyOwned, $owner, '$owner']],
            'expiration' => ['$cond' => [$isExpiredOrAlreadyOwned, $expiresAt, '$expiration']],
        ]],
    ],
    ['upsert' => true],
);

printf("\n\nupdateOne: matched(%d) modified(%d) upserted(%d)\n\n",
    $result->getMatchedCount(),
    $result->getModifiedCount(),
    $result->getUpsertedCount(),
);

if ($result->getMatchedCount()) {
    echo "lock '$key' already existed\n";
}

if ($result->getModifiedCount()) {
    echo "took over an expired lock for '$owner', which will now expire at $expiresAt\n";
}

if ($result->getUpsertedCount()) {
    echo "created the lock for '$owner', which will now expire at $expiresAt\n";
}

Member Author

I'm talking about this test:

public function testLockCanBeAcquired()
{
    $lock = $this->getCache()->lock('foo');
    $this->assertTrue($lock->get());
    $this->assertTrue($lock->get());

updateOne reports no modified document when the values in $set are the same as the values already stored.

In your example, if you run updateOne twice, the second run prints "lock 'foo' already existed" and nothing else. Laravel expects true here, because the lock is acquired and owned by the current instance.

Member

Now I understand the significance of "same second" in your first comment. The timestamp field wouldn't change on subsequent updates within the same clock second, and you rely on checking the owner field (not possible with updateOne) to determine if the lock is acquired.
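
To make the difference concrete, here is a minimal in-memory sketch of the behaviour discussed above. FakeLockCollection and its acquire() method are hypothetical stand-ins and not part of the MongoDB library; they only imitate a write that reports no modification when the stored values are unchanged.

```php
<?php

declare(strict_types=1);

/**
 * In-memory stand-in for a lock collection (hypothetical, not the MongoDB API).
 * acquire() returns the stored document and whether the write changed anything.
 */
final class FakeLockCollection
{
    /** @var array<string, array{owner: string, expiration: int}> */
    private array $docs = [];

    /** @return array{doc: array{owner: string, expiration: int}, modified: bool} */
    public function acquire(string $key, string $owner, int $now, int $ttl): array
    {
        $current = $this->docs[$key] ?? null;
        $new = ['owner' => $owner, 'expiration' => $now + $ttl];

        // Take the lock when it is free, expired, or already ours.
        if ($current === null || $current['expiration'] <= $now || $current['owner'] === $owner) {
            // Like updateOne, report a change only if the stored values differ.
            $modified = $current !== $new;
            $this->docs[$key] = $new;

            return ['doc' => $new, 'modified' => $modified];
        }

        // Held by someone else and not expired: leave it untouched.
        return ['doc' => $current, 'modified' => false];
    }
}

$locks = new FakeLockCollection();
$now = 1_700_000_000;

$first = $locks->acquire('foo', 'owner-a', $now, 10);
$second = $locks->acquire('foo', 'owner-a', $now, 10); // same second, same owner
$other = $locks->acquire('foo', 'owner-b', $now, 10);  // competing owner
```

In this sketch the second call reports no modification even though owner-a still holds the lock, so only the owner comparison on the returned document distinguishes it from the owner-b case.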

Member Author

@GromNaN GromNaN Apr 22, 2024

I reverted to updateOne in this additional PR, because UTCDateTime is more precise and acquiring a lock with the same owner in the same millisecond will not happen.

Comment on lines 673 to 674
// Ignore "duplicate key error"
if ($exception->getCode() !== 11000) {
Member Author

I'm not sure if I should ignore only "duplicate key errors". The Laravel docs say:

"The insertOrIgnore method will ignore errors while inserting records into the database. When using this method, you should be aware that duplicate record errors will be ignored and other types of errors may also be ignored depending on the database engine."

Member

I dislike the ambiguity that is built into the documentation - you can't really be sure what's ignored. One option is to ignore any errors that purely come from the data (e.g. unique key violations, document constraint violations) but still report more serious problems (like server selection errors). IMO, users should be more aware of what could go wrong and explicitly ignore those errors if they don't care for them.

For the purpose of the cache/lock work, this seems sufficient so I'm fine with only ignoring unique key errors and returning to this if people have more errors they think should be ignored.
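
The "ignore only unique key errors" option could look roughly like the sketch below. insertOrIgnoreDuplicate() is a hypothetical helper written for illustration; it catches any Throwable and inspects its code, whereas real code would likely catch the driver's own exception classes.

```php
<?php

declare(strict_types=1);

// Server error code for "duplicate key", as checked in the diff above.
const DUPLICATE_KEY_ERROR = 11000;

/**
 * Hypothetical helper: run $insert and report whether it inserted.
 * Only duplicate key errors are swallowed; everything else propagates.
 */
function insertOrIgnoreDuplicate(callable $insert): bool
{
    try {
        $insert();

        return true;
    } catch (Throwable $exception) {
        if ($exception->getCode() !== DUPLICATE_KEY_ERROR) {
            // More serious problems (e.g. server selection) still surface.
            throw $exception;
        }

        return false;
    }
}

// A simulated duplicate key failure is ignored and reported as "not inserted".
$inserted = insertOrIgnoreDuplicate(
    fn () => throw new RuntimeException('E11000 duplicate key error', 11000),
);
```

Any other error code is rethrown, so callers who want to ignore more have to do so explicitly.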

Member Author

This method is no longer used by the cache. I'm reverting this change and creating a dedicated ticket, PHPORM-170.

Comment on lines +24 to +25
// Provides "many" and "putMany" in a non-optimized way
use RetrievesMultipleKeys;
Member Author

This can be improved later in a new PR.

@@ -25,13 +25,15 @@
"php": "^8.1",
"ext-mongodb": "^1.15",
"composer-runtime-api": "^2.0.0",
"illuminate/support": "^10.0|^11",
"illuminate/cache": "^10.36|^11",
Member Author

Required to access Lock::isOwnedByCurrentProcess
laravel/framework@234444e

],
);

if (random_int(1, $this->lottery[1]) <= $this->lottery[0]) {
Member

I realize this is beyond the PR, but is there a particular reason that [2, 100] is used instead of 0.02? This just seems like an odd API.

Member Author

@GromNaN GromNaN Apr 17, 2024

Member

I see the same structure used in DatabaseLock, but not the Lottery class itself. Consistency is probably more important here. It may be worth elaborating on the structure of that two-element array in the doc block comment, though. As is, it's not entirely clear how the two values relate to one another.
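
For readers wondering how the two values relate, here is a small sketch of the check from the diff above. wonLottery() is a hypothetical helper name; the comparison itself is the one shown in the diff, so [2, 100] means the cleanup runs on roughly 2 of every 100 calls.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper mirroring the check in the diff above:
 * [$chances, $outOf] means "prune on roughly $chances out of $outOf calls".
 *
 * @param array{0: int, 1: int} $lottery
 */
function wonLottery(array $lottery): bool
{
    [$chances, $outOf] = $lottery;

    return random_int(1, $outOf) <= $chances;
}

// Rough empirical check: [2, 100] should hit about 2% of the time.
$hits = 0;
for ($i = 0; $i < 100_000; $i++) {
    $hits += wonLottery([2, 100]) ? 1 : 0;
}

printf("hit rate: %.2f%%\n", $hits / 1000);
```

With integers on both sides, [0, 100] never prunes and [100, 100] always prunes, which is probably the easiest way to document the pair.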

@GromNaN GromNaN requested a review from jmikola April 18, 2024 13:36

public function flush(): bool
{
$this->collection->deleteMany([]);
Member

Did you consider dropping the collection? My storage engine knowledge is outdated, but I still reckon that'd be a faster operation.

And if there's no concern about preserving indexes or other metadata (e.g. validation rules), there isn't much benefit to keeping the original collection in place.

Member Author

Why not, if the engine supports it well. What is the risk of a race condition if there is an insert at the same time?

Member

Dropping locks the collection, and thus I expect it'd avoid any potential race conditions. In contrast, I expect deleteMany() would allow other operations to interleave.

To clarify: I don't consider this a real risk for the PR. The drop just seemed like a more straightforward operation.

Member Author

I don't want to drop the collection. You could legitimately want to create a TTL index on the expiration field, and dropping the collection to purge the cache would have the side-effect of deleting the index.
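
As an illustration of that use case, a TTL index could be created along these lines. Both function names are hypothetical, and the sketch assumes the expiration field stores a BSON date (MongoDB\BSON\UTCDateTime); TTL indexes do not expire documents whose indexed field holds a plain integer timestamp.

```php
<?php

declare(strict_types=1);

use MongoDB\Collection;

/**
 * Hypothetical helper: create a TTL index on the cache collection.
 * Assumes "expiration" holds a BSON date (MongoDB\BSON\UTCDateTime);
 * TTL indexes ignore documents whose indexed field is not a date.
 */
function ensureCacheTtlIndex(Collection $collection): string
{
    // expireAfterSeconds = 0 expires each document at its own "expiration" time.
    return $collection->createIndex(['expiration' => 1], ['expireAfterSeconds' => 0]);
}

/** Purge all entries but keep the collection and its indexes. */
function flushCache(Collection $collection): bool
{
    $collection->deleteMany([]);

    return true;
}
```

Because flushCache() only deletes documents, an index created by ensureCacheTtlIndex() survives a flush, which would not be the case after dropping the collection.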

Member Author

See the additional PR that enables a TTL index: #2891

Member

@jmikola jmikola left a comment

I've said my piece, so I'll defer to you to incorporate any feedback you like.

@GromNaN GromNaN force-pushed the PHPORM-99 branch 2 times, most recently from 3df8f45 to 3f42383 Compare April 22, 2024 08:01
$this->collection->deleteMany(['expiration' => ['$lte' => $this->currentTime()]]);
}

return $result['owner'] === $this->owner;
Member Author

Reverted to array access to please static analysis tools that don't understand that MongoDB\Model\BSONDocument has magic property access.

https://phpstan.org/r/1e7b5d92-8b98-4e53-863f-b0c8652820c1
https://psalm.dev/r/ab76084ecf

$doc = new ArrayObject(['foo' => 'bar'], ArrayObject::ARRAY_AS_PROPS);
$doc->bar;

It's also more robust in case array is used as the default type map.
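
A small sketch of why array access is the safer choice. ownerOf() is a hypothetical helper, and ArrayObject stands in here for the document class, since the same syntax has to work whether the type map yields an object or a plain array.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: read "owner" with array syntax, which works for plain
 * arrays and for ArrayAccess objects alike.
 */
function ownerOf(array|ArrayAccess $document): ?string
{
    return $document['owner'] ?? null;
}

// Object form, with property access enabled as in the snippet above.
$asObject = new ArrayObject(['owner' => 'worker-1'], ArrayObject::ARRAY_AS_PROPS);
// Array form, as returned when the type map is set to "array".
$asArray = ['owner' => 'worker-1'];

var_dump(ownerOf($asObject) === ownerOf($asArray));
```

Property access ($document->owner) would only work for the object form, so the array syntax avoids depending on the configured type map.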

@GromNaN GromNaN merged commit d0978a8 into mongodb:4.3 Apr 22, 2024
23 checks passed
@GromNaN GromNaN deleted the PHPORM-99 branch April 22, 2024 11:29
Successfully merging this pull request may close these issues.

  • Need to support upsert function in Laravel 10
  • Can't store cache lock
3 participants