-
Notifications
You must be signed in to change notification settings - Fork 9.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent running again already running cron group #12497
Prevent running again already running cron group #12497
Conversation
@dmanners @ihor-sviziev I've created this MVP implementation of cron group based locking. This is now being tested in Vaimo internally too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @paveq,
Thank you so much for such good PR!
I think there should be done few fixes. Could you review my comments?
* @param string $name lock name | ||
* @return bool | ||
*/ | ||
public function setLock($name, $timeout = -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As magento 2.2 and later support php 7+ only, could you scecify types for these variables? Same for other methods in this file + interface.
Also would be great to add declare strict types for new file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing:
MySQL 5.7.5 and later enforces a maximum length on lock names of 64 characters. Previously, no limit was enforced.
I think it should be checked in order to prevent mysql errors there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented scalar type hints.
private $resource; | ||
|
||
public function __construct( | ||
\Magento\Framework\App\ResourceConnection $resource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you import this resource connection, it will be more readable there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
namespace Magento\Framework\lock\Backend; | ||
|
||
class Database implements \Magento\Framework\Lock\LockManagerInterface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great to cover this class with unit and integration tests. Could you add them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll work on implementing some test cases for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test cases added.
In order for locks not to conflict in setups where multiple Magento installations are running in single database, we need to use some kind of installation specific unique prefix for the lock. Framework\Cache\Core::_id() and Framework\App\Cache\Frontend\Factory already contain similar mechanism that is used by cache, and could be used by locks too. However that method is protected, and injecting cache core to this code seems not a correct solution anyway. Would it make sense for this unique installation specific id generation to be moved away from Cache Core, and be available in a generic way? (I'm suggesting that in scope of this PR we make such mechanism/API interface available, but not refactor Cache Core yet, otherwise the scope might grow to be quite big). |
Does this pose an issue? In Magento system requirements we support MySQL 5.6 onwards. Until system requirements have been raised to MySQL 5.7.5, perhaps we can check that only one lock can be acquired through LockManager at once? Ideally we would support multiple locks from MySQL 5.7.5 upwards, but at this moment that feature is not needed (YAGNI), and there appears to be no easy way to detect MySQL version from the connection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
I don't think we need to check mysql version there. Current implementation looks good.
One thing: now Magento\Cron\Test\Unit\Observer\ProcessCronQueueObserverTest fails, could you update it?
/** | ||
* @var \Magento\Framework\Lock\Backend\Database | ||
*/ | ||
protected $model; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use private visibility instead of protected (for all methods in this file)?
* @param string $name lock name | ||
* @return bool | ||
*/ | ||
public function releaseLock(string $name): bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add @throws
to phpdoc block there?
return (bool)$this->resource->getConnection()->query("SELECT IS_USED_LOCK(?);", array((string)$name))->fetchColumn(); | ||
} | ||
|
||
private function checkLength(string $name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add phpdoc block there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good!
\Magento\Framework\App\State $state | ||
\Magento\Framework\App\State $state, | ||
StatFactory $statFactory, | ||
\Magento\Framework\Lock\LockManagerInterface $lockManager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't BC be preserved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adrian-martinez-interactiv4, We treat Observers and Plugins as internal implementation and don't try to preserve BC for them
I've applied this patch to two production sites. While it does seem to resolve the duplicate CRON issue, it seems to have introduced a regression: The Has anyone else seen this issue? I didn't open a separate Github issue for this, as this code isn't in a GA release yet, so I can't imagine many have run into this. In the #11002 (comment) issue, someone suggested a solution to solve this problem, however it seems like that should not be necessary if the success lifetime is set to 180 for all CRON groups. |
Hi @erikhansen. Looks like problem introduced by this code https://github.com/paveq/magento2/blob/175229e23b5df6473bebf7acd24356a38ca6c9e3/app/code/Magento/Cron/Observer/ProcessCronQueueObserver.php#L502 Need change it to
Can I ask you to create separate issue for this? |
I implemented this on a 2.2.4 production site. It definitely prevents duplicates from running but in our case, it didn't run jobs at all if one of the jobs ended up taking a long time. Since it's locking groups I think if a job in the default group takes a long time it won't schedule or launch other groups; at least that's my working theory right now. I had to revert this set of changes. |
@kandy feel free to create PR for this, even without issue reporting |
@kandy Can you expound on the change you mentioned in your comment from a couple of weeks ago? The only difference I see in your codeblock is that this line:
should be changed to:
However if you look at the source file, there is no Can you let me know what this should be changed to, and I'll change it and test it on a stage environment? I can then create a PR for this, expounding on the issue along with your suggested fix. |
@erikhansen, look at his updated
|
@hostep Thanks. I just realized that independently and was coming back here to post when I saw your comment. I'm deploying this to a stage environment that had 1.2M rows in |
@erikhansen Can you confirm if this has fixed issues for you? We've got a few servers with production Magento sites on and it brings MySQL to its knees. It'd be great if this is a word around fixes things.. |
@torindul Applying the contents of this PR as a patch to a Magento 2.2.3 site fixes the issue with the same CRON job running at the same time. By the way, this Gist contains a query that will show you if you have a problem with this.
For anyone wanting to apply this PR as a patch to their environment, here are patches that can be applied via composer-patches (you'll need to remove |
I'm currently going through the process of upgrading a site from 2.2.3 > 2.2.5. In the process, I realized that the changes in this PR are included in 2.2.5. I expect that anyone upgrading to 2.2.5 will experience the issue with CRONs not getting cleaned up. I will soon be upgrading a vanilla Magento site to 2.2.5 and if I'm able to reproduce the CRON issue, I'll open a Github issue with full details. |
hi @erikhansen, thank for so deep investigation |
@kandy Thanks. I was aware that the fix you referenced was included in 2.2.5. However I have confirmed that this is still an issue. I will be opening a new Github issue this morning and will post a link here once I've done that. |
@erikhansen, Also, pay attention that cron history cleanup time was changed to 7 days in commit |
@kandy I finally made time to look into this. I was wrong: 2.2.5 does not have an issue with CRONs that never get deleted. What I was seeing was exactly what you pointed out: the In 2.2.4 and prior, having a large row count in the For the record, here's what a
Compared to 2.2.4 and before:
|
@paveq @hostep @sidolov @magento-engcom-team @nmalevanec I am getting:
I have emptied cron_schedule, still same issue. How do I clear the locks down??? This is how my cron runs:
|
@craigcarnell Can I ask you create separate issue for this? Also, can you describe your environment? |
@craigcarnell: it might have something to do with your |
We're still running 2.2.8. It doesn't run very fast, though. A big part of this was cronjobs piling up due to taking longer than a minute to run because of mysql congestion. Examining cron_schedule, I found: IMO there should ever only be one scheduled job of a given type in "pending" state which is past its scheduled time, as all others that were scheduled even earlier can be considered "missed"; there is no benefit in having them still pending, as multiple runs will not "catch up" any better than a single run. |
@dmanners unfortunately, only members of the maintainers team are allowed to unassign developers from the pull request |
Description
Prevents running the same cron group concurrently. Alternative implementation to #11465
Fixed Issues (if relevant)
Manual testing scenarios
Contribution checklist