Many cron processes can wait locks and consume resources #25987

Closed
ilnytskyi opened this issue Dec 11, 2019 · 25 comments
Labels
  • Component: Cron
  • Component: Lock
  • Issue: Confirmed (Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed)
  • Issue: Format is valid (Gate 1 Passed. Automatic verification of issue format passed)
  • Issue: Ready for Work (Gate 4. Acknowledged. Issue is added to backlog and ready for development)
  • Priority: P3 (May be fixed according to the position in the backlog.)
  • Progress: done
  • Reproduced on 2.3.x (The issue has been reproduced on latest 2.3 release)
  • Reproduced on 2.4.x (The issue has been reproduced on latest 2.4-develop branch)
  • Severity: S3 (Affects non-critical data or functionality and does not force users to employ a workaround.)
  • stale issue
  • Triage: Dev.Experience (Issue related to Developer Experience and needs help with Triage to Confirm or Reject it)

Comments

@ilnytskyi
Contributor

ilnytskyi commented Dec 11, 2019

Preconditions (*)

Magento 2.3.x

  1. Cron observer uses locks (db, cache, file or ZooKeeper)
  2. The job is locked after cron:run is sent to shell
    see this method
    \Magento\Cron\Observer\ProcessCronQueueObserver::execute

The problem, I suppose, is that Magento uses the same method to run all groups or a single group.
In \Magento\Cron\Observer\ProcessCronQueueObserver::execute
it first creates commands in the shell:

            if ($this->_request->getParam(self::STANDALONE_PROCESS_STARTED) !== '1'
                && $this->getCronGroupConfigurationValue($groupId, 'use_separate_process') == 1
            ) {
                $this->_shell->execute(
                    $phpPath . ' %s cron:run --group=' . $groupId . ' --' . Cli::INPUT_KEY_BOOTSTRAP . '='
                    . self::STANDALONE_PROCESS_STARTED . '=1',
                    [
                        BP . '/bin/magento'
                    ]
                );
                continue;
            }

Only then, once the spawned command runs, e.g.
php bin/magento cron:run --group=index --bootstrap=standaloneProcessStarted=1
does it reach this part, where the lock is taken:

            $this->lockGroup(
                $groupId,
                function ($groupId) use ($currentTime, $jobsRoot) {
                    $this->cleanupJobs($groupId, $currentTime);
                    $this->generateSchedules($groupId);
                    $this->processPendingJobs($groupId, $jobsRoot, $currentTime);
                }
            );

It is technically possible to remove all locks, run bin/magento cron:run & many times, and see dozens of processes created in the shell,
e.g. for the indexer: php bin/magento cron:run --group=index --bootstrap=standaloneProcessStarted=1

Steps to reproduce (*)

  1. Run bin/magento cron:run & a few times in the background
  2. See that it creates more than one process in the shell,
    e.g. php bin/magento cron:run --group=index --bootstrap=standaloneProcessStarted=1

Expected result (*)

  1. If one cron command is already running in the shell, a second one should not be created
  2. Only one unique shell command should be processed at a time
  3. The lock for the process or group should be checked before the command is sent to the shell

Actual result (*)

  1. Magento can create many processes at a time
  2. Many simultaneously running processes check locks, wait, and consume server resources
  3. The lock for the process or group is checked only after the command has been sent to the shell
    [Screenshot: Selection_248]

[GIF: cron-lock]

WORKAROUND:
Install each cron group separately in the crontab with flock instead of adding the default cron:run.

@ilnytskyi ilnytskyi added Component: Cron Reproduced on 2.3.x The issue has been reproduced on latest 2.3 release Component: Lock labels Dec 11, 2019
@magento-engcom-team magento-engcom-team added the Issue: Format is valid Gate 1 Passed. Automatic verification of issue format passed label Dec 11, 2019
@engcom-Charlie engcom-Charlie self-assigned this Dec 12, 2019
@engcom-Charlie
Contributor

Hello @ilnytskyi
Thank you for your contribution and collaboration!

We are not able to reproduce this issue on the latest 2.4-develop branch using the provided steps.
[Screenshots: 1, 3]

Result:

  • cron:run with a group shows the same behavior as cron:run
  • when one cron command is running in the shell, a second one is not created
  • only one unique shell command is processed
  • the lock for the process or group is checked before the command goes to the shell
    [Screenshot: 4]

Could you verify my steps?

@ilnytskyi
Contributor Author

ilnytskyi commented Dec 12, 2019

@engcom-Charlie
Yes sure
just run a few commands in background
php bin/magento cron:run & php bin/magento cron:run & php bin/magento cron:run & php bin/magento cron:run & php bin/magento cron:run & php bin/magento cron:run &

[GIF: cron-lock]

The lock for the process or group is not checked before the command goes to the shell.

You can see it in the GIF. Although it goes fast on a fresh local installation, this can be a problem on a live server under high load.
And flock on bin/magento cron:run will not help, since cron:run creates other processes that can wait for locks.

Maybe something like this could be used for the cron command:

'/usr/bin/flock -w 0 ' . BP . '/var/' . self::LOCK_PREFIX . $groupId . '.lock ' . $phpPath . ' %s cron:run --group=' . $groupId . ' --' . Cli::INPUT_KEY_BOOTSTRAP . '='

or this

            if ($this->lockManager->isLocked(self::LOCK_PREFIX . $groupId)) {
                continue;
            }

However, I noticed that some commands still try to execute simultaneously with the second solution, and using it in the loop can freeze other jobs. Probably something like the first solution should be used at the shell level.
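
A minimal sketch of the fail-fast behaviour that the first suggestion above relies on, assuming util-linux flock is installed. It uses -n (non-blocking); recent util-linux manuals describe a zero -w timeout as equivalent. The temporary lock file and the sleep that stands in for a long-running cron group are illustrative only and are not Magento code.

```shell
#!/usr/bin/env bash
# Sketch: a second process that finds the lock held gives up at once
# instead of queueing behind the first one.
lock_file="$(mktemp)"   # hypothetical lock path, not something Magento creates

# First holder: stand-in for a slow cron group, keeps the lock for 3 seconds.
flock -n "$lock_file" sleep 3 &
holder_pid=$!
sleep 1                 # give the holder time to acquire the lock

# Second attempt while the lock is held: -n makes flock fail immediately.
if flock -n "$lock_file" true; then
    second_result="acquired"
else
    second_result="skipped"
fi
echo "second attempt: $second_result"

wait "$holder_pid"

# After the holder exits, the lock is free again.
if flock -n "$lock_file" true; then
    third_result="acquired"
else
    third_result="skipped"
fi
echo "after release: $third_result"

rm -f "$lock_file"
```

Because the check happens before any PHP process is started, a skipped run costs only one short-lived flock process.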

@ilnytskyi ilnytskyi added Reproduced on 2.4.x The issue has been reproduced on latest 2.4-develop branch and removed Progress: needs update labels Dec 12, 2019
@ilnytskyi ilnytskyi changed the title from "May cron processes can wait locks and consume resources" to "Many cron processes can wait locks and consume resources" Dec 13, 2019
@engcom-Charlie
Contributor

@ilnytskyi
Thank you for contribution and collaboration!
I just reproduced it with your updates.
Now I can confirm this issue.
[Screenshot]

@engcom-Charlie engcom-Charlie added the Issue: Confirmed Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed label Dec 13, 2019
@ghost ghost unassigned engcom-Charlie Dec 13, 2019
@magento-engcom-team
Contributor

✅ Confirmed by @engcom-Charlie
Thank you for verifying the issue. Based on the provided information, internal ticket MC-29752 was created.

Issue Available: @engcom-Charlie, You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself.

@magento-engcom-team magento-engcom-team added the Issue: Ready for Work Gate 4. Acknowledged. Issue is added to backlog and ready for development label Dec 13, 2019
@ivanko-dev ivanko-dev self-assigned this Dec 18, 2019

@ilnytskyi
Contributor Author

The workaround can be like this:
Remove general cron:run from crontab.
Add each group to crontab with flock separately or use any other custom locker.
* * * * * flock /path/to/lock php bin/magento cron:run --group=index --bootstrap=standaloneProcessStarted=1
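
The workaround above could also be wrapped in one small script that cron calls every minute. This is only a sketch under stated assumptions: LOCK_DIR, CRON_GROUPS and RUN_CMD are hypothetical names introduced here, the group list must match the groups defined on your installation, and the script is expected to run from the Magento root. Note that flock without -n waits for the lock; the sketch adds -n so an overlapping run exits instead of queueing.

```shell
#!/usr/bin/env bash
# Sketch: run each cron group under its own non-blocking flock lock.
# LOCK_DIR, CRON_GROUPS and RUN_CMD are illustrative names, not Magento settings.
LOCK_DIR="${LOCK_DIR:-${TMPDIR:-/tmp}/magento-cron-locks}"
CRON_GROUPS="${CRON_GROUPS:-default index}"
# Dry run by default: only prints the command. Set RUN_CMD="php" to execute it.
RUN_CMD="${RUN_CMD:-echo php}"

mkdir -p "$LOCK_DIR"

run_group() {
    local group="$1"
    # -n: exit immediately if a previous run of this group still holds the lock.
    # $RUN_CMD is intentionally unquoted so "echo php" splits into two words.
    flock -n "$LOCK_DIR/cron-$group.lock" \
        $RUN_CMD bin/magento cron:run --group="$group" --bootstrap=standaloneProcessStarted=1
}

# Start every group in parallel so a slow group does not delay the others.
for group in $CRON_GROUPS; do
    run_group "$group" &
done
wait
```

Each group gets its own lock file, so a long index run blocks only the next index run and leaves the other groups alone.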

@sdzhepa sdzhepa added the Triage: Dev.Experience Issue related to Developer Experience and needs help with Triage to Confirm or Reject it label Jul 16, 2020
@darshanperpule

@ilnytskyi I am also facing the same issue in my prod environment.

@sivaschenko sivaschenko added Priority: P3 May be fixed according to the position in the backlog. Severity: S3 Affects non-critical data or functionality and does not force users to employ a workaround. labels Sep 17, 2020
@stale

stale bot commented Dec 9, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 14 days if no further activity occurs. Thank you for your contributions.

@kilis

kilis commented Jan 20, 2021

Applied the fix (by @driskell) for the issue mentioned here by back-porting it from 2.4.3 to 2.3.5p2, and the issue still persists even with the fix. Today we deployed the changes to the live site and I still got duplicated cron groups having deadlocks. This issue is still relevant.

@HenKun

HenKun commented Jan 20, 2021

Same here with the fix applied to 2.4.1. The only thing that really works is the workaround from the issue creator: "Install each cron group separately with flock instead of adding default cron:run in crontab."

@driskell
Contributor

> Applied the fix (by @driskell) for the issue mentioned here by back-porting it from 2.4.3 to 2.3.5p2, and the issue still persists even with the fix. Today we deployed the changes to the live site and I still got duplicated cron groups having deadlocks. This issue is still relevant.

Do the deadlocks still occur? Or does the cron_schedule table still grow? Or do you mean the issue of overlapping groups described here? There are 3 issues referenced. Worth noting that the PR targeted 2.4 in the end, and I think it might not work completely on 2.3, as there are many other changes in 2.4 that help fix the same issue.

@kandy
Contributor

kandy commented Jan 21, 2021

@HenKun Can you describe your concern/steps in a new issue?

@driskell
Contributor

I’m not sure, but there might be confusion between the SQL Deadlock exception the PR targets and what is happening here, which I guess is a process deadlock (or process queuing) due to some form of serialisation.

@kilis

kilis commented Jan 21, 2021

@driskell
The issue here is more of a RAM and CPU resource issue in crons. Cron task groups get duplicated as described in this issue and exhaust server resources, causing the server to hang so that it's not usable.

@memen45

memen45 commented Aug 12, 2021

Is this fixed now? And if so, from which version onwards is it safe to remove the flock from cron without flooding the memory?

@hostep
Contributor

hostep commented Aug 12, 2021

@memen45: flock was our workaround before Magento 2.2.5. Since #12497 got included in Magento 2.2.5, flock was no longer needed in my experience.
I would recommend changing the lock provider that Magento uses by default (db) to file, if you can on your current Magento version. I've seen the db one cause strange issues on servers under high load.

Magento then added another lock fix for crons in 2.3.5 by MC-25132 and then there were some other smaller improvements done by #28007 which got included in Magento 2.4.3, but I'm not entirely sure how much that last bit helps.

Not sure if this is the full story though, but maybe it helps?
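
For the lock provider change recommended above, a hedged sketch of the CLI call, assuming a Magento version whose setup:config:set command supports the --lock-provider and --lock-file-path options. MAGENTO_ROOT and LOCK_PATH are placeholder values and APPLY is a hypothetical switch added for this script; by default it only prints the command.

```shell
#!/usr/bin/env bash
# Sketch: switch Magento's lock provider from "db" to "file".
# MAGENTO_ROOT and LOCK_PATH are placeholders; APPLY is a switch local to this script.
MAGENTO_ROOT="${MAGENTO_ROOT:-/var/www/magento}"
LOCK_PATH="${LOCK_PATH:-/var/www/magento/var/locks}"

cmd=(php "$MAGENTO_ROOT/bin/magento" setup:config:set
     --lock-provider=file
     --lock-file-path="$LOCK_PATH")

if [ "${APPLY:-0}" = "1" ]; then
    # Real run: make sure the lock directory exists, then update the configuration.
    mkdir -p "$LOCK_PATH"
    "${cmd[@]}"
else
    # Dry run: show what would be executed.
    printf '%s\n' "${cmd[*]}"
fi
```

Run it once with APPLY=1 as the user that owns the Magento files, and make sure the lock directory is writable by the user that runs cron.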

@Adel-Magebinary

This still happens in 2.4.2 Community. I have now introduced flock and separated the cron groups. Will report back if it still happens.

@hostep
Contributor

hostep commented Dec 28, 2021

@Adel-Magebinary: as mentioned in the post above yours, some extra smaller improvements were made in Magento 2.4.3, maybe you could try to upgrade Magento first?

@ilnytskyi
Contributor Author

related topic: #35639

@rukrlf

rukrlf commented Sep 11, 2023

> The workaround can be like this:
> Remove general cron:run from crontab.
> Add each group to crontab with flock separately or use any other custom locker.
> * * * * * flock /path/to/lock php bin/magento cron:run --group=index --bootstrap=standaloneProcessStarted=1

@ilnytskyi would you mind letting me know how to find the "/path/to/lock" in the flock command, please? I debugged this in my local environment and noticed the lock was generated in the DB. It would be great if you could elaborate on whether the lock is generated differently (for example, in a file) on the server in production mode.
Thanks for your answer!

@ilnytskyi
Contributor Author

ilnytskyi commented Sep 11, 2023

@rukrlf just point that path at any location you want; the lock files are created automatically.
https://www.thegeekdiary.com/flock-command-examples-in-linux/
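
A small sketch of what "created automatically" means here, assuming util-linux flock. The flock lock file is an OS-level file chosen by you and is separate from whatever lock provider (db, file, ...) Magento uses internally; the directory and file name below are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: flock creates the lock file itself if it does not exist yet.
lock_dir="$(mktemp -d)"
lock_file="$lock_dir/cron-index.lock"   # arbitrary name; any writable path works

if [ -e "$lock_file" ]; then before="exists"; else before="missing"; fi

flock -n "$lock_file" true              # run a no-op command under the lock

if [ -e "$lock_file" ]; then after="exists"; else after="missing"; fi

echo "before: $before, after: $after"
rm -rf "$lock_dir"
```

The file stays on disk after the command finishes; only the lock on it is released, so there is nothing to clean up between cron runs.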

@rukrlf

rukrlf commented Sep 11, 2023 via email

@magento360

We are still getting the same problem with Magento 2.4.5-p5. Is there any other solution?
