Queue consumers #23540

Closed
luckyraul opened this issue Jul 2, 2019 · 25 comments
Labels
Component: Console Issue: Confirmed Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed Issue: ready for confirmation Priority: P2 A defect with this priority could have functionality issues which are not to expectations. Progress: done Reproduced on 2.4.x The issue has been reproduced on latest 2.4-develop branch Severity: S2 Major restrictions or short-term circumventions are required until a fix is available.

Comments

@luckyraul

luckyraul commented Jul 2, 2019

Preconditions (*)

When you deploy a new release, queue consumers from the old releases keep running, and there is no way to stop them.

Steps to reproduce (*)

run cron on the first release
release new version
run cron on the second release
release new version
run cron on the third release

Expected result (*)

only new process running

releases/745/bin/magento queue:consumers:start product_action_attribute.website.update

Actual result (*)

still running old processes

releases/743/bin/magento queue:consumers:start product_action_attribute.website.update
releases/744/bin/magento queue:consumers:start product_action_attribute.website.update
releases/745/bin/magento queue:consumers:start product_action_attribute.website.update
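
To spot leftover consumers like the ones listed above, a process listing can be filtered by release path. The following is only a sketch: the function name `list_stale_consumers` is hypothetical, and it assumes a `releases/<n>` directory layout like the one in this report.

```shell
#!/usr/bin/env bash
# Print the PIDs of queue:consumers:start processes that were NOT started
# from the current release directory.
# stdin: lines of "<pid> <command line>", e.g. from `ps -e -o pid= -o args=`.
# $1:    path fragment identifying the current release (assumed layout: releases/<n>).
list_stale_consumers() {
  local current_release="$1"
  # The trailing slash keeps "releases/745" from also matching "releases/7450".
  awk -v cur="${current_release}/" '
    index($0, "queue:consumers:start") && !index($0, cur) { print $1 }
  '
}
```

A possible invocation is `ps -e -o pid= -o args= | list_stale_consumers "releases/745"`. It only prints PIDs, so the output can be reviewed before anything is killed.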
@m2-assistant

m2-assistant bot commented Jul 2, 2019

Hi @luckyraul. Thank you for your report.
To help us process this issue please make sure that you provided the following information:

  • Summary of the issue
  • Information on your environment
  • Steps to reproduce
  • Expected and actual results

Please make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, please, add a comment to the issue:

@magento give me 2.3-develop instance - upcoming 2.3.x release

For more details, please, review the Magento Contributor Assistant documentation.

@luckyraul do you confirm that you were able to reproduce the issue on vanilla Magento instance following steps to reproduce?

  • yes
  • no

@magento-engcom-team magento-engcom-team added Issue: Format is not valid Gate 1 Failed. Automatic verification of issue format is failed Issue: Format is valid Gate 1 Passed. Automatic verification of issue format passed and removed Issue: Format is not valid Gate 1 Failed. Automatic verification of issue format is failed labels Jul 2, 2019
@engcom-Charlie
Contributor

Hello @luckyraul! Can you provide more detailed steps to reproduce it?

@engcom-Charlie engcom-Charlie self-assigned this Jul 2, 2019
@luckyraul
Author

luckyraul commented Jul 2, 2019

I don't do anything except running cron before and after release. Updated issue description

@slackerzz
Member

You need to share the var directory between releases

@luckyraul
Author

you cannot share var

@slackerzz
Member

@luckyraul why not? I'm using Capistrano with https://github.com/davidalger/capistrano-magento2 to deploy, and I assure you that you can:
[image]

@hostep
Contributor

hostep commented Jul 4, 2019

@slackerzz: it's not really recommended to share the full var directory in between releases. Especially if you use file based caching for example. You want the cache to not be shared in between releases.

The example in the README of the repo you reference also shares only specific individual files and directories within the var directory:

set :linked_files, [
  'app/etc/env.php',
  'app/etc/config.local.php',
  'var/.setup_cronjob_status',
  'var/.update_cronjob_status'
]

set :linked_dirs, [
  'pub/media',
  'pub/sitemaps',
  'var/backups', 
  'var/composer_home', 
  'var/importexport', 
  'var/import_history', 
  'var/log',
  'var/session', 
  'var/tmp'
]

By the way, can you tell us a bit more about those .pid files in the var directory? How do they work exactly? (I'm assuming you are referring to those in your statement about sharing the var directory?)

I'm also running into @luckyraul's problem with our Capistrano-style deployments, and I'm wondering whether those .pid files can help resolve it, but I don't fully understand how they work yet.

Thanks!

@slackerzz
Member

Hi @hostep, you're right, it's not ideal, but I'm not using files for caching, I use Redis.
However, I created a systemd service to run my consumer, because at the time of Magento 2.1 there wasn't an option to run it by cron.

When the consumer is started for the first time, it creates a pid file in the Magento var directory.
Every subsequent cron execution checks for the pid file; if it exists, the consumer is not started again. Take a look at https://github.com/magento/magento2/blob/2.3-develop/app/code/Magento/MessageQueue/Model/Cron/ConsumersRunner.php

When you deploy a new release with a non-shared var directory, the pid file is not there, so Magento starts another process.

Maybe it would be better to create a var/pid directory and share only that between releases.

Another solution could be to stop the consumers before deploying a new release. For example, on Magento Commerce Cloud all cron processes are killed during a deploy.
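
The pid file check described above can be illustrated with a small shell sketch. It is not Magento's actual implementation (see the linked ConsumersRunner.php for that): the function names are hypothetical, and the `var/<consumer>.pid` file name is an assumption based on the `var/*.pid` files mentioned in this thread.

```shell
#!/usr/bin/env bash
# Return 0 if the pid file exists and names a live process, 1 otherwise.
# Note: `kill -0` also fails for processes owned by another user.
consumer_is_running() {
  local pid_file="$1" pid
  [ -f "$pid_file" ] || return 1
  pid="$(tr -d '[:space:]' < "$pid_file")"
  # Reject empty or non-numeric content.
  case "$pid" in ''|*[!0-9]*) return 1 ;; esac
  kill -0 "$pid" 2>/dev/null
}

# Print "skip" if a consumer is already running according to the pid file
# in the given var directory, "start" otherwise.
# Assumed (hypothetical) file name: <var_dir>/<consumer>.pid
decide_consumer_start() {
  local var_dir="$1" consumer="$2"
  if consumer_is_running "$var_dir/$consumer.pid"; then
    echo "skip"
  else
    echo "start"
  fi
}
```

With one var directory per release, the new release has no pid file, so the decision is always "start", even though the old release's consumer is still alive.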

@luckyraul
Author

luckyraul commented Jul 4, 2019

@slackerzz
Firstly there is no command to stop them.
Secondly it is not a var/pid.
Lastly, if we keep the pid files shared, it will not start a new consumer while an old one is running, am I correct? This creates a new problem, because the old code will still be running.

@hostep
Contributor

hostep commented Jul 5, 2019

Out of curiosity: is it safe to kill these queue:consumers:start processes? What if some message is just being handled by that process and in the middle of it you kill that process, what happens then? (sorry for asking stupid questions, I don't quite understand yet how the whole message queueing works in Magento and how stable it is).

Also: what we are seeing is that no new queue:consumers:start processes get started, even if the PID files are gone, while the old queue:consumers:start processes are still running, pointing to a release which no longer exists and to PID files which no longer exist. I guess these processes are then still using code from the old release loaded in memory.
Update: this statement doesn't seem to be correct, still trying to figure it out, but weird things happen after deploys on our server.

I also noticed Magento came with something called a poison pill feature in Magento 2.3.2, but I don't know if that is relevant here and if we can use that somehow to tell the queues to stop and restart from a new release?

@slackerzz
Member

@luckyraul

> Firstly there is no command to stop them.

kill will do the job.

> Secondly it is not a var/pid.

I was suggesting that we could submit a pull request to change the pid file location to var/pid, so that the whole var directory doesn't have to be shared between releases. I know that there is no var/pid directory.

> Lastly, if we keep the pid files shared, it will not start a new consumer while an old one is running, am I correct? This creates a new problem, because the old code will still be running.

You're right.

@hostep I didn't know about the PoisonPill, maybe this can resolve the last point of @luckyraul.

My "personal" solution was to add

'cron_consumers_runner' => [
    'cron_run' => false
]

to app/etc/env.php to disable the automatic consumer start by cron.
Then I created a systemd service to manage my consumer. This way I can run service myconsumer start, service myconsumer restart, etc.
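
What such a systemd service could look like is sketched below as a shell function that prints a unit file. This is not the unit actually used in this comment: the function name, the php path, the default user and the Magento root are all hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Print a systemd unit for a single queue consumer.
# $1: consumer name, $2: Magento root (hypothetical path), $3: user to run as.
# /usr/bin/php is an assumed interpreter location; adjust for the actual system.
render_consumer_unit() {
  local consumer="$1" magento_root="$2" run_as="${3:-www-data}"
  cat <<EOF
[Unit]
Description=Magento queue consumer ${consumer}
After=network.target

[Service]
Type=simple
User=${run_as}
WorkingDirectory=${magento_root}
ExecStart=/usr/bin/php ${magento_root}/bin/magento queue:consumers:start ${consumer}
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
}
```

For example, `render_consumer_unit product_action_attribute.update /var/www/current > /etc/systemd/system/myconsumer.service`, followed by `systemctl daemon-reload`. If the Magento root is a symlink to the current release, restarting the service after a deploy would make the consumer run the new code.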

@arnoudhgz
Contributor

arnoudhgz commented Jul 8, 2019

I can also easily reproduce it by just manually creating an export job with a lot of products. After that all other cron tasks get status missed (payments, indexers etc).

So this is not only happening when doing a new release.

Small exports are fine though....

@alaa-almaliki

I came across this issue today and saw tasks from old releases still running in memory. This caused a memory issue and the dev server became unresponsive. I killed all the queue tasks and disabled them to avoid the memory issue.

'cron_consumers_runner' => [
    'cron_run' => false,
    'max_messages' => 20000,
    'consumers' => [],
],

The memory is reclaimed now and no more issues post deployments.

@hostep
Contributor

hostep commented Jul 24, 2019

Be aware that certain tasks being executed in the backend will no longer work when you disable all the consumers. These are the things listed under Merchant tool enhancements in the 2.3.2 release notes.

Anyway, I've created a feature request for an easy-to-use action to stop running consumer processes. We could then use this during a deployment. If somebody has some ideas, feel free to contribute: magento/community-features#181

Until this feature is implemented, I'm solving it by reading the var/*.pid files from the previous release and killing all those PIDs using the kill command. I'm not sure how safe it is to do it like that, but it has resolved our deployment problems for now.
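
A minimal sketch of that workaround, not the exact script used here: it assumes each `var/*.pid` file contains just a process ID, and `stop_release_consumers` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Send SIGTERM to every live process named in <release_dir>/var/*.pid
# and print each PID that was signalled.
# Caveat: a PID may have been reused by an unrelated process since the
# file was written, so verify the command line first if that matters.
stop_release_consumers() {
  local release_dir="$1" pid_file pid
  for pid_file in "$release_dir"/var/*.pid; do
    [ -f "$pid_file" ] || continue              # glob matched nothing
    pid="$(tr -d '[:space:]' < "$pid_file")"
    case "$pid" in ''|*[!0-9]*) continue ;; esac  # skip non-numeric content
    if kill -0 "$pid" 2>/dev/null; then
      kill -TERM "$pid" && echo "$pid"
    fi
  done
}
```

In a deploy hook this could be called as `stop_release_consumers /path/to/releases/744` (hypothetical path) before cron runs on the new release. Whether a message being processed at that moment is handled safely is exactly the open question raised earlier in this thread.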

@engcom-Charlie engcom-Charlie removed their assignment Aug 29, 2019
@giacmir
Member

giacmir commented Oct 28, 2019

There are two problems with the current PoisonPill implementation:

  1. The current poison pill version is invalidated only when config is saved or a new website/store/store view is added, so it can happen that a new deploy doesn't trigger either of these two events.
  2. The consumer checks for the poison pill only when it receives a message, so if your messages aren't very frequent (let's say in a test environment) you may have several old consumers waiting for the message that will kill them. All those stranded processes can easily saturate a test machine.

@hostep
Contributor

hostep commented Oct 28, 2019

@giacmir indeed!

See https://github.com/magento/architecture/pull/232/files?short_path=81c5aa0#diff-81c5aa0b55a519b20c0ffd8b3f57b21b for a proposal for some new functionality in Magento for how to handle those. Especially "Problem 2" is relevant here.

@Zyles

Zyles commented Feb 5, 2020

Same problem. Our server died because old consumers were running and spiked CPU load after we moved to a new release.

Are consumers supposed to be running 24/7? "ps aux | grep consumers" shows 4 consumers running at all times.

@hostep
Contributor

hostep commented Feb 5, 2020

@Zyles: yes, that's how it was designed. If you don't need the functionality provided by those consumers, you can disable all or some of them, see docs.

Magento 2.3.4 came with a new option consumers-wait-for-messages, which will kill the consumer process just after it has started up and noticed that no messages are in the queue. This means that every minute a new process starts up, takes a lot of memory, looks for pending messages, processes any it finds, and then kills itself. This feels like too much overhead to me.

Therefore I proposed a different idea for doing the check on pending messages before the processes start up, see here, option only-spawn-when-message-available. Work on this hasn't started as far as I'm aware, but I'm not in the loop, so maybe it has started already, who knows ... :)

@amenk
Contributor

amenk commented Feb 24, 2020

> Then I create a systemd service to manage my consumer. In this way i can service myconsumer start, service myconsumer restart etc

The solution by @slackerzz sounds very reasonable to me. Those systemd services should be in the devdocs :)

Starting the consumers via cron also seems to have the side effect that the first cron run never returns.

swnsma added a commit to swnsma/magento2-1 that referenced this issue Feb 2, 2021
Add integration tests coverage.
swnsma added a commit to swnsma/magento2-1 that referenced this issue Feb 7, 2021
@engcom-Oscar engcom-Oscar added Reproduced on 2.4.x The issue has been reproduced on latest 2.4-develop branch Issue: Confirmed Gate 3 Passed. Manual verification of the issue completed. Issue is confirmed labels Mar 18, 2021
@magento-engcom-team
Contributor

✅ Confirmed by @engcom-Oscar
Thank you for verifying the issue. Based on the provided information internal tickets MC-41429 were created

Issue Available: @engcom-Oscar, You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself.

@t-heuser

Any news on this? We're also experiencing this issue. It's pretty annoying to have no Magento way to stop the consumers.

@hostep
Contributor

hostep commented Mar 30, 2022

@oneserv-heuser: there is a PR open for this: #31495

However, as mentioned by @giacmir above:

> The consumer checks for the poison pill only when it receives a message, so if your messages aren't very frequent (let's say in a test environment) you may have several old consumers waiting for the message that will kill them. All those stranded processes can easily saturate a test machine.

So even after you call bin/magento queue:consumers:restart from the PR mentioned above, the consumer process will not stop if no new messages come in. Only when a new message comes in will it stop itself, and then the cron will start it up again the next time it runs.

Due to this, we kill all consumers during each deploy, as explained a bit over here.

@Kannakiraj123
Contributor

@magento I am working on this

@Kannakiraj123
Contributor

Kannakiraj123 commented Aug 29, 2023

@luckyraul @swnsma can you please help me reproduce the issue in a local setup with Docker?

I followed the steps below, but I cannot reproduce it:

1. Run "bin/magento queue:consumers:start product_action_attribute.update".
2. Do a mass update from the product admin grid.
3. This puts a message on the queue and the consumer goes into the running state.
4. Run "bin/magento queue:consumers:restart" in another terminal. It doesn't stop the consumer, but it changes the version in "queue_poison_pill".
5. The consumer in queue management is still not stopped.
6. Do another mass update from the product admin grid.
7. This again puts a message on the queue and the consumer goes into the running state.

[Screenshot 2023-08-29 at 3 18 26 PM]

Note: when I close the terminal that is running "queue:consumers:start", the consumer stops. I can see that the consumer is removed from queue_management, but the message is not removed from the queue.

@swnsma
Contributor

swnsma commented Aug 29, 2023

Hello @Kannakiraj123,

This issue was solved in #31495 by introducing the queue:consumers:restart command, which changes the poison pill version.
It seems to be an issue with automation (the PR was merged and is already included in a released Magento version).
The ticket should be closed.
