Salt occasionally lose command when publish a large command #46553

Closed
pengyao opened this issue Mar 15, 2018 · 7 comments
Labels
Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged
Comments


pengyao commented Mar 15, 2018

Description of Issue/Question

If the command or target list is large, Salt occasionally loses the payload on the publish_pull socket.

Versions Report

(Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)

Salt Version:
           Salt: 2017.7.4
 
Dependency Versions:
           cffi: 1.10.0
       cherrypy: Not Installed
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.8.1
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.6
   mysql-python: Not Installed
      pycparser: 2.18
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.13 (default, Jul 12 2017, 17:32:34)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 14.5.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.5
 
System Versions:
           dist: centos 6.5 Final
         locale: UTF-8
        machine: x86_64
        release: 2.6.32-431.el6.x86_64
         system: Linux
        version: CentOS 6.5 Final

pengyao commented Mar 15, 2018

pub_sock.send(self.serial.dumps(int_payload))
pub_sock.close()
context.term()

pub_sock is destroyed immediately after the payload is sent, so the master's pull socket sometimes never receives the payload. I think the cause is zeromq/libzmq#1922.
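
The send, close, term sequence quoted above can be exercised outside Salt. The following is a minimal sketch, not Salt's code and not the eventual fix: the function names, the TCP loopback endpoint, and the 5000 ms linger value are all assumptions chosen for illustration. It sends one payload from a short-lived PUSH socket with an explicit ZMQ_LINGER period and checks that a PULL socket receives it.

```python
import zmq


def send_then_close(endpoint, payload, linger_ms=5000):
    """Send one payload from a short-lived PUSH socket, then tear it down."""
    context = zmq.Context()
    push = context.socket(zmq.PUSH)
    # Assumption: a bounded linger period gives libzmq time to flush the
    # queued message after close(); 5000 ms is an arbitrary example value.
    push.setsockopt(zmq.LINGER, linger_ms)
    push.connect(endpoint)
    push.send(payload)
    push.close()
    # term() blocks until pending messages are sent or the linger expires.
    context.term()


def round_trip(payload, timeout_ms=5000):
    """Return the payload as seen by a PULL socket, or None if it was lost."""
    context = zmq.Context()
    pull = context.socket(zmq.PULL)
    port = pull.bind_to_random_port("tcp://127.0.0.1")
    try:
        send_then_close("tcp://127.0.0.1:%d" % port, payload)
        if pull.poll(timeout_ms):
            return pull.recv()
        return None
    finally:
        pull.close(linger=0)
        context.term()


if __name__ == "__main__":
    data = b"x" * 200000
    print("received" if round_trip(data) == data else "lost")
```

Running this in a loop against an affected libzmq build may show occasional losses, but that depends on the library version and timing, so treat it as a probe rather than a guaranteed reproducer.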

@pengyao pengyao changed the title Salt occasionally loss command when publish a large command Salt occasionally lose command when publish a large command Mar 15, 2018
@garethgreenaway garethgreenaway added this to the Blocked milestone Mar 15, 2018
@garethgreenaway garethgreenaway added the Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged label Mar 15, 2018
@garethgreenaway

@pengyao Thanks for the report. So that we have a better chance of reproducing this, can you provide more information about the target and the command that you're sending?


pengyao commented Mar 16, 2018

My test case:

  • A new Salt master, version 2017.7.4, on CentOS 6, installed from the Salt repo
  • Set zmq_filtering to True to generate a large target list:
echo "zmq_filtering: True" >> /etc/salt/master
  • Add debug statements around the publish_pull socket (/usr/lib/python2.7/site-packages/salt/transport/zeromq.py):
# After line 768
log.info('Get payload from pull sock')
# After line 855
log.info('Send payload to pull sock')
  • Restart salt-master:
service salt-master restart
  • Create 20000 pseudo minions:
for each in {1..20000}; do touch /etc/salt/pki/master/minions/minion-${each};done
  • Truncate salt master log file:
> /var/log/salt/master
  • Send test.ping 200 times:
for each in {1..200};do salt '*' test.ping --async;done
  • Count the "Send payload to pull sock" log entries:
grep "Send payload to pull sock" /var/log/salt/master |wc -l
  • Count the "Get payload from pull sock" log entries:
grep "Get payload from pull sock" /var/log/salt/master |wc -l

In my test case, "Send payload to pull sock" was logged 200 times and "Get payload from pull sock" was logged 191 times. In other words, 9 payloads were lost.
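
The two grep counts in the test case can be combined into one small script. This is only a sketch: the function names are hypothetical, and the marker strings are the debug messages added to zeromq.py in the steps above.

```python
SENT_MARKER = "Send payload to pull sock"
RECEIVED_MARKER = "Get payload from pull sock"


def count_lost_payloads(log_lines):
    """Return (sent, received, lost) counted from an iterable of log lines."""
    sent = 0
    received = 0
    for line in log_lines:
        # Count matching lines, the same way `grep ... | wc -l` does.
        if SENT_MARKER in line:
            sent += 1
        if RECEIVED_MARKER in line:
            received += 1
    return sent, received, sent - received


def count_lost_in_file(path="/var/log/salt/master"):
    """Run the count against the master log (default path from the test case)."""
    with open(path) as handle:
        return count_lost_payloads(handle)


if __name__ == "__main__":
    print("sent=%d received=%d lost=%d" % count_lost_in_file())
```

With the numbers reported above, the result would be sent=200, received=191, lost=9.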

@garethgreenaway

@pengyao Can you try adding max_event_size to your Salt master configuration:
https://docs.saltstack.com/en/latest/ref/configuration/master.html#max-event-size


pengyao commented Mar 20, 2018

@garethgreenaway I added max_event_size: 512000 to the master configuration and ran the test again. "Get payload from pull sock" was logged 192 times, so the problem is not resolved.


pengyao commented Mar 28, 2018

@garethgreenaway I updated ZeroMQ to 4.0.8 and ran the test case again. The problem is resolved.

In 4.0.8 ChangeLog:

* Fixed #919 - ZMQ_LINGER (related to #1877)

The fix for zeromq/libzmq#1877 resolved the problem.


pengyao commented Jun 28, 2019

#50463 fixed it

@pengyao pengyao closed this as completed Jun 28, 2019