Slowness in SNMPInterfaceCollector causes other collector runs to be skipped; making it Threaded does not work. #263

Closed
rtoma opened this issue Jan 31, 2013 · 6 comments
@rtoma
Contributor

rtoma commented Jan 31, 2013

Hi,
SNMP devices can be very slow. If you run with the default collector set and enable the SNMPInterfaceCollector, I have noticed scheduled runs for the default collectors get skipped.

I assume this is because the SNMPInterfaceCollector is still working and is blocking the thread in which the other collectors also run.

I have tried including "method=Threaded" in the SNMPInterfaceCollector.conf, but that gave me errors like:

[2013-01-31 14:39:08,505] [Thread-3] ERROR DURING TASK EXECUTION MIB subtree (1, 3, 6, 1, 6, 3, 10, 2, 1, 4, 0) already registered at MibScalar((1, 3, 6, 1, 6, 3, 10, 2, 1, 4), Integer32())
 Traceback (most recent call last):
  File "/home/rtoma/Diamond/src/diamond/scheduler.py", line 484, in threadedcall
    self.execute()
  File "/home/rtoma/Diamond/src/diamond/scheduler.py", line 343, in execute
    self.action(*self.args, **self.kw)
  File "./src/collectors/snmpinterface/snmpinterface.py", line 128, in collect_snmp
    ifIndexData = self.walk(ifIndexOid, host, port, community)
  File "./src/collectors/snmp/snmp.py", line 144, in walk
    oid)
  File "/usr/lib/python2.6/site-packages/pysnmp/entity/rfc3413/oneliner/cmdgen.py", line 484, in nextCmd
    contextEngineId, contextName
  File "/usr/lib/python2.6/site-packages/pysnmp/entity/rfc3413/oneliner/cmdgen.py", line 290, in nextCmd
    authData, transportTarget
  File "/usr/lib/python2.6/site-packages/pysnmp/entity/rfc3413/oneliner/cmdgen.py", line 54, in cfgCmdGen
    authData.tag
  File "/usr/lib/python2.6/site-packages/pysnmp/entity/config.py", line 75, in addV1System
    ((snmpCommunityEntry.name + (8,) + tblIdx, 'destroy'),)
  File "/usr/lib/python2.6/site-packages/pysnmp/smi/instrum.py", line 245, in writeVars
    return self.flipFlopFsm(self.fsmWriteVar, vars, acInfo)
  File "/usr/lib/python2.6/site-packages/pysnmp/smi/instrum.py", line 183, in flipFlopFsm
    self.__indexMib()
  File "/usr/lib/python2.6/site-packages/pysnmp/smi/instrum.py", line 139, in __indexMib
    scalars[inst.typeName].registerSubtrees(inst)
  File "/usr/lib/python2.6/site-packages/pysnmp/smi/mibs/SNMPv2-SMI.py", line 243, in registerSubtrees
    'MIB subtree %s already registered at %s' %  (subTree.name, self)
SmiError: MIB subtree (1, 3, 6, 1, 6, 3, 10, 2, 1, 4, 0) already registered at MibScalar((1, 3, 6, 1, 6, 3, 10, 2, 1, 4), Integer32())

... and ...

[2013-01-31 14:43:36,072] [Thread-2] ERROR DURING TASK EXECUTION 'NoneType' object has no attribute 'clone'
 Traceback (most recent call last):
  File "/home/rtoma/Diamond/src/diamond/scheduler.py", line 484, in threadedcall
    self.execute()
  File "/home/rtoma/Diamond/src/diamond/scheduler.py", line 343, in execute
    self.action(*self.args, **self.kw)
  File "./src/collectors/snmpinterface/snmpinterface.py", line 128, in collect_snmp
    ifIndexData = self.walk(ifIndexOid, host, port, community)
  File "./src/collectors/snmp/snmp.py", line 144, in walk
    oid)
  File "/usr/lib/python2.6/site-packages/pysnmp/entity/rfc3413/oneliner/cmdgen.py", line 484, in nextCmd
    contextEngineId, contextName
  File "/usr/lib/python2.6/site-packages/pysnmp/entity/rfc3413/oneliner/cmdgen.py", line 290, in nextCmd
    authData, transportTarget
  File "/usr/lib/python2.6/site-packages/pysnmp/entity/rfc3413/oneliner/cmdgen.py", line 54, in cfgCmdGen
    authData.tag
  File "/usr/lib/python2.6/site-packages/pysnmp/entity/config.py", line 84, in addV1System
    (snmpCommunityEntry.name + (7,) + tblIdx, 'nonVolatile'))
  File "/usr/lib/python2.6/site-packages/pysnmp/smi/instrum.py", line 245, in writeVars
    return self.flipFlopFsm(self.fsmWriteVar, vars, acInfo)
  File "/usr/lib/python2.6/site-packages/pysnmp/smi/instrum.py", line 216, in flipFlopFsm
    rval = f(tuple(name), val, idx, acInfo)
  File "/usr/lib/python2.6/site-packages/pysnmp/smi/mibs/SNMPv2-SMI.py", line 334, in readGet
    return node.readGet(name, val, idx, acInfo)
  File "/usr/lib/python2.6/site-packages/pysnmp/smi/mibs/SNMPv2-SMI.py", line 334, in readGet
    return node.readGet(name, val, idx, acInfo)
  File "/usr/lib/python2.6/site-packages/pysnmp/smi/mibs/SNMPv2-SMI.py", line 477, in readGet
    return node.readGet(name, val, idx, acInfo)
  File "/usr/lib/python2.6/site-packages/pysnmp/smi/mibs/SNMPv2-SMI.py", line 581, in readGet
    return self.name, self.getValue(name, idx)
  File "/usr/lib/python2.6/site-packages/pysnmp/smi/mibs/SNMPv2-SMI.py", line 526, in getValue
    return self.syntax.clone()
AttributeError: 'NoneType' object has no attribute 'clone'

Since I get these errors after enabling threading, I guess the SNMPInterfaceCollector is not thread-safe. pysnmp itself should be thread-safe (http://stackoverflow.com/a/10384203) "as long as you use a dedicated CommandGenerator per thread".
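To illustrate the "dedicated CommandGenerator per thread" advice quoted above, here is a minimal sketch that hands each thread its own lazily created object. The `PerThread` class is a hypothetical helper, not part of Diamond or pysnmp. It takes any factory, so in real use the factory would presumably be `cmdgen.CommandGenerator` from the `pysnmp.entity.rfc3413.oneliner` module seen in the tracebacks.

```python
import threading


class PerThread(object):
    """Hypothetical helper: lazily creates one object per thread."""

    def __init__(self, factory):
        self._factory = factory
        # threading.local gives every thread its own attribute namespace.
        self._local = threading.local()

    def get(self):
        """Return this thread's object, creating it on first use."""
        obj = getattr(self._local, "obj", None)
        if obj is None:
            obj = self._factory()
            self._local.obj = obj
        return obj
```

Assuming pysnmp is installed, usage could look like `generators = PerThread(cmdgen.CommandGenerator)` at module level and `generators.get()` inside the walk code. Whether this alone removes the errors above is untested here.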

Any thoughts?

@kormoc
Contributor

kormoc commented Feb 1, 2013

Because we're using derivatives to convert the counters into metrics, we can't use a threaded method.

@jgoldschrafe
Contributor

What is it about the current implementation of derivatives that makes them non-threadsafe? I'm very interested in using Diamond for SNMP, since collectd's support is painfully rough around the edges, but we also have some fairly long-running stats collections that need to be ironed out.

@kormoc
Contributor

kormoc commented May 16, 2013

The threads are short lived. They get destroyed after the scheduled collector run and then spawned anew on the next one. Any modifications to the local state get destroyed. We would need to synchronize the derivative deltas back to the main thread before we destroy the running thread. It's perfectly doable, just not done.
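One way to picture the synchronization described above is to keep the previous counter samples in an object that outlives the worker threads, guarded by a lock. The sketch below does not reproduce Diamond's own derivative logic; `DerivativeStore` and `run_in_thread` are hypothetical names used only for illustration.

```python
import threading


class DerivativeStore(object):
    """Hypothetical store for the last counter value per metric.

    It lives in the long-running process, so it survives the
    short-lived collector threads that report samples to it.
    """

    def __init__(self):
        self._last = {}
        self._lock = threading.Lock()

    def derivative(self, name, value, timestamp):
        """Return the per-second rate since the previous sample, or None."""
        with self._lock:
            previous = self._last.get(name)
            self._last[name] = (value, timestamp)
        if previous is None:
            return None  # first sample: nothing to compare against
        old_value, old_timestamp = previous
        elapsed = timestamp - old_timestamp
        if elapsed <= 0 or value < old_value:
            return None  # clock problem or counter reset: skip this sample
        return (value - old_value) / float(elapsed)


def run_in_thread(store, name, value, timestamp):
    """Simulate one short-lived collector thread reporting a sample."""
    result = []
    worker = threading.Thread(
        target=lambda: result.append(store.derivative(name, value, timestamp)))
    worker.start()
    worker.join()
    return result[0]
```

Each call to `run_in_thread` uses a fresh thread that is gone afterwards, yet the second sample for a metric still yields a rate because the state lives in the shared store rather than in the thread.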

@TinLe
Contributor

TinLe commented Jun 6, 2013

I ran into a similar problem when I was trying to monitor a pair of Cat6Ke's: timeouts and skipped interfaces.

Part of the problem is that the collector walks through the interfaces using individual snmpget requests. I think it would be better to do an snmpbulkget and then walk through that result instead.
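To show why a bulk request can help, the sketch below counts round trips for a one-row-per-request walk versus a bulk walk. It does not call pysnmp: `FakeAgent` is a hypothetical in-memory stand-in for a device, and the model ignores response size limits. Note that GETBULK requires SNMPv2c or later.

```python
import bisect


class FakeAgent(object):
    """Hypothetical in-memory SNMP agent that counts requests."""

    def __init__(self, oids):
        self.oids = sorted(oids)  # OIDs as tuples, in lexicographic order
        self.requests = 0

    def get_next(self, oid):
        """Return the first OID after `oid`, or None at the end of the MIB."""
        self.requests += 1
        i = bisect.bisect_right(self.oids, oid)
        return self.oids[i] if i < len(self.oids) else None

    def get_bulk(self, oid, max_repetitions):
        """Return up to `max_repetitions` OIDs after `oid` in one request."""
        self.requests += 1
        i = bisect.bisect_right(self.oids, oid)
        return self.oids[i:i + max_repetitions]


def walk_next(agent, root):
    """Walk the subtree under `root`, one request per row."""
    rows, oid = [], root
    while True:
        oid = agent.get_next(oid)
        if oid is None or oid[:len(root)] != root:
            return rows
        rows.append(oid)


def walk_bulk(agent, root, max_repetitions):
    """Walk the subtree under `root`, many rows per request."""
    rows, oid = [], root
    while True:
        batch = agent.get_bulk(oid, max_repetitions)
        if not batch:
            return rows
        for oid in batch:
            if oid[:len(root)] != root:
                return rows  # left the subtree: the walk is complete
            rows.append(oid)
```

In this model, walking the `ifIndex` column of a 48-port device takes 49 requests one row at a time, but only 2 with a bulk size of 25. On a slow device, each request saved is one fewer chance of hitting a timeout.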

@kormoc
Contributor

kormoc commented Jun 10, 2013

I agree, but it's a little complex. I'd happily accept a pull request that implemented it.

@GreggBzz

Hopefully, this will help some folks. I encountered this same issue and rewrote the collector itself to be multi-threaded. It likely needs work since I'm no Python developer, but you might try it out:
https://github.com/GreggBzz/snmp-interface-poll
