
DNS: timeout on resolving address on all requests after hours of operation #51

Closed
berglh opened this issue Apr 25, 2019 · 22 comments · Fixed by #52

Comments

berglh commented Apr 25, 2019

After upgrading to Logstash 6.7.1, the DNS filter works fine for a while and then suddenly starts timing out continuously, dramatically slowing pipeline performance.

[2019-04-23T22:27:49,497][WARN ][logstash.filters.dns ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.48.91.99"}

Troubleshooting Steps

  • Removed the failed & hit caches from the config
  • Downgraded plugin to 3.0.11
  • Returned to previously built Logstash 6.5.1 container

The behaviour continued through the first two steps on LS 6.7.1; it occurred straight away after removing the cache size and TTL settings.

Things seem stable again back at LS 6.5.1.

Investigation

After leaving a comment on #40, @colinsurprenant replied with the following:

@berglh Thanks for the followup.

I believe I found the source of the problem. I will need to validate tomorrow, but there seems to be a leak (a request id cache fills up, which leads to an infinite loop that is aborted by the timeout) in the new version of the resolver code that was updated starting with 6.7. For now, as you noticed, the only workaround is to downgrade to 6.6 (6.6.2 as of now); just downgrading the dns filter will not help.
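The failure mode described in that reply can be sketched as follows. This is a hypothetical illustration of a leaking request-id pool, not the actual resolv.rb code: once every ID in a fixed-size pool is marked in use and never released, each new allocation spins forever and is only ever ended by the timeout.

```ruby
require 'timeout'

# Hypothetical sketch: a fixed pool of request IDs where leaked IDs are
# never released. Once the pool is exhausted, every new allocation spins
# until aborted by a timeout - matching the symptom described above.
class RequestIdPool
  def initialize(size)
    @size = size
    @in_use = {}
  end

  # Loops until a free ID is found; never returns once the pool leaks dry.
  def allocate(timeout_sec)
    Timeout.timeout(timeout_sec) do
      loop do
        id = rand(@size)
        unless @in_use[id]
          @in_use[id] = true
          return id
        end
      end
    end
  end

  # A correct caller frees its ID after the response; a leaky one doesn't.
  def free(id)
    @in_use.delete(id)
  end
end

pool = RequestIdPool.new(4)
4.times { pool.allocate(1) }   # pool now exhausted (IDs never freed)
begin
  pool.allocate(0.2)           # every further request can only time out
rescue Timeout::Error
  puts "timeout on resolving"  # analogous to the DNS filter warning
end
```

This also explains why the symptom appears only "after hours of operation": the pool drains gradually as requests leak, and once empty, every lookup hits the timeout.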

Environment

I'm using the official Logstash containers.
I use a caching-only BIND server to service Logstash's requests; I've configured the name server at the Docker container level.
The BIND server is always able to resolve the requests correctly when I test against the IPs or host names detailed in the Logstash warning events.

Details

  • Version: Logstash 6.7.1, Plugin Version 3.0.12
  • Operating System: Oracle Enterprise Linux 7.3 (Official Docker Container)
  • Config:
2-general/400000-ad-smb.conf:      dns {
2-general/400000-ad-smb.conf-        resolve => ["[@metadata][destination][address]"]
2-general/400000-ad-smb.conf-        action => "replace"
2-general/400000-ad-smb.conf-        failed_cache_size => 1000
2-general/400000-ad-smb.conf-        failed_cache_ttl => 60
2-general/400000-ad-smb.conf-        hit_cache_size => 10000
2-general/400000-ad-smb.conf-        hit_cache_ttl => 300
2-general/400000-ad-smb.conf-      }
--
2-general/400000-ad-smb.conf:            dns {
2-general/400000-ad-smb.conf-              reverse => ["[@metadata][ip]"]
2-general/400000-ad-smb.conf-              action => "replace"
2-general/400000-ad-smb.conf-              failed_cache_size => 1000
2-general/400000-ad-smb.conf-              failed_cache_ttl => 60
2-general/400000-ad-smb.conf-              hit_cache_size => 10000
2-general/400000-ad-smb.conf-              hit_cache_ttl => 300
2-general/400000-ad-smb.conf-            }

colinsurprenant commented Apr 25, 2019

Thanks a lot for investigating and creating the issue @berglh.

To recap, this problem affects all versions starting at v6.7, including v7, and the only workaround at this point is to downgrade to the latest v6.6.x (6.6.2 as of now).

This problem is in a library included in logstash core but we should be able to apply a patch from the dns filter plugin, similar to what we did in #45.

colinsurprenant commented:

@berglh Are you using the nameserver option of the dns filter by any chance? If so, do you have multiple servers configured?

colinsurprenant commented:

@berglh

if you are not using the nameserver option then also verify if you have multiple servers configured in your /etc/resolv.conf.

If the answer is yes to either:

  • using the nameserver option with multiple hosts configured
  • not using the nameserver option but having multiple servers configured in /etc/resolv.conf

then as a tentative temporary workaround, you can try reducing to a SINGLE server in either the nameserver option or in your /etc/resolv.conf and see if that solves it.
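In resolv.rb terms, that workaround amounts to handing the resolver exactly one nameserver. A minimal sketch of the configuration shape Ruby's `Resolv::DNS` accepts, using the 10.1.1.1 address and search domain that appear later in this thread as assumed values; no lookups are performed here:

```ruby
require 'resolv'

# Build a resolver with exactly one nameserver, mirroring a single-entry
# /etc/resolv.conf. Constructing the object performs no network I/O; it
# only shows the configuration resolv.rb reads from resolv.conf or a hash.
dns = Resolv::DNS.new(
  nameserver: ['10.1.1.1'],        # single server: the tentative workaround
  search:     ['ourdomain.edu.au'],
  ndots:      1
)
dns.timeouts = 1  # matches the container's "options timeout:1"
```

With multiple nameservers configured, resolv.rb rotates between them per query, which is why the multi-server case can exercise the buggy code path more often.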

colinsurprenant commented:

I have an incoming fix in #52


berglh commented Apr 25, 2019

@colinsurprenant Yes I launch the Docker container with the --dns option specified for multiple targets, so the resolver can fail-over to our local resolving name servers:

docker run --dns=10.1.1.1 --dns=1.1.1.1 --dns=1.0.0.1 --name %p reg.devops.our.domain.edu.au/logstash-processor -r --log.level warn -b125

The net result is that multiple nameservers are configured in /etc/resolv.conf, which looks like this:

[root@logstash-container /]# cat /etc/resolv.conf 
search ourdomain.edu.au
nameserver 10.1.1.1
nameserver 1.1.1.1
nameserver 1.0.0.1
options timeout:1

I'm just pushing the 6.7.1 container tag and I'll try it with a single DNS server specified on the container.


berglh commented Apr 25, 2019

@colinsurprenant I have now just the BIND caching name server defined in the Docker container explicitly and it has the single nameserver in resolv.conf.

[root@logstash-container /]# cat /etc/resolv.conf 
search ourdomain.edu.au
nameserver 10.1.1.1
options timeout:1

I will need to wait a considerable time to validate that it's working better than before.

I'd just like to add that the filter also seems to time out on the first few look-ups when the logstash pipeline is first started. I'm running 5 instances of logstash on five separate servers; one of the five instances had four timeouts, however these same addresses resolve correctly when searched manually against the one configured server.

I'm not sure of the mechanism logstash uses to determine all plugins are "ready", or whether they have just been "started" per se. It feels like there is some transient condition on start-up where the timeout latency for resolv.rb is met, but I wonder if it's general contention in the start-up phase, as our logstash configs are pretty big at 3498 lines with a pipeline config that looks like this:

# inputs acts as a router to specific processing pipelines
# reducing the workers on a per-pipeline basis greatly reduces the
# number of plugin instances running and improves start-up time
- pipeline.id: 1-inputs
  pipeline.workers: 3
  path.config: "/usr/share/logstash/pipelines/1-inputs"
  config.reload.automatic: true
- pipeline.id: 2-general
  pipeline.workers: 2
  path.config: "/usr/share/logstash/pipelines/2-general"
- pipeline.id: 3-ldap
  pipeline.workers: 4
  path.config: "/usr/share/logstash/pipelines/3-ldap"
- pipeline.id: 4-dhcp-dns
  pipeline.workers: 3                               
  path.config: "/usr/share/logstash/pipelines/4-dhcp-dns"
- pipeline.id: 5-radius
  pipeline.workers: 2                               
  path.config: "/usr/share/logstash/pipelines/5-radius"
- pipeline.id: 6-syslog
  pipeline.workers: 4                               
  path.config: "/usr/share/logstash/pipelines/6-syslog"
# there are generic post-processing filters in the output pipeline
- pipeline.id: 7-outputs
  pipeline.workers: 15                               
  config.reload.automatic: true
  path.config: "/usr/share/logstash/pipelines/7-outputs"

[screenshot]


berglh commented Apr 25, 2019

The first hour seems to be going pretty well:
[screenshot]

colinsurprenant commented:

Published dns filter version 3.0.13 with the patch - let me know how that works for you!


berglh commented Apr 30, 2019

@colinsurprenant I appreciate you getting this addressed so quickly. I'll build and test. In the meantime I just had a pipeline failure last night, noticing that the old version started to accumulate more and more DNS lookup timeouts, but I think this is residual from the original issue I commented on. I'll see how the new resolv.rb works out for me :)


berglh commented Apr 30, 2019

@colinsurprenant

I just had a similar failure, but I'm not entirely sure it's down to the DNS filter. I started the containers at about 12:15. I then had my watchdog service notice the event rate drop for four instances at about the same time.

[screenshot]

Apr 30 13:46:42 thor logstash-processor: [2019-04-30T03:46:42,083][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.50.74.197"}
Apr 30 13:47:38 thor logstash-processor: [2019-04-30T03:47:38,175][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.48.96.76"}
Apr 30 13:50:03 thor logstash-processor: [2019-04-30T03:50:03,214][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.153.150.131"}
Apr 30 13:55:44 thor logstash-processor: [2019-04-30T03:55:44,242][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.83.2.46"}
Apr 30 13:55:44 thor logstash-processor: [2019-04-30T03:55:44,244][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.50.168.116"}
Apr 30 13:55:51 thor logstash-processor: [2019-04-30T03:55:51,749][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.51.35.108"}
Apr 30 14:00:33 thor logstash-processor: [2019-04-30T04:00:33,012][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.50.72.220"}
Apr 30 14:04:12 thor logstash-processor: [2019-04-30T04:04:12,139][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.138.108.57"}
Apr 30 14:15:40 thor logstash-processor: [2019-04-30T04:15:40,440][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.184.42.35"}
Apr 30 14:15:40 thor logstash-processor: [2019-04-30T04:15:40,445][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.153.157.30"}
Apr 30 14:17:47 thor logstash-processor: [2019-04-30T04:17:47,018][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.33.3.86"}
Apr 30 14:17:47 thor logstash-processor: [2019-04-30T04:17:47,018][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"192.168.67.223"}
Apr 30 14:24:46 thor logstash-processor: [2019-04-30T04:24:46,022][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.55.0.81"}
Apr 30 14:30:38 thor logstash-processor: [2019-04-30T04:30:38,862][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"172.23.50.197"}
Apr 30 14:30:38 thor logstash-processor: [2019-04-30T04:30:38,865][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"172.23.50.197"}
Apr 30 14:35:14 thor logstash-processor: [2019-04-30T04:35:14,712][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.50.72.187"}
Apr 30 14:35:14 thor logstash-processor: [2019-04-30T04:35:14,714][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.43.184.113"}
Apr 30 14:44:42 thor logstash-processor: [2019-04-30T04:44:42,614][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.50.73.16"}
Apr 30 14:44:42 thor logstash-processor: [2019-04-30T04:44:42,618][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.48.44.146"}
Apr 30 14:45:03 thor logstash-processor: [2019-04-30T04:45:03,168][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.50.105.75"}
Apr 30 14:47:31 thor logstash-processor: [2019-04-30T04:47:31,179][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.48.14.114"}
Apr 30 14:49:27 thor logstash-processor: [2019-04-30T04:49:27,316][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"192.168.246.96"}
Apr 30 14:52:47 thor logstash-processor: [2019-04-30T04:52:47,713][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.152.140.35"}
Apr 30 14:57:04 thor logstash-processor: [2019-04-30T04:57:04,896][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.240.138.78"}
Apr 30 15:07:46 thor logstash-processor: [2019-04-30T05:07:46,852][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.49.20.144"}
Apr 30 15:07:46 thor logstash-processor: [2019-04-30T05:07:46,873][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.49.20.144"}
Apr 30 15:25:25 thor logstash-processor: [2019-04-30T05:25:25,749][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.52.2.87"}
Apr 30 15:25:25 thor logstash-processor: [2019-04-30T05:25:25,749][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.48.96.38"}
Apr 30 15:25:57 thor logstash-processor: [2019-04-30T05:25:57,163][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.50.113.201"}
Apr 30 15:25:57 thor logstash-processor: [2019-04-30T05:25:57,164][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.50.113.201"}
Apr 30 15:50:18 thor logstash-processor: [2019-04-30T05:50:18,079][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"192.168.29.55"}
Apr 30 16:10:27 thor logstash-processor: [2019-04-30T06:10:27,733][WARN ][logstash.runner          ] SIGTERM received. Shutting down.

On closer inspection, even though I built the container with the Dockerfile commands:

# Install resolv.rb replaced version of DNS filter
RUN /usr/share/logstash/bin/logstash-plugin remove logstash-filter-dns
RUN /usr/share/logstash/bin/logstash-plugin install --version 3.0.13 logstash-filter-dns

Looking at the plugin version in the logstash API, it's reporting the old version still:
"logstash-filter-dns","version":"3.0.12"

I'll try to track this down, but this is what I've noticed so far.


berglh commented Apr 30, 2019

root@thor:~ docker run -ti --rm our.registry:/logstash-processor:latest /usr/share/logstash/bin/logstash-plugin list --verbose                                                                                                                                          
logstash-codec-cef (5.0.6)
logstash-codec-collectd (3.0.8)
logstash-codec-dots (3.0.6)
logstash-codec-edn (3.0.6)
logstash-codec-edn_lines (3.0.6)
logstash-codec-es_bulk (3.0.7)
logstash-codec-fluent (3.2.0)
logstash-codec-graphite (3.0.5)
logstash-codec-json (3.0.5)
logstash-codec-json_lines (3.0.6)
logstash-codec-line (3.0.8)
logstash-codec-msgpack (3.0.7)
logstash-codec-multiline (3.0.10)
logstash-codec-netflow (3.14.1)
logstash-codec-plain (3.0.6)
logstash-codec-rubydebug (3.0.6)
logstash-filter-aggregate (2.9.0)
logstash-filter-anonymize (3.0.6)
logstash-filter-cidr (3.1.2)
logstash-filter-clone (3.0.6)
logstash-filter-csv (3.0.8)
logstash-filter-date (3.1.9)
logstash-filter-de_dot (1.0.4)
logstash-filter-dissect (1.2.0)
logstash-filter-dns (3.0.13)
logstash-filter-drop (3.0.5)
logstash-filter-elasticsearch (3.6.0)
logstash-filter-fingerprint (3.2.1)

This seems to indicate that it was in fact installed, yet the plugin API reports version 3.0.12 only 🤷‍♂️
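One possible explanation (an assumption on my part, not confirmed in this thread) is that both gem versions remain on disk and the runtime activates the older one, so `logstash-plugin list` and the node API disagree. Inside any Ruby environment you can compare every installed copy of a gem against the one that actually activates; `json` is used below as a stand-in gem name, since `logstash-filter-dns` is only present inside a Logstash install:

```ruby
require 'json'  # force activation of the gem we inspect

# 'json' is a stand-in; inside the Logstash Ruby environment you would
# query 'logstash-filter-dns' instead.
gem_name  = 'json'
installed = Gem::Specification.find_all_by_name(gem_name)
                              .map { |s| s.version.to_s }.sort
active    = Gem.loaded_specs[gem_name]&.version&.to_s

puts "installed copies: #{installed.inspect}"
puts "active version:   #{active}"
```

If `installed` lists both 3.0.12 and 3.0.13 while `active` shows 3.0.12, the remove step did not actually take effect in the image layer the container runs from.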

colinsurprenant commented:

@berglh This is very weird. Is this still with v6.7.1?


berglh commented Apr 30, 2019

@colinsurprenant Yes, v6.7.1. I was thinking about setting DEBUG or TRACE logging on the plugin, but I can't actually see the plugin here in the loggers list either:

{
  "host" : "redact",
  "version" : "6.7.1",
  "http_address" : "0.0.0.0:9600",
  "id" : "e15817b0-53f0-4ed1-9032-9bda31120d6c",
  "name" : "redact",
  "loggers" : {
    "io.netty.bootstrap.ServerBootstrap" : "WARN",
    "io.netty.buffer.AbstractByteBuf" : "WARN",
    "io.netty.buffer.ByteBufUtil" : "WARN",
    "io.netty.buffer.PoolThreadCache" : "WARN",
    "io.netty.buffer.PooledByteBufAllocator" : "WARN",
    "io.netty.channel.AbstractChannel" : "WARN",
    "io.netty.channel.AbstractChannelHandlerContext" : "WARN",
    "io.netty.channel.AbstractCoalescingBufferQueue" : "WARN",
    "io.netty.channel.ChannelInitializer" : "WARN",
    "io.netty.channel.ChannelOutboundBuffer" : "WARN",
    "io.netty.channel.CombinedChannelDuplexHandler" : "WARN",
    "io.netty.channel.DefaultChannelId" : "WARN",
    "io.netty.channel.DefaultChannelPipeline" : "WARN",
    "io.netty.channel.DelegatingChannelPromiseNotifier" : "WARN",
    "io.netty.channel.MultithreadEventLoopGroup" : "WARN",
    "io.netty.channel.nio.AbstractNioChannel" : "WARN",
    "io.netty.channel.nio.NioEventLoop" : "WARN",
    "io.netty.channel.socket.nio.NioServerSocketChannel" : "WARN",
    "io.netty.channel.socket.nio.NioSocketChannel" : "WARN",
    "io.netty.handler.codec.http.HttpObjectAggregator" : "WARN",
    "io.netty.handler.ssl.CipherSuiteConverter" : "WARN",
    "io.netty.handler.ssl.OpenSsl" : "WARN",
    "io.netty.handler.ssl.PemReader" : "WARN",
    "io.netty.handler.ssl.ReferenceCountedOpenSslContext" : "WARN",
    "io.netty.handler.ssl.ReferenceCountedOpenSslEngine" : "WARN",
    "io.netty.handler.ssl.ReferenceCountedOpenSslServerContext" : "WARN",
    "io.netty.handler.ssl.SslHandler" : "WARN",
    "io.netty.handler.ssl.util.SelfSignedCertificate" : "WARN",
    "io.netty.util.NetUtil" : "WARN",
    "io.netty.util.Recycler" : "WARN",
    "io.netty.util.ReferenceCountUtil" : "WARN",
    "io.netty.util.ResourceLeakDetector" : "WARN",
    "io.netty.util.ResourceLeakDetectorFactory" : "WARN",
    "io.netty.util.concurrent.AbstractEventExecutor" : "WARN",
    "io.netty.util.concurrent.DefaultPromise" : "WARN",
    "io.netty.util.concurrent.DefaultPromise.rejectedExecution" : "WARN",
    "io.netty.util.concurrent.GlobalEventExecutor" : "WARN",
    "io.netty.util.concurrent.PromiseNotifier" : "WARN",
    "io.netty.util.concurrent.SingleThreadEventExecutor" : "WARN",
    "io.netty.util.internal.CleanerJava6" : "WARN",
    "io.netty.util.internal.InternalThreadLocalMap" : "WARN",
    "io.netty.util.internal.MacAddressUtil" : "WARN",
    "io.netty.util.internal.NativeLibraryLoader" : "WARN",
    "io.netty.util.internal.PlatformDependent" : "WARN",
    "io.netty.util.internal.PlatformDependent0" : "WARN",
    "io.netty.util.internal.SystemPropertyUtil" : "WARN",
    "io.netty.util.internal.logging.InternalLoggerFactory" : "WARN",
    "logstash.agent" : "WARN",
    "logstash.api.service" : "WARN",
    "logstash.codecs.json" : "WARN",
    "logstash.codecs.jsonlines" : "WARN",
    "logstash.codecs.plain" : "WARN",
    "logstash.config.source.local.configpathloader" : "WARN",
    "logstash.config.source.multilocal" : "WARN",
    "logstash.config.sourceloader" : "WARN",
    "logstash.filters.grok" : "WARN",
    "logstash.filters.metrics" : "WARN",
    "logstash.filters.mutate" : "WARN",
    "logstash.inputs.beats" : "WARN",
    "logstash.inputs.courier" : "WARN",
    "logstash.inputs.http" : "WARN",
    "logstash.inputs.lumberjack" : "WARN",
    "logstash.instrument.periodicpoller.cgroup.cpuacctresource" : "WARN",
    "logstash.instrument.periodicpoller.cgroup.cpuresource" : "WARN",
    "logstash.instrument.periodicpoller.deadletterqueue" : "WARN",
    "logstash.instrument.periodicpoller.jvm" : "WARN",
    "logstash.instrument.periodicpoller.os" : "WARN",
    "logstash.instrument.periodicpoller.persistentqueue" : "WARN",
    "logstash.modules.scaffold" : "WARN",
    "logstash.outputs.redis" : "WARN",
    "logstash.pipeline" : "WARN",
    "logstash.plugins.registry" : "WARN",
    "logstash.runner" : "WARN",
    "logstash.setting.writabledirectory" : "WARN",
    "logstash.util.decorators" : "WARN",
    "org.logstash.Event" : "WARN",
    "org.logstash.FieldReference" : "WARN",
    "org.logstash.Logstash" : "WARN",
    "org.logstash.beats.BeatsHandler" : "WARN",
    "org.logstash.beats.BeatsParser" : "WARN",
    "org.logstash.beats.ConnectionHandler" : "WARN",
    "org.logstash.beats.MessageListener" : "WARN",
    "org.logstash.beats.Server" : "WARN",
    "org.logstash.execution.AbstractPipelineExt" : "WARN",
    "org.logstash.execution.ShutdownWatcherExt" : "WARN",
    "org.logstash.execution.WorkerLoop" : "WARN",
    "org.logstash.instrument.metrics.gauge.LazyDelegatingGauge" : "WARN",
    "org.logstash.netty.SslSimpleBuilder" : "WARN",
    "org.logstash.plugins.inputs.http.MessageHandler" : "WARN",
    "org.logstash.plugins.inputs.http.util.SslSimpleBuilder" : "WARN",
    "org.logstash.plugins.pipeline.PipelineBus" : "WARN",
    "org.logstash.secret.store.SecretStoreFactory" : "WARN",
    "org.logstash.secret.store.backend.JavaKeyStore" : "WARN",
    "slowlog.logstash.codecs.json" : "WARN",
    "slowlog.logstash.codecs.jsonlines" : "WARN",
    "slowlog.logstash.codecs.plain" : "WARN",
    "slowlog.logstash.filters.grok" : "WARN",
    "slowlog.logstash.filters.metrics" : "WARN",
    "slowlog.logstash.filters.mutate" : "WARN",
    "slowlog.logstash.inputs.beats" : "WARN",
    "slowlog.logstash.inputs.courier" : "WARN",
    "slowlog.logstash.inputs.http" : "WARN",
    "slowlog.logstash.inputs.lumberjack" : "WARN",
    "slowlog.logstash.outputs.redis" : "WARN"
  }
}

Inspecting the running container, I can see it's the latest version of the container I built:

            "Labels": {
                "container-version": "6.7.1-b748-git92aefb01477",
                "license": "Apache-2.0",
                "org.label-schema.build-date": "20190305",
                "org.label-schema.license": "GPLv2",
                "org.label-schema.name": "logstash",
                "org.label-schema.schema-version": "1.0",
                "org.label-schema.url": "https://www.elastic.co/products/logstash",
                "org.label-schema.vcs-url": "https://github.com/elastic/logstash-docker",
                "org.label-schema.vendor": "Elastic",
                "org.label-schema.version": "6.7.1"
            }

In build b748 of the processor container, I can see it's using b96 of 6.7.1:

++ docker images our.reg/logstash:latest
++ awk '($0!~"TAG") {print $3}'
+ export IMAGEID=0a7164e2df00
+ IMAGEID=0a7164e2df00
++ docker images our.reg/logstash
++ grep 0a7164e2df00
++ awk '($0!~"latest") {print $2}'
+ export CURRENTVERSION=6.7.1-b96-gite6f0ee23
+ CURRENTVERSION=6.7.1-b96-gite6f0ee23
++ echo 6.7.1-b96-gite6f0ee23
++ sed -e 's/-.*$//g'
+ export LS_VERSION=6.7.1
+ LS_VERSION=6.7.1
++ git rev-parse HEAD
++ cut -c1-8
+ export GITHASH=92aefb01
+ GITHASH=92aefb01
+ echo LS_BUILD_NUMBER=6.7.1-b748-git92aefb01

Looking at build b96, I can see the steps where the plugin was removed and successfully installed:

Step 7/33 : RUN /usr/share/logstash/bin/logstash-plugin remove logstash-filter-dns
 ---> Running in 105350a4a8c4
Successfully removed logstash-filter-dns
Step 22/33 : RUN /usr/share/logstash/bin/logstash-plugin install --version 3.0.13 logstash-filter-dns
 ---> Running in f2572b79282e
Validating logstash-filter-dns-3.0.13
Installing logstash-filter-dns
Installation successful
Removing intermediate container f2572b79282e
 ---> d15b792d7793


berglh commented Apr 30, 2019

[screenshot]

I do find it weird, though, that four containers would all stall their pipelines simultaneously. Since the last watchdog container restarts there have been no stoppages, but of course a much lower volume of events overnight. Maybe this problem is not related to the DNS filter at all. However, it's still strange that the DNS filter is complaining about resolution failures. Remember that:

  1. I have about 6 instances of the plugin spread across three of the processing pipelines.
  2. I have pipeline batch sizes set to 125.
  3. I have multiple workers for each pipeline, which probably means 20~30 instances of logstash-filter-dns with possibly many simultaneous requests to resolv.rb.

[screenshot]


berglh commented May 1, 2019

@colinsurprenant After a further stall on the final instance, I've reverted to a single DNS server again to see how it performs.

colinsurprenant commented:

I tried updating 6.7.1 with the dns filter 3.0.13 and everything looks good on my side:

  • fresh 6.7.1 install
curl -XGET 'localhost:9600/_node/plugins?pretty'
...
{
    "name" : "logstash-filter-dns",
    "version" : "3.0.12"
}
...
  • bin/logstash-plugin remove logstash-filter-dns
  • bin/logstash-plugin install --version 3.0.13 logstash-filter-dns
curl -XGET 'localhost:9600/_node/plugins?pretty'
...
{
    "name" : "logstash-filter-dns",
    "version" : "3.0.13"
}
...


berglh commented May 7, 2019

@colinsurprenant Looks perfectly good on your end. I must have some trouble with the additional modifications I'm making to the container. I will try it manually, working interactively with the unmodified official Elastic container, and debug from there. Thanks for all your help.


berglh commented May 10, 2019

@colinsurprenant I've confirmed that in the official Logstash containers, versions 6.7.1 and 6.7.2, the plugin reports 3.0.12 after installing 3.0.13. I did the same process in the official Logstash 7.0.1 container, and it reports the correct version, as per your experience. All of these were launched with the most basic config:

bin/logstash -e 'input { stdin { } } output { stdout {} }'

I tried the same process in a freshly downloaded tar.gz and observed the same as you. I'm not entirely sure what the difference between the tar.gz and container release is.

I'm going to attempt building and deploying logstash 7.0.1 container now and report my findings.


berglh commented May 13, 2019

After solving some configuration compatibility problems, I now have one instance of the 7.0.1 container running with the plugin reporting the correct version. I will monitor overnight, although I think this is pretty much case closed now. 👏 Thanks for all your help @colinsurprenant


berglh commented May 14, 2019

@colinsurprenant - I'm still getting DNS resolving timeouts, but this could be for many reasons I guess. I suppose the most important thing is to ensure it doesn't crash completely, so I'll keep it running for longer.

[2019-05-13T07:48:07,728][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.138.88.244"}
[2019-05-13T07:48:07,727][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"192.168.244.57"}
[2019-05-13T10:21:13,283][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.48.80.51"}
[2019-05-13T12:57:51,995][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.138.108.58"}
[2019-05-13T14:34:15,984][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[@metadata][ip]", :value=>"10.102.132.20"}
[2019-05-13T21:53:53,942][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.50.113.79"}
[2019-05-13T23:00:04,476][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.57.5.224"}
[2019-05-13T23:09:49,262][WARN ][logstash.filters.dns     ] DNS: timeout on resolving the hostname. {:field=>"[@metadata][peoplesoft][clientip]", :value=>"pa49-197-108-208.pa.qld.optusnet.com.au"}
[2019-05-13T23:09:49,263][WARN ][logstash.filters.dns     ] DNS: timeout on resolving the hostname. {:field=>"[@metadata][peoplesoft][clientip]", :value=>"HASS-ED-6QRCBS2.our.domain"}
[2019-05-13T23:23:06,382][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.50.137.104"}
[2019-05-13T23:30:12,062][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.102.72.43"}
[2019-05-13T23:30:12,061][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.102.72.33"}
[2019-05-13T23:30:12,063][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.102.72.43"}
[2019-05-13T23:36:01,253][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.102.97.152"}
[2019-05-14T00:01:06,931][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.208.8.146"}
[2019-05-14T00:01:06,931][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.49.21.15"}
[2019-05-14T00:01:36,667][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.48.18.242"}
[2019-05-14T00:01:37,264][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.48.18.242"}
[2019-05-14T00:07:30,358][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.49.105.54"}
[2019-05-14T00:07:30,358][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.49.105.54"}
[2019-05-14T00:12:01,733][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"192.168.171.234"}
[2019-05-14T00:21:04,413][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.81.8.124"}
[2019-05-14T00:21:04,419][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.13.64.247"}
[2019-05-14T00:54:04,415][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"192.168.244.80"}
[2019-05-14T01:00:04,033][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.153.13.84"}
[2019-05-14T01:00:04,033][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.48.14.111"}
[2019-05-14T01:00:52,803][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.49.21.94"}
[2019-05-14T01:00:52,803][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.49.21.94"}
[2019-05-14T01:12:06,993][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.49.106.111"}
[2019-05-14T01:12:06,993][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.153.135.211"}
[2019-05-14T01:19:36,280][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.49.106.117"}
[2019-05-14T01:19:36,280][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.33.10.20"}
[2019-05-14T01:20:46,842][WARN ][logstash.filters.dns     ] DNS: timeout on resolving address. {:field=>"[source][host]", :value=>"10.33.22.106"}

colinsurprenant commented:

@berglh getting occasional resolving timeouts is normal IMO - the problem is when all resolving actions result in a systematic timeout.


berglh commented May 30, 2019

@colinsurprenant I agree; they just seem to occur primarily around pipeline start-up, but it's no big deal. I'm quite certain the blocked pipeline caused by the plugin completely failing is resolved now that I was able to install the new version in the Logstash 7 container. I'm still having occasional locking in the pipelines, but I'm certain it's some kind of data in certain log messages rather than anything else. Thanks.
