Wrong statistics with version 2.3 and threads #116
Comments
Thanks for the report. I have been trying to solve various thread-related issues over the past weeks but have been unsuccessful, so I will shortly release 2.4.0, which will have threads disabled by default. I am also setting up a continuous testing environment so we can monitor the develop branch and make sure nothing like this happens again. |
And the peak of 250 q/s in the first sample interval looks strange too. |
This is probably due to the total reworking of the libpcap parts that was made while trying to solve other issues. Can you look in the XML/DAT files for the packets captured, parsed, and dropped for those intervals? For reference, what distribution and kernel version are you running? |
Here are the statistics for 4 consecutive minutes. There are 200 q/s hitting the servers via 2 links, so we are capturing on 2 interfaces. In total there are 400 packets/s, or 24000 packets/min (the sample interval). The pkts_captured value on eth2 seems to alternate by the number of packets seen in 1 second.
These are the statistics for the first sample interval, which falsely reports 250 q/s:
|
Can you paste 4 captures from old also? |
Forgot to mention - we are using Ubuntu 14.04 with kernel: |
Here are 4 consecutive statistics using the old binary:
|
If you have not already, please read my announcement regarding this: I have set up some tests for dsc now; if you have any other ideas for tests, please let me know: |
The testing platform is starting to yield some results, #122 may explain what you are seeing so could you check the start_time/stop_time in the files? |
The spike at the beginning is because the new code is using I will extend pcap-thread with functionality to activate the interfaces after the sleep period. |
Just a question about the sleeping period: if the interval is 1 hour, where does it sync to? To the next minute or to the next hour? The latter would be bad because in case of a dsc restart we could lose up to 59 minutes. |
If your interval is 1 hour and you start a second after the hour it would wait 59:59, you should see this in your syslog. |
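The wait described here is modular arithmetic on the epoch time. A minimal sketch in Python, assuming the sync aligns to the next multiple of the interval in epoch seconds; the function name is hypothetical and not taken from the dsc source:

```python
def seconds_until_next_interval(now: int, interval: int) -> int:
    """Seconds to sleep so that collection starts on an interval boundary.

    Assumption: boundaries are multiples of `interval` in epoch seconds.
    """
    remainder = now % interval
    # Already on a boundary: no wait (an assumption; dsc may behave differently).
    return 0 if remainder == 0 else interval - remainder


# 1484251200 is an exact hour boundary, so one second later the wait
# with a 1-hour interval is 3599 s, i.e. 59 minutes and 59 seconds.
wait = seconds_until_next_interval(1484251201, 3600)
print(divmod(wait, 60))  # → (59, 59)
```

With a 60-second interval the same restart would cost at most 59 seconds, which is why the loss only becomes a concern for long intervals.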
If I find some time, I will write an option to start immediately with a reduced first interval. Regarding start_time/stop_time: tomorrow. |
Make it a dsc.conf option |
Regarding start/stop time:
1484249940.dscdata.xml: start_time="1484249880" stop_time="1484249940">
1484254620.dscdata.xml: start_time="1484254560" stop_time="1484254620">
1484259420.dscdata.xml: start_time="1484259360" stop_time="1484259420">
The 2.3 version we are running with -T shows a 1 s gap between a start time and the previous stop time roughly every 6th frame. The old 2.0 version does not show any gaps. Shall we also test the current master? |
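A check like the one described can be scripted. The sketch below is only an illustration in Python: it assumes the start_time/stop_time pairs have already been extracted from consecutive dscdata.xml files, and apart from the first frame the sample values are made up to show a 1 s gap.

```python
def find_gaps(frames):
    """Return (prev_stop, next_start, gap_seconds) for consecutive frames that do not abut.

    `frames` is a list of (start_time, stop_time) tuples sorted by start_time.
    """
    gaps = []
    for (_, prev_stop), (next_start, _) in zip(frames, frames[1:]):
        if next_start != prev_stop:
            gaps.append((prev_stop, next_start, next_start - prev_stop))
    return gaps


# First frame is from the report; the following two are hypothetical,
# with the third starting 1 s after the second one stopped.
frames = [
    (1484249880, 1484249940),
    (1484249940, 1484250000),
    (1484250001, 1484250060),
]
print(find_gaps(frames))  # → [(1484250000, 1484250001, 1)]
```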
You don't have to run master, thanks for the information. Are any of these 2.3 based on the develop/a special commit? |
Btw, I am putting in a fix for the burst at startup and will add your request for an option to forgo the delayed startup. |
Our 2.3 version is actually commit c02a170 |
Issue #116: Feature for not waiting on the interval sync
Spike at start should be solved now and your feature is added, if you can please test the latest develop branch. |
testing with or without -T? |
Threads are disabled by default now, so -T won't do anything unless you do:
configure --enable-threads
|
Were you able to run the develop branch? Do you have any issue to report? |
Using commit 1569bbf The respective XML file:
|
Can you paste start_time/stop_time from that file?
|
I pasted it already - see the first line. It looks correct. |
Maybe the 1 second delay is related to the reason why tcpdump has 1 second delay: http://unix.stackexchange.com/questions/224893/tcpdump-waits-a-second-before-displaying-packets |
How was dsc killed?
|
On Jan 20, 2017 20:40, "Klaus Darilion" <notifications@github.com> wrote:
I pasted it already - see the first line. It looks correct.
Sorry it didn't show in the email.
|
The packets missed at the beginning are not related to tcpdump's 1 s delay.
The problem here is that things take time, and you can't really measure and
compensate for it without adding additional processing time.
In your example, the loss in the first period occurs because the interval sync
is now done first, then the pcaps are activated, and after that pcap-thread needs
time to set up the loop. This takes time, and with a high qps you will initially
miss some packets.
Btw, did you use the new pcap_buffer_size config option?
It will help in a lot of situations since the OS can buffer packets between
XML writes and maybe even between activate and pcap-thread loop.
|
a) It was killed with "kill " |
a&c) If you can run strace -fF -s 1024 during this with timing options (I don't
know exactly which), it would help a lot; send the output directly to me.
|
Okay, this is a problem but it is not related to libpcap, it is
As you can see I will add an option to set the pcap-thread timeout in |
Issue #116: Add config option pcap_thread_timeout
Please reset develop and use |
Regarding "dump_reports_on_exit": I will open a new issue. Regarding the missing packets in the first interval: using your suggestions, everything looks fine now. Although I have to admit that I do not understand the consequences of setting pcap_thread_timeout to 10 ms. In particular, the docs say "As a workaround, set pcap_thread_timeout to a relevant millisecond timeout with regards to the packets per second received." Would this cause additional CPU load? Would this help or make things worse in case of high packet rates? Should the value be higher or lower for high packet rates? What is the default value? |
I will try and clear up the text in
Not really; the lowest you can set it to is 1 ms, and that is a lot of time on today's servers.
Helps in all situations it seems since
Lower since you want it to react faster.
1000 ms or 1 s, this is from the old dsc code that pcap-thread is built on. This has only to do with the timeout within pcap-thread that is used for BUT there is something going on within |
How about f1b3ec3, is it more clear? |
Wouldn't it be the other way? With a high packet rate it does not time out anyway, so the value does not matter. With low packet rates, dsc "blocks" other processing (e.g. dumping a report) until a packet is received or select() times out, so a low value is actually needed for low query rates.
Yes. To me it seems that a default value of 100 ms would be good, so that report dumping is not delayed by 1 s in case of low traffic. |
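The effect of the timeout on idle versus busy links can be reproduced outside dsc. This is a rough sketch in Python rather than the pcap-thread C code: a connected socket pair stands in for the capture descriptor, and the helper name is hypothetical.

```python
import select
import socket
import time


def wait_for_packet(sock, timeout_s):
    """Block until `sock` is readable or `timeout_s` elapses; return (ready, waited)."""
    started = time.monotonic()
    readable, _, _ = select.select([sock], [], [], timeout_s)
    return bool(readable), time.monotonic() - started


# A connected socket pair stands in for a capture descriptor (assumption).
rx, tx = socket.socketpair()

# Idle link: select() returns only after the full timeout, delaying other work.
ready, waited = wait_for_packet(rx, 0.1)
print(ready, round(waited, 2))

# Busy link: data is already pending, so the timeout value never matters.
tx.send(b"query")
ready, waited = wait_for_packet(rx, 0.1)
print(ready, round(waited, 2))

rx.close()
tx.close()
```

On the idle call the wait is roughly the full timeout, while on the busy call it returns almost immediately, which matches the argument that the timeout mainly matters at low query rates.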
Actually no,
Yeah, looking at the old code (https://github.com/DNS-OARC/dsc/blob/v2.1.1/src/pcap.c#L883) it was 250ms so I agree the default should be changed. |
Hi!
We tested today's version from master (commit c02a170).
The problem is that it does not report all requests, and even fewer responses. We started with 100 q/s, increased to 200 q/s, and then switched back to our old DSC version (reported as 2.0.0 rc1).
Requests
Responses
New DSC uses the same config as the old DSC.
When using the newest DSC with option -T, the count is correct (200 q/s).