
fix possible over size exception when batch udp #249

Merged
merged 1 commit into influxdata:master on Nov 27, 2016

Conversation

jiafu1115
Contributor

@jiafu1115 jiafu1115 commented Nov 22, 2016

Motivation:

A UDP packet must be <= 64 KB, so if we enable batching when writing points over UDP, an IO exception can occur when too many points are batched together:

java.io.IOException: Message too long

Modifications:

Write points directly (without batching) for UDP in the batch processor; a rough sketch of the idea follows the result list.

Result:

  1. Avoids the oversize exception. If the exception still occurs, the user should limit the size of each point's content.
  2. Performance degrades somewhat because UDP writes are no longer batched, but this is not critical since writes remain asynchronous.
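
A minimal sketch of that write-through idea, for illustration only (the class and method names below are hypothetical, not the PR's actual code): UDP points skip the batch queue and are written immediately, while HTTP points keep the existing queue-and-flush behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names for illustration -- not the library's actual classes.
class UdpAwareBatchProcessor {

    /** Abstraction over the two write paths. */
    interface Writer {
        void writeNow(String lineProtocol);          // immediate single write (UDP path)
        void writeBatch(List<String> lineProtocol);  // batched write (HTTP path)
    }

    private final Writer writer;
    private final List<String> queue = new ArrayList<>();
    private final int flushActions;

    UdpAwareBatchProcessor(Writer writer, int flushActions) {
        this.writer = writer;
        this.flushActions = flushActions;
    }

    void put(String point, boolean viaUdp) {
        if (viaUdp) {
            // Write-through: a datagram never carries more than one point,
            // so it cannot grow past the ~64 KB UDP limit by accumulation.
            writer.writeNow(point);
            return;
        }
        queue.add(point);
        if (queue.size() >= flushActions) {
            writer.writeBatch(new ArrayList<>(queue));
            queue.clear();
        }
    }
}
```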

@codecov-io

codecov-io commented Nov 22, 2016

Current coverage is 73.25% (diff: 100%)

Merging #249 into master will increase coverage by 0.39%

@@             master       #249   diff @@
==========================================
  Files            11         11          
  Lines           641        643     +2   
  Methods           0          0          
  Messages          0          0          
  Branches         75         76     +1   
==========================================
+ Hits            467        471     +4   
+ Misses          142        140     -2   
  Partials         32         32          

Powered by Codecov. Last update 892ba94...364870b

@jiafu1115
Contributor Author

This is the quickest way to fix the issue; we can consider improving UDP batching by partitioning the batch later if needed.

@majst01
Collaborator

majst01 commented Nov 22, 2016

As I already stated in #239 (comment), I'm not sure why we need UDP batch processing at all.

I wanted you to show me how this change improves write performance; this has not been done.

For this particular issue we need to create:

  1. a set of performance comparisons between UDP single writes and batched writes, written as unit tests.
  2. a unit test which shows that oversized UDP packets lead to message-size exceptions.

If this is done, we will consider how to proceed, either:

  • improve batch processing to cover the different message-size situations, or
  • skip batch processing for UDP and throw an IllegalArgumentException if someone tries to use it.

@jiafu1115
Contributor Author

jiafu1115 commented Nov 22, 2016

@majst01 Writing two points in one datagram should perform better than writing two points in two datagrams. The current issue is that we can't put too many points into one datagram because of the UDP size limit.
In any case, I will add unit tests for the oversize case and the performance comparison.
In short: batching is better, but we can't batch too many points because of the UDP limit. That is the key point.

#239's code supports both async and batch; this PR disables batching to prevent exceeding the UDP size limit while still supporting async.

@majst01
Collaborator

majst01 commented Nov 22, 2016

@jiafu1115 "should" is short for i guess, or i dont know. Only seeing is believing/knowing

I will not accept any pr´s on this topic without the 2 questions from above answered.

@jiafu1115
Contributor Author

@majst01 I will prove it, since I should take responsibility for my code! Thanks

@jiafu1115
Contributor Author

jiafu1115 commented Nov 22, 2016

@majst01 Please check the newly added tests for the performance comparison and the exception cases:

For performance (you can also run it with different batch sizes instead of 300/500/800/1000; a reproduction sketch follows the numbers):

300
performance(ms):write udp with batch of 300 string:25
performance(ms):write udp with 300 single strings:34

500
performance(ms):write udp with batch of 500 string:22
performance(ms):write udp with 500 single strings:85

800
performance(ms):write udp with batch of 800 string:28
performance(ms):write udp with 800 single strings:111

1000
performance(ms):write udp with batch of 1000 string:29
performance(ms):write udp with 1000 single strings:173
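
A rough sketch of how such a comparison could be reproduced (hypothetical code, not the PR's actual test): time one datagram carrying all n lines against n single-line datagrams sent to a local UDP port (port 8089 and the sample line are assumptions).

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class UdpBatchVsSingle {
    public static void main(String[] args) throws Exception {
        int n = 300;                                    // batch size to compare
        String line = "cpu,host=server01 value=0.64\n"; // sample line-protocol point
        InetAddress target = InetAddress.getLoopbackAddress();
        int port = 8089;                                // assumed UDP listener port

        try (DatagramSocket socket = new DatagramSocket()) {
            // One datagram carrying all n lines (must stay under ~64 KB).
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < n; i++) {
                sb.append(line);
            }
            byte[] batched = sb.toString().getBytes(StandardCharsets.UTF_8);
            long start = System.nanoTime();
            socket.send(new DatagramPacket(batched, batched.length, target, port));
            System.out.println("batched: " + (System.nanoTime() - start) / 1_000_000 + " ms");

            // n datagrams of one line each.
            byte[] single = line.getBytes(StandardCharsets.UTF_8);
            start = System.nanoTime();
            for (int i = 0; i < n; i++) {
                socket.send(new DatagramPacket(single, single.length, target, port));
            }
            System.out.println("single: " + (System.nanoTime() - start) / 1_000_000 + " ms");
        }
    }
}
```

Batching wins mainly because it issues one send per batch instead of one per point; the cost is that the combined payload must stay under the UDP datagram limit.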

For the exception over the UDP limit:
I also added a test case to prove that if the UDP batch content exceeds 64 KB, an exception is raised.
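
A minimal reproduction of that failure mode (illustrative, not the PR's actual test): a single UDP datagram payload cannot exceed roughly 64 KB (65,507 bytes of payload), so a 70,000-byte send fails.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class UdpOversizeRepro {
    public static void main(String[] args) throws Exception {
        byte[] payload = new byte[70_000]; // larger than any single UDP datagram can carry
        try (DatagramSocket socket = new DatagramSocket()) {
            DatagramPacket packet = new DatagramPacket(
                    payload, payload.length, InetAddress.getLoopbackAddress(), 8089);
            socket.send(packet); // typically fails with java.io.IOException: Message too long
        } catch (IOException expected) {
            System.out.println("caught: " + expected);
        }
    }
}
```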

For async batch mode over UDP, we should handle this to avoid the exception.
For sync mode over UDP, there is no need to handle it; the user should know that there is a limit on the maximum UDP packet size.
In addition, I have added a note on UDP usage to the README.

Let's review the current status:
the first table shows the behavior now, the second table shows it after this PR:

(attached screenshot: Excel worksheet with the two comparison tables)

Whether we use sync/async or batch/non-batch, the oversize exception can happen. It is either the user's fault (a single point that is too large) or our fault (batching too many points together). So we can fix our fault and add a README note reminding users to pay attention to the size of their input.

We could also split the async batch into 64 KB chunks, but that may not be the best size once you consider the MTU, the send buffer limit, and the fact that more data means more packet loss. So I suggest disabling async batching for UDP (this PR): support async writes one by one, plus sync batch and sync one-by-one writes.
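
For reference, a sketch of that splitting alternative (a hypothetical helper, not part of the library): group line-protocol strings into chunks whose UTF-8 size stays under a chosen datagram limit, so each chunk could be sent as one datagram.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class UdpChunker {

    /** Splits line-protocol strings into newline-joined chunks no larger than maxBytes (UTF-8). */
    static List<String> chunk(List<String> lines, int maxBytes) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        for (String line : lines) {
            int lineBytes = line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
            if (currentBytes > 0 && currentBytes + lineBytes > maxBytes) {
                chunks.add(current.toString());
                current.setLength(0);
                currentBytes = 0;
            }
            current.append(line).append('\n');
            currentBytes += lineBytes;
        }
        if (currentBytes > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> points = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            points.add("cpu,host=server01 value=" + i);
        }
        // Each resulting chunk would fit in one datagram under the chosen limit.
        System.out.println(chunk(points, 64 * 1024).size() + " chunks");
    }
}
```

As argued above, even with such a splitter the "right" chunk size depends on the MTU and send-buffer settings, which is why this PR disables batching for UDP instead.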

WDYT? @majst01 Thanks

Motivation:

A UDP packet must be <= 64 KB, so if we enable async batching when
writing points over UDP, an IO exception can occur when too many points
are batch-written:

"java.io.IOException: Message too long"

Modifications:

Write directly, without batching, for UDP in the batch processor.

Result:

1. Avoids the oversize exception for async batching. If the exception
occurs in sync mode, the user should limit the size of each point's
content.

@majst01
Collaborator

majst01 commented Nov 27, 2016

Hi @jiafu1115, now I have had time to read all aspects, and my conclusion is the same as yours: disabling batching for UDP is the best option. To solve this in a correct way we would need to figure out what exact environment is in place, such as the MTU and SO_SNDBUF, and then resend packets on errors, etc. This is not the way to go in our library.

I will merge this right now. This should be the last PR for 2.5.
What is missing from my perspective is a valid CHANGELOG.md. Are you willing to collect some points? I will do so as well.

@majst01 majst01 merged commit 2e04e6d into influxdata:master Nov 27, 2016
@jiafu1115
Contributor Author

OK, thanks. I will collect the changelog points from my side and work with you to prepare the changes to the README.

@lxhoan lxhoan mentioned this pull request May 3, 2018