HDFS data below 64KB not flushed to HDFS #107
Comments
Hi @muralirvce, are you making sure to call `Close`? Or do you mean when keeping a file open for more than an hour?
I call `Close`, but by then the timeout has already expired. I write about 5KB of data every 30 seconds, so we never accumulate the 64KB needed to send a packet; the data isn't sent out until 64KB is buffered, and by the time we close, the connection may have timed out. IMO we could add a flush routine to the FileWriter. I have made some changes and can submit a patch if needed.
There might be a keepalive we can send, rather than flushing. I'd have to research.
Keepalive sounds good. IMO adding a flush might not be a bad idea either, in case the data has to be visible. The only catch is that the block length will not be updated even though the data is present; block reporting happens when we close. Please let me know your take on it. It would work like any other file system writer.
Exposing 'Flush' to the user makes a lot of sense to me; having it done automatically, less so. You mention the completeBlock issue, which is why HDFS is less than friendly for your use case (I didn't realize it lets you keep the lease open for more than an hour even if you do flush). You might be better off doing individual append operations.
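(For reference, a minimal sketch of the append-per-batch approach, using the library's `Create`, `Append`, and `Stat` calls; the namenode address, file path, and record size are made up for illustration:)

```go
package main

import (
	"log"
	"os"
	"time"

	"github.com/colinmarc/hdfs"
)

func main() {
	// Hypothetical namenode address and file path, for illustration only.
	client, err := hdfs.New("namenode:8020")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	path := "/logs/sensor.dat"

	// Create the file once if it doesn't already exist.
	if _, err := client.Stat(path); os.IsNotExist(err) {
		w, err := client.Create(path)
		if err != nil {
			log.Fatal(err)
		}
		w.Close()
	}

	for range time.Tick(30 * time.Second) {
		// Open, write one small record, and close immediately. Each
		// Close sends whatever is buffered, so nothing waits on the
		// 64KB packet threshold and no connection is held open long
		// enough to idle out.
		w, err := client.Append(path)
		if err != nil {
			log.Fatal(err)
		}
		if _, err := w.Write(make([]byte, 5*1024)); err != nil { // ~5KB record
			log.Fatal(err)
		}
		if err := w.Close(); err != nil {
			log.Fatal(err)
		}
	}
}
```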
I exposed `Flush`.
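(A sketch of how an exposed `Flush` could be used, assuming a `Flush() error` method on the writer and reusing the client from the earlier sketch; the path and record size are illustrative:)

```go
// Keep one writer open and flush after each small write, so buffered
// bytes reach the datanodes before the 64KB packet fills.
w, err := client.Create("/logs/sensor.dat") // hypothetical path
if err != nil {
	log.Fatal(err)
}
defer w.Close()

for range time.Tick(30 * time.Second) {
	if _, err := w.Write(make([]byte, 5*1024)); err != nil { // ~5KB record
		log.Fatal(err)
	}
	// Push the partial packet out now instead of waiting for 64KB;
	// this also keeps the write pipeline from idling out.
	if err := w.Flush(); err != nil {
		log.Fatal(err)
	}
}
```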
Hi,
I see that data is sent to the HDFS datanode in 64KB packets in RPC requests. But if the pending data is below 64KB, say 63KB, and it takes more than a minute for the next data to arrive, the RPC connection will be closed and the pending data will not be written. Would it be good to add a flush routine on the writer to help in such scenarios?
Thanks,
Murali
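(For completeness, a sketch of the failing pattern described above, with illustrative path, sizes, and timings:)

```go
// Failing pattern: small writes with long gaps and no flush. Each ~5KB
// write stays in the client's 64KB packet buffer, and during the 30s
// gaps the datanode connection can idle out (after about a minute), so
// the buffered bytes are lost unless Close succeeds before the timeout.
w, _ := client.Create("/logs/sensor.dat")
for i := 0; i < 10; i++ {
	w.Write(make([]byte, 5*1024))
	time.Sleep(30 * time.Second)
}
w.Close()
```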