Pipe broken exception on write to GCS #103
Comments
We have recently made improvements to handling network errors: 1115700. Yesterday we released them in the 1.6 branch: https://github.com/GoogleCloudPlatform/bigdata-interop/releases/tag/v1.6.7 Could you try GCS connector 1.6.7 and see if it solves the issue?
Hi @medb! Unfortunately, I see the same exceptions, but I found a related exception in another log:
Looks like googleapis/google-cloud-java#1018 is the related issue, but I'm not sure yet how to deal with it.
This could be related to Accumulo's write pattern. If it writes a lot of small files to GCS at a high QPS rate, it could cause GCS to drop connections, which manifests as a "java.io.IOException: Pipe broken" exception in the GCS connector. Do you know what kind of objects/files Accumulo writes to GCS, how large they are, and how often it writes them?
Not sure about the write pattern yet; I'll need to figure it out. BTW: I tried to set
The GCS connector does not support Apache Accumulo because it is geared toward high-throughput use cases (reading/writing big objects continuously), which is not Apache Accumulo's use case. Also, Apache Accumulo has a relevant thread where GCS support was discussed, and it ended with the conclusion that GCS is not supported as backing storage for Apache Accumulo: apache/accumulo#428
Although the current issue and apache/accumulo#428 are both related to GCS, they are different, because they occur in different situations.
I don't think that Accumulo should support particular file systems (GCS, Azure, etc.). I believe it should be the other way around: the custom implementations need to make sure that they properly support the HDFS interface.
The problem is that GCS, Azure Blob Store, and AWS S3 are not file systems but object stores, and Apache Accumulo was written with HDFS capabilities in mind, which object stores cannot fully support. The GCS connector tries to mimic HDFS semantics, but because of object store limitations it cannot do so fully. We need to take a look at Accumulo's use case to determine whether it is possible to make it work with GCS, but because Accumulo is not currently supported by the GCS connector, it's not an immediate action item for us.
@medb The Apache Accumulo issue you referenced did not conclude that GCS wouldn't, or couldn't, be supported. That issue was closed because the user's question about the explanation for the behavior they were seeing was answered. The supported solution is to use a LogCloser, configured on the user's class path for Accumulo, which will handle closing logs on GCS. I don't know enough about GCS to know for sure, but it may be sufficient to trivially fork Accumulo's built-in HadoopLogCloser and do nothing instead of throwing the IllegalStateException when the FileSystem is GCS (essentially, no attempt to do lease recovery, just like in the local file system case). I do not think the issue has anything to do with Accumulo's write pattern, as suggested here... at least, not if it's the same issue as the one you referenced. It's likely a simple matter of implementing an appropriate LogCloser.
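Since I don't know the GCS connector internals, here is only a rough sketch of what the decision logic in such a forked closer might look like. The class and method names below are hypothetical (this is not Accumulo's actual LogCloser interface), and the assumption is that checking the FileSystem type or URI scheme is enough to decide whether lease recovery applies:

```java
// Hypothetical sketch only: not Accumulo's real LogCloser interface and not
// part of the GCS connector. It just illustrates the decision described above.
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public final class GcsTolerantWalClosing {

  /**
   * Returns 0 when the write-ahead log can be treated as closed right away,
   * or a positive delay (ms) when the caller should retry later.
   */
  static long closeWal(FileSystem fs, Path walPath) throws IOException {
    if (fs instanceof DistributedFileSystem) {
      // HDFS: ask the NameNode to recover the lease so the file becomes readable.
      DistributedFileSystem dfs = (DistributedFileSystem) fs;
      if (!dfs.recoverLease(walPath)) {
        return 1000L; // lease recovery still in progress, retry in a second
      }
    } else if (fs instanceof LocalFileSystem
        || "gs".equals(fs.getUri().getScheme())) {
      // Local FS and GCS have no lease concept: skip recovery instead of
      // throwing IllegalStateException, as suggested for a forked HadoopLogCloser.
    } else {
      throw new IllegalStateException("Unsupported file system: " + fs.getUri());
    }
    return 0L;
  }
}
```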
Yes, it is currently not supported; that's why I created an FR (#104) to add support for Accumulo. For posterity, it currently looks like there are 2 issues:
@ctubbsii thank you for clarifying the LogCloser issue and the suggested solutions.
I'm importing some data into Apache Accumulo, which runs on top of Google Cloud Storage (as an HDFS replacement). I use the GCS connector 1.8.1-hadoop2, and Accumulo runs in GCloud VMs.
I see the following exceptions in the logs quite frequently (the first on `GoogleHadoopOutputStream.write`, the second on `GoogleHadoopOutputStream.close`). Accumulo marks this exception with the ERROR level.
What could be the root cause? How can I get more details about the exception (debug logs, etc.)? Thank you!
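(For anyone looking for more detail on the connector side: one way, assuming the standard log4j setup that Accumulo and the Hadoop daemons use, is to raise the GCS connector's log level in the relevant log4j configuration. The logger names below are simply the connector's Java packages; the file location and appenders depend on your setup.)

```properties
# Sketch of log4j.properties additions (adjust to whichever log4j file your
# Accumulo/Hadoop processes actually load); surfaces GCS connector internals.
log4j.logger.com.google.cloud.hadoop.fs.gcs=DEBUG
log4j.logger.com.google.cloud.hadoop.gcsio=DEBUG
```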