
Failover requests would not work for partially used streams #106

Open
rahulreddy opened this issue Jan 19, 2017 · 8 comments
@rahulreddy (Collaborator)

Identified by @alexandre-merle: streams that are partially consumed and then failed over to another node would not work, since only the remaining partial data could be piped to the new node.

rahulreddy added the bug label Jan 19, 2017
ghost commented Jan 19, 2017

This would explain scality/MetaData#940 then.

@rahulreddy (Collaborator, Author)

I guess, though I can't think of why the source stream would end up with an error; I would have assumed that any TCP errors connecting to the nodes would occur from the start.

ghost commented Jan 19, 2017

On the MD issue, @msegura traced some connection closures to the way keepalive is configured on our sproxyd's Tengine. Does it look to you like this could be the explanation?

@rahulreddy (Collaborator, Author)

I think it fits the picture. The log line with "PUT chunk to sproxyd"

{"name":"SproxydClient","error":{"code":"ECONNRESET"},"time":1484643601662,"req_id":"a7ea4c3151407e76b10e:7a6691a0746cb6785797","level":"error","message":"PUT chunk to sproxyd","hostname":"asvppdxobjs301.gecis.io","pid":59}

could only happen if a connection was established to the remote host, a socket was assigned, and the remote host later destroyed it for some reason. In that case, the stream may have been partially consumed.
The socket object has bytesRead / bytesWritten properties, which we can use to fail over only when no bytes have been consumed.

@ThibaultRiviere (Contributor)

That can explain the retry error, but not the first one, right?

@rahulreddy (Collaborator, Author)

Yeah I can't explain the first one.

@rachedbenmustapha (Contributor)

Does this really happen though, given that production deployments only have 1 sproxyd endpoint configured and thus can not fail over?

@rahulreddy (Collaborator, Author)

From the situation today, I realized that we have only one sproxyd endpoint, so my speculation goes out the window. It remains a potential problem when there are multiple nodes in the bootstrap list.
