[Feature] Consider verifying checksum of Openstack Storage upload #236
Conversation
Looks like we can't hash a file without storing it in memory or on disk, which means we'd have to write some code to create a temp file store, and that has problems as well. Not sure I'm in love with that. In memory is a non-starter, as most users are running node on limited-memory VMs. The hash itself should be pretty straightforward if we have the file on disk or in memory.
I believe we could calculate the hash without temporarily storing the file using something like jeffbski/digest-stream, which calculates the hash of a stream. My understanding of MD5 is that it only needs to buffer a 512-bit block of data before it can update its state, discard the data, and wait for the next chunk.
Are we sure that's not buffering? https://github.com/jeffbski/digest-stream/blob/master/lib/digest-stream.js#L28
@kenperkins it flushes the buffer on each digest when the end function is called: https://github.com/jeffbski/pass-stream/blob/master/lib/pass-stream.js#L46-L49
If so, let's write up a prototype/branch :) @rossj, are you on it?
@kenperkins I'm pretty sure that's the case. I'll set up a branch and try some things out.
…t returned ETag header. For feature pkgcloud#236.
```js
var container = options.container,
    success = callback ? onUpload : null,
```
Perhaps @indexzero should have put a comment here, but this was very deliberate: if you pass a callback into mikeal/request, it will buffer the entire contents of the stream into memory.
That's why we had the code set up to check that if the caller provided no callback, we don't pass one down the chain.
You are correct here @kenperkins. @rossj we want to preserve that check.
@kenperkins @jcrugzz I'm pretty confident that mikeal/request does not buffer outgoing data, even if a callback is provided. While #195 mentions both upload and download, the 2 request issues that it references (request/request#639 and request/request#477) only refer to download buffering being a problem. Case in point, I uploaded a 2.8 GB file without issue, and the process's memory usage hovered between 90-160 MB the whole time (comparable to the current 0.9.0 branch).

Currently the code always passes a callback down the chain to request so that it can do the hash check on the response. If this is still a concern, I can probably do this in request.on('response') instead.
@rossj is correct. Buffering only occurs for downloads. See: https://github.com/mikeal/request/blob/master/request.js#L843-L890
Thanks for the followup @rossj. So it looks like we can pass a callback every time on upload.
I finally had a chance to get this branch evaluated on my local machine. Props for getting tests running. I have a few comments that I'll inline.
```js
// The returned checksum does not match what was calculated on client side, remove file
self.removeFile(container, options.remote, function (err) {
  if (err)
```
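The check-and-delete step quoted above could be sketched in isolation like this (everything here is hypothetical illustration; only the `removeFile`, `container`, and `remote` names mirror the diff):

```js
function verifyOrRemove(client, container, remote, localMd5, etag, callback) {
  if (etag === localMd5) {
    return callback(null); // checksums agree, the upload is good
  }
  // Checksums differ: delete the corrupt object so callers never read it
  client.removeFile(container, remote, function (err) {
    callback(err || new Error('MD5 mismatch for ' + remote));
  });
}
```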
This feels brittle to me. Obviously our service should be highly available, but the fact that we upload the file, then check and delete, feels inverted.
I guess there's not much else we can do if you want to avoid buffering the entire file locally to generate the hash as part of the inbound request.
@rossj I think what we'd like to do is close this PR, hoping instead to leverage this code as an example on top of pkgcloud. There are a couple of reasons for it.
Would you be willing to rework this code into an example that we could include in the openstack storage examples?
Hey @kenperkins, I think this makes sense. There are some definite edge cases where there isn't a clear path forward, and those cases are probably best left to the user's implementation. What exactly do you mean by "partition after upload" in your first point? Would CloudFiles potentially move or split the file after upload? I'll take a look at the examples and try to work this in.
Currently, it is up to a user to ensure data integrity by sending an `ETag` header during an upload or checking it in the response. Is there any interest in moving this checking functionality into `pkgcloud`? Using a through stream, we could calculate the checksum during upload and compare at the end. Questions I still have are: