Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open Read Seekability #15032

Merged

Conversation

seanmcc-msft
Copy link
Member

Related #9607

@ghost ghost added the Storage Storage Service (Queues, Blobs, Files) label Sep 9, 2020
@seanmcc-msft seanmcc-msft marked this pull request as ready for review September 10, 2020 00:26
Comment on lines 339 to 346
if (newPosition > _length)
{
if (_allowBlobModifications)
{
_position = newPosition;
_seeked = true;
return newPosition;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check if length has been updated and validate it? What's the scenario where user would like to seek beyond length ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment below, I'm not sure we can do network calls within Seek().


// Act
Stream outputStream = await blob.OpenReadAsync(options: options).ConfigureAwait(false);
outputStream.Seek(size + 10, SeekOrigin.Begin);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if user tries to read after this operation and blob is not modified ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An exception would have been thrown within Seek()

case SeekOrigin.End:
if (_allowBlobModifications)
{
throw new ArgumentException($"Cannot {nameof(Seek)} with {nameof(SeekOrigin)}.{nameof(SeekOrigin.End)} on a growing blob or file.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or we could use current length to calculate this (extra get properties call).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we can, Seek() is a sync operation, and there doesn't appear to be a SeekAsync() - https://docs.microsoft.com/en-us/dotnet/api/system.io.stream.seek?view=netcore-3.1.

Based on this assumption, I only updated the positions in the Seek() call, and do the necessary network calls in ReadInternal().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. maybe we should disallow all seeks if blob is modifiable.

@seanmcc-msft
Copy link
Member Author

@tg-msft, what do you think?

@seanmcc-msft
Copy link
Member Author

Do we need to add seekability tests for SMB Files and Data Lake? This change is in Common, so I was thinking just the Blob tests should be able to cover it.

Copy link
Member

@tg-msft tg-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add similar tests for every type of resource that works with OpenReadStream in case there are any service semantics that would cause problems with the behavioral assumptions we're making when we wrap them by streams. And that's more for the sake of future authors who will need to deal with supporting new service features when they're added.

/// <summary>
/// Indicated the user has called Seek() since the last Read() call, and the new position is outside _buffer.
/// </summary>
private bool _seeked;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit - _seeked is kind of a funky name that describes more about the history of actions rather than the current state of the object. What about a name like _bufferInvalidated to make code like if (_bufferPosition == 0 || _bufferPosition == _bufferLength || _bufferInvalidated) read a little easier?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

Comment on lines 139 to 152
if (_bufferPosition == 0 || _bufferPosition == _bufferLength)
if (_bufferPosition == 0 || _bufferPosition == _bufferLength || _seeked)
{
int lastDownloadedBytes = await DownloadInternal(async, cancellationToken).ConfigureAwait(false);
if (lastDownloadedBytes == 0)
{
return 0;
}
_seeked = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the code 10 lines above this, do you need to change all the condition checks from _position == _length to _position >= _length? What happens in the case where I allow modifications, seek past the end of the blob, and then try to read again? If we update the length above we'll return 0 bytes read. If you don't update the length, you'll drop into DownloadInternal and that'll ask for a range beyond the end of the blob which will cause an exception.

We should also add more test cases for allowBlobModifications:

  • Seek Beyond End and Find Nothing
    stream.Seek(stream.Length + 100);
    byte[] data = new byte[1];
    Assert.Zero(stream.Read(data, 0, 1));
  • Seek Beyond End and Find Data
    stream.Seek(stream.Length + 100);
    underlyingBlob.Upload(longerDocument);
    byte[] data = new byte[1];
    Assert.NotZero(stream.Read(data, 0, 1));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you fix this? The code on line 120/126 is still just checking if (_position == _length). I'm talking about this feedback in particular:

In the code 10 lines above this, do you need to change all the condition checks from _position == _length to _position >= _length? What happens in the case where I allow modifications, seek past the end of the blob, and then try to read again? If we update the length above we'll return 0 bytes read. If you don't update the length, you'll drop into DownloadInternal and that'll ask for a range beyond the end of the blob which will cause an exception.

Unless I was mistaken about the need for a fix here because the extra unit tests showed no problems?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new tests pass as expected. I get 0 bytes read if there is no data, and the data I was expecting when it exists.

Copy link
Member

@tg-msft tg-msft Sep 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I was worried about a 416 per https://docs.microsoft.com/en-us/rest/api/storageservices/specifying-the-range-header-for-blob-service-operations#format-2-bytesstartbyte-endbyte if we sent an invalid range. Is there any way that could get triggered here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a bug in OpenReadAsync_Seek_NewPositionGreaterThanBlobLength_AllowBlobModificationsTrue_NoData test. After fixing it's going to detect it (I left a comment next to it).

case SeekOrigin.End:
if (_allowBlobModifications)
{
throw new ArgumentException($"Cannot {nameof(Seek)} with {nameof(SeekOrigin)}.{nameof(SeekOrigin.End)} on a growing blob or file.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should update the error message to suggest they call Stream.Seek(Stream.Length, SeekOrigin.Begin) if they're trying to get the end of the known data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

else
{
throw new ArgumentException(
$"{nameof(newPosition)} exceeds known blob or file length. This condition is not allowed with allowBlobModifications == false.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allowBlobModifications doesn't make much sense when you're working with a generic System.IO.Stream if this error gets raised randomly in the guts of some utility code. Maybe we could say something like You cannot seek past the end of a stream created by OpenReadStream that does not allow modifications to the underlying blob or file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@seanmcc-msft
Copy link
Member Author

@tg-msft, I've created tests for Data Lake and File, and addressed your other comments. Anything else?

BlobOpenReadOptions options = new BlobOpenReadOptions(allowModifications: true);

// Act
Stream outputStream = await blob.OpenReadAsync(options: options).ConfigureAwait(false);
Copy link
Member

@tg-msft tg-msft Sep 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't read anything, does this set the length before you seek?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we do a Get Properties request to make sure the blob exists and get it's initial length.

byte[] outputArray = new byte[1];

// Assert
Assert.Zero(stream.Read(outputArray, 0, 1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try to read from stream returned from OpenRead instead of original memory stream. When I try that it throws 416.

Suggested change
Assert.Zero(stream.Read(outputArray, 0, 1));
Assert.Zero(outputStream .Read(outputArray, 0, 1));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. How do you think we should handle this case? We could catch the exception and return 0, maybe there is a better option.

Copy link
Contributor

@kasobol-msft kasobol-msft Sep 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @tg-msft suggested to return 0.
One thing to check: what happens if we seek almost till the real end and provide buffer that would result in range spaning across the end, i.e.:

  • create blob with length 1024
  • open read and seek till 1100
  • append 100 bytes to blob
  • read providing buffer that has 100 bytes+
    I wonder if it would throw 412 too in that case, but it should rather read from 1100 till 1024 and return 24.

@PaulVrugt
Copy link

It would be great if this was merged and released soon. We've been waiting for for this feature patiently for a long time now

@seanmcc-msft
Copy link
Member Author

@PaulVrugt, I need to do one more iteration to address seeking to a position > last known blob/file length, and we should be ready to merge.

@PaulVrugt
Copy link

Thanks Sean, I really appreciate the communication on this. Not so often you see such openness. Others could take an example from you

@seanmcc-msft
Copy link
Member Author

TODO - update change logs.

@seanmcc-msft seanmcc-msft merged commit bdad406 into Azure:master Oct 7, 2020
@seanmcc-msft seanmcc-msft deleted the feature/storage/openReadSeekability branch October 7, 2020 23:42
suhas92 pushed a commit to suhas92/azure-sdk-for-net that referenced this pull request Oct 12, 2020
annelo-msft pushed a commit to annelo-msft/azure-sdk-for-net that referenced this pull request Feb 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Storage Storage Service (Queues, Blobs, Files)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants