Boto3 S3 StreamingBody().read() reads once and returns nothing after that #564

Closed
awlamb opened this issue Mar 25, 2016 · 12 comments
Labels
documentation This is a problem with documentation.

Comments

@awlamb

awlamb commented Mar 25, 2016

>>> a = client.get_object(Bucket='imgtest',Key='testimage1.jpg')
>>> a['Body'].read()
b'...\xadk\xc9,\xda\xe7\xcb\xb7$\x91\xf7\xb3\xd3>\xd5V...'
>>> a['Body'].read()
b''

The full byte output is trimmed for brevity. I get an object and read it. When I read it again, no bytes are returned.

If this stream acts like a normal file I/O stream, how can I seek back to the beginning of the stream? seek() does not seem to be a method on the StreamingBody object.

@awlamb closed this as completed Mar 25, 2016
@awlamb reopened this Mar 25, 2016
@kyleknap
Contributor

The class is described here. We will look to see if we can get this ported over or linked in the boto3 docs.

As seen in the docs, if you call read() with no amount specified, you read all of the data. So if you call read() again, you will get no more bytes.

There is also no seek() available on the stream because we are streaming directly from the server. The only way we could add a seek() method would be to store all of the data in memory, which is not a great idea because the body could be gigabytes in size.
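
The read-once behavior described above can be sketched without touching S3. In this minimal sketch, `io.BytesIO` is only a stand-in for `response['Body']`, and `iter_chunks` is a hypothetical helper name, not part of boto3 or botocore. The only thing it assumes is that the body exposes a `read(amt)` method.

```python
import io


def iter_chunks(body, chunk_size=1024):
    """Yield successive chunks from a read-once stream until it is exhausted."""
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            # An empty result means the stream has been fully consumed.
            break
        yield chunk


# Stand-in for client.get_object(...)['Body'] (assumption for illustration).
body = io.BytesIO(b"abcdefghij")

chunks = list(iter_chunks(body, chunk_size=4))

# The stream is now consumed, so a further read returns no bytes.
leftover = body.read()
```

Reading in bounded chunks like this keeps memory use near `chunk_size` instead of the full object size, which is the trade-off the comment above is pointing at.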

@kyleknap added the documentation label Mar 25, 2016
@haizaar

haizaar commented Oct 27, 2016

Is there any particular reason that this is still an open ticket?

@awlamb closed this as completed Oct 27, 2016
@danielmorozoff

danielmorozoff commented Aug 20, 2017

Is there a reason why StreamingBody is not seekable?
This becomes quite problematic when attempting to download portions of large files asynchronously. What is the recommended way to do this?

@robehickman

@danielmorozoff get_object supports a Range parameter. The end offset of an HTTP byte range is inclusive, hence the "- 1":

client.get_object(Bucket=bucket, Key=key, Range='bytes={}-{}'.format(amount_read, amount_read + chunk_size - 1))
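
HTTP byte ranges are inclusive on both ends, so a request for `bytes=0-3` returns four bytes. A small sketch of how the range values for a whole object could be generated follows. `byte_ranges` is a hypothetical helper, and `total_size` is assumed to be known in advance (for example from the `ContentLength` field of a `head_object` response).

```python
def byte_ranges(total_size, chunk_size):
    """Yield HTTP Range header values that together cover total_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total_size, chunk_size):
        # The end offset is inclusive, so subtract one from the exclusive bound.
        end = min(start + chunk_size, total_size) - 1
        yield "bytes={}-{}".format(start, end)


ranges = list(byte_ranges(10, 4))

# With a real client, each value could be passed along these lines:
#   client.get_object(Bucket=bucket, Key=key, Range=value)['Body'].read()
```

Because each range is an independent request, the chunks can be fetched in any order or concurrently and reassembled by their start offset.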

@alanjds

alanjds commented Apr 5, 2018

One way to allow .seek() would be for botocore's StreamingBody to receive an opener (a factory?) for the _raw_stream rather than the realized object. Seeking to 0 would then just mean reopening the _raw_stream.

See: https://github.com/boto/botocore/blob/master/botocore/response.py#L42
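
The factory idea can be illustrated outside botocore. This is only a sketch of the concept: `ReopenableBody` and its `rewind` method are hypothetical names and do not exist in botocore, and `io.BytesIO` stands in for the stream that a real opener would return.

```python
import io


class ReopenableBody:
    """Read-once stream that can be rewound by opening a fresh stream."""

    def __init__(self, opener):
        # opener is a zero-argument callable that returns a new stream.
        self._opener = opener
        self._stream = opener()

    def read(self, amt=None):
        return self._stream.read(amt)

    def rewind(self):
        """Discard the current stream and start again from byte 0."""
        self._stream.close()
        self._stream = self._opener()


opened = []  # records how many times the opener ran


def fake_opener():
    # Stand-in for something like:
    #   lambda: client.get_object(Bucket=bucket, Key=key)['Body']
    opened.append(1)
    return io.BytesIO(b"hello world")


body = ReopenableBody(fake_opener)
first = body.read()
empty = body.read()    # stream is consumed at this point
body.rewind()          # opens a second stream instead of buffering the first
second = body.read(5)
```

Nothing is held in memory between reads, but with a real S3 opener every rewind would cost another GET request, and the object could change between requests.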

@ryanermita

Is there any workaround to use seek() on StreamingBody?

@ryanermita

I solved this by using _raw_stream, as per @alanjds's comment above.
Is this a good solution, or is there a better one?

raw_stream = codecs.getreader('utf-8-sig')(temp_file[u'Body'])._raw_stream.read().decode("UTF8") 
stream_csv = io.StringIO(raw_stream, newline=None)
stream_csv.seek(0)

@alanjds

alanjds commented Aug 17, 2018

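@ryanermita I was thinking of a way to seek without putting the whole file in memory.

If you have no problem with filling memory with the file, a cleaner way is to just buffer the result of streaming_body.read(), then seek on the buffer as you are already doing. On Python 3, read() returns bytes, so that means io.BytesIO(streaming_body.read()), or io.StringIO(streaming_body.read().decode('utf-8')) for text.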
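
A minimal sketch of the buffering approach on Python 3, where `read()` returns bytes. `buffer_bytes` and `buffer_text` are hypothetical helper names, `io.BytesIO` stands in for the real body, and the `utf-8-sig` default is an assumption carried over from the CSV snippet earlier in the thread.

```python
import csv
import io


def buffer_bytes(body):
    """Read the whole body into memory and return a seekable bytes buffer."""
    return io.BytesIO(body.read())


def buffer_text(body, encoding="utf-8-sig"):
    """Read and decode the whole body, returning a seekable text buffer."""
    # newline="" leaves line endings untouched, as the csv module recommends.
    return io.StringIO(body.read().decode(encoding), newline="")


# Stand-in for obj['Body']: a small CSV payload with a UTF-8 byte order mark.
fake_body = io.BytesIO(b"\xef\xbb\xbfa,b\n1,2\n")

text_stream = buffer_text(fake_body)
rows_first = list(csv.reader(text_stream))

# Unlike the original stream, the buffer can be rewound and read again.
text_stream.seek(0)
rows_second = list(csv.reader(text_stream))
```

This only suits objects that comfortably fit in memory, since the entire body is read before the buffer is created.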

@ryanermita

I will try this one, thank you @alanjds 👍

@codeman101

codeman101 commented Jul 13, 2019

@kyleknap

Has changing botocore's StreamingBody been suggested? I ran into this issue twice (the second time because I hadn't used read() on the object in a while). Even the documentation you linked to doesn't make it clear to me that the stream is exhausted after the first read. It would be more intuitive if the data were copied when read instead of consumed.

@cjohnson318

I found a solution that worked for me. It involves writing a wrapper that supports seek(). I also read about smart_open in another blog, but I haven't tried it.
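
The linked solution is not reproduced in this thread, so the following is an independent sketch of what a seekable wrapper might look like, not that author's code and not how smart_open works. `RangeReader` is a hypothetical class, and `fetch(start, end)` is assumed to return the bytes from `start` through `end` inclusive, for example by issuing a `get_object` call with a `Range` value.

```python
import io


class RangeReader:
    """File-like reader that implements seek() by fetching byte ranges on demand."""

    def __init__(self, fetch, size):
        self._fetch = fetch  # fetch(start, end_inclusive) -> bytes
        self._size = size    # total object size in bytes
        self._pos = 0

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError("invalid whence: {!r}".format(whence))
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def read(self, size=-1):
        if size is None or size < 0:
            end = self._size
        else:
            end = min(self._pos + size, self._size)
        if self._pos >= end:
            return b""
        # The fetch range is inclusive, so the last byte wanted is end - 1.
        data = self._fetch(self._pos, end - 1)
        self._pos += len(data)
        return data


# In-memory stand-in for ranged S3 requests (assumption for illustration).
data = b"0123456789"
reader = RangeReader(lambda start, end: data[start:end + 1], len(data))

reader.seek(4)
middle = reader.read(3)
reader.seek(-2, io.SEEK_END)
tail = reader.read()
reader.seek(0)
whole = reader.read()
after = reader.read()
```

Seeking here is free because it only moves an offset; the cost is one request per `read()`, so small reads against real S3 would be slow without some buffering on top.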

@ghost

ghost commented Mar 9, 2022

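If someone is hitting the error 'bytes' object has no attribute 'seek', I solved it with the following:

  import io
  # s3 is assumed to be a boto3 S3 client, e.g. boto3.client('s3')
  obj = s3.get_object(Bucket='Mybucket', Key='MyObjKey')
  body = obj['Body']
  # Buffer the whole body in memory so the result supports seek()
  file_like_obj = io.BytesIO(body.read())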


9 participants