Boto3 S3 StreamingBody().read() reads once and returns nothing after that #564

awlamb · 2016-03-25T05:16:51Z

>>> a = client.get_object(Bucket='imgtest',Key='testimage1.jpg')
>>> a['Body'].read()
b'...\xadk\xc9,\xda\xe7\xcb\xb7$\x91\xf7\xb3\xd3>\xd5V...'
>>> a['Body'].read()
b''

Complete bytes removed for brevity. I get an object and read it. Then I read it again, but no bytes are returned.

If this stream acts as a normal file IO stream, how can I seek to the beginning of the stream? seek() does not seem to be a method on the StreamingBody object.

kyleknap · 2016-03-25T16:41:56Z

The class is described here. We will look to see if we can get this ported over or linked in the boto3 docs.

As seen in the docs, if you call read() with no amount specified, you read all of the data. So if you call read() again, you will get no more bytes.
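A minimal sketch of the read-once behavior described above, and of reading in bounded chunks instead of all at once. It assumes only that the body exposes read(amt) and returns b'' once exhausted; io.BytesIO is used as a stand-in for a['Body'], and iter_body_chunks is a hypothetical helper name, not part of boto3.

```python
import io


def iter_body_chunks(body, chunk_size=1024):
    """Yield successive chunks from a read-once stream until it is exhausted."""
    while True:
        chunk = body.read(chunk_size)
        if not chunk:  # b'' means the stream has been fully consumed
            break
        yield chunk


# Stand-in for a['Body']; like the streaming body, it returns b'' once drained.
body = io.BytesIO(b"abcdefghij")
chunks = list(iter_body_chunks(body, chunk_size=4))
leftover = body.read()  # nothing is left after the loop above
```

Each chunk can be processed and discarded as it arrives, so the whole object never has to sit in memory.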

There is also no seek() available on the stream because we are streaming directly from the server. The only way we could add a seek() method is to store all of the data in memory, which is not a great idea as the body could be gigabytes in size.
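If seeking is needed and holding the whole body in memory is a concern, one possible middle ground (not something the maintainers propose here) is to copy the stream into a temporary file that spills to disk past a size threshold. The sketch below assumes the body supports read(amt); spool_body and max_memory are hypothetical names, and io.BytesIO stands in for the response body.

```python
import io
import shutil
import tempfile


def spool_body(body, max_memory=1024 * 1024):
    """Copy a read-once stream into a seekable temporary file.

    Data stays in memory up to max_memory bytes and is rolled over
    to disk beyond that.
    """
    spooled = tempfile.SpooledTemporaryFile(max_size=max_memory)
    shutil.copyfileobj(body, spooled)  # reads the source in fixed-size pieces
    spooled.seek(0)
    return spooled


# Stand-in for obj['Body'].
source = io.BytesIO(b"x" * 5000)
seekable = spool_body(source, max_memory=1024)
first_pass = seekable.read()
seekable.seek(0)
second_pass = seekable.read()
seekable.close()
```

The trade-off is a full download plus local disk usage before the first seek, so this only helps when the data will be read more than once.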

haizaar · 2016-10-27T17:23:19Z

Is there any particular reason that this is still an open ticket?

danielmorozoff · 2017-08-20T00:36:45Z

Is there a reason why the StreamingBody is not seekable?
This becomes quite problematic when attempting to download portions of large files asynchronously. What is the recommended way to do this?

robehickman · 2017-12-05T11:43:54Z

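@danielmorozoff get_object supports a Range parameter. Byte ranges are inclusive on both ends, so the end offset is one less than the start of the next chunk:

client.get_object(Bucket=bucket, Key=key, Range='bytes={}-{}'.format(amount_read, amount_read + chunk_size - 1))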
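A sketch of how the Range parameter might be used to walk an object piece by piece. It assumes the total size is already known (for example from head_object's ContentLength) and that client is an S3 client; iter_ranges and read_in_ranges are hypothetical helper names.

```python
def iter_ranges(total_size, chunk_size):
    """Yield inclusive (start, end) byte offsets that cover total_size bytes."""
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1
        yield start, end


def read_in_ranges(client, bucket, key, total_size, chunk_size):
    """Fetch an object with one ranged get_object call per piece."""
    for start, end in iter_ranges(total_size, chunk_size):
        resp = client.get_object(
            Bucket=bucket, Key=key, Range="bytes={}-{}".format(start, end)
        )
        yield resp["Body"].read()


# Offsets for a 10-byte object fetched 4 bytes at a time.
ranges = list(iter_ranges(10, 4))
```

Because every range is an independent request, the pieces could also be fetched concurrently and reassembled by start offset.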

alanjds · 2018-04-05T16:22:49Z

One way to allow .seek() would be for botocore's StreamingBody to receive the _raw_stream opener (a factory?) rather than the realized object. Seeking to 0 would then just mean restarting the _raw_stream.
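The factory idea can be sketched outside botocore as a small wrapper. This is not how StreamingBody works today; ReopenableStream and opener are hypothetical names, and io.BytesIO stands in for a fresh response body.

```python
import io


class ReopenableStream:
    """Read-once stream that supports seek(0) by reopening via a factory."""

    def __init__(self, opener):
        self._opener = opener  # zero-argument callable returning a fresh stream
        self._stream = opener()

    def read(self, amt=None):
        return self._stream.read(amt)

    def seek(self, offset, whence=io.SEEK_SET):
        if offset != 0 or whence != io.SEEK_SET:
            raise io.UnsupportedOperation("only seek(0) is supported")
        close = getattr(self._stream, "close", None)
        if close is not None:
            close()
        self._stream = self._opener()  # start over with a new stream
        return 0


# With S3 the opener could be something like:
#   lambda: client.get_object(Bucket=bucket, Key=key)['Body']
stream = ReopenableStream(lambda: io.BytesIO(b"hello"))
first = stream.read()
second = stream.read()  # exhausted, like the original report
stream.seek(0)
third = stream.read()
```

Nothing is buffered, but every seek(0) costs a new request, and the object could change between requests.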

See: https://github.com/boto/botocore/blob/master/botocore/response.py#L42

ryanermita · 2018-08-17T06:40:37Z

Is there any workaround to use seek() on a StreamingBody?

ryanermita · 2018-08-17T08:41:53Z

I solved this by using _raw_stream, as per @alanjds's comment above.
Is this a good solution, or is there a better one?

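import codecs
import io
# _raw_stream is a private botocore attribute; this reads the entire body into memory.
raw_stream = codecs.getreader('utf-8-sig')(temp_file[u'Body'])._raw_stream.read().decode("UTF8")
stream_csv = io.StringIO(raw_stream, newline=None)
stream_csv.seek(0)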

alanjds · 2018-08-17T11:44:00Z

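@ryanermita I was thinking of a way to seek without putting the whole file in memory.

If you have no problem filling memory with the file, a cleaner way is to just wrap the data, StringIO(streaming_body.read()), then seek the StringIO as you are already doing. On Python 3, read() returns bytes, so use io.BytesIO(streaming_body.read()) or decode the bytes before passing them to io.StringIO.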
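For the CSV case discussed above, the in-memory approach might look like the following on Python 3. It assumes the object is UTF-8 text, possibly with a byte order mark; body_to_text_stream is a hypothetical helper and io.BytesIO stands in for the response body.

```python
import csv
import io


def body_to_text_stream(body, encoding="utf-8-sig"):
    """Read the whole body into memory and return a seekable text stream."""
    # newline="" leaves line endings untouched so the csv module can handle them.
    return io.StringIO(body.read().decode(encoding), newline="")


# Stand-in for temp_file['Body']: a small CSV with a UTF-8 byte order mark.
body = io.BytesIO(b"\xef\xbb\xbfname,size\r\na.jpg,10\r\n")
stream_csv = body_to_text_stream(body)
rows = list(csv.reader(stream_csv))
stream_csv.seek(0)  # the in-memory copy can be rewound freely
rows_again = list(csv.reader(stream_csv))
```

This goes through the public read() method only, so it does not depend on the private _raw_stream attribute.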

ryanermita · 2018-08-17T14:39:26Z

I will try this one, thank you @alanjds 👍

codeman101 · 2019-07-13T01:38:26Z

@kyleknap

Has it been suggested to change botocore's StreamingBody? I ran into this issue twice (the second time was because I hadn't used read() on the object in a while). Even the documentation you linked to doesn't make it clear to me that the stream is exhausted after the first read. It'd be more intuitive if the stream were copied when read instead of consumed.

cjohnson318 · 2019-10-17T20:11:01Z

I found a solution that worked for me. It involves writing a wrapper that supports seek(). I also read about smart_open in another blog, but I haven't tried it.
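The linked solution is not reproduced in this thread. One possible shape for such a wrapper, built on the Range parameter mentioned earlier, is sketched below. RangeReader is a hypothetical class, client is assumed to be an S3 client, and size is assumed to come from something like head_object's ContentLength.

```python
import io


class RangeReader:
    """File-like reader that turns seek()/read() into ranged get_object calls."""

    def __init__(self, client, bucket, key, size):
        self._client, self._bucket, self._key = client, bucket, key
        self._size = size  # total object size in bytes
        self._pos = 0

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            pos = offset
        elif whence == io.SEEK_CUR:
            pos = self._pos + offset
        elif whence == io.SEEK_END:
            pos = self._size + offset
        else:
            raise ValueError("invalid whence: {}".format(whence))
        self._pos = max(0, min(pos, self._size))  # clamp to the object bounds
        return self._pos

    def tell(self):
        return self._pos

    def read(self, amt=None):
        if self._pos >= self._size or amt == 0:
            return b""
        if amt is None or amt < 0:
            end = self._size
        else:
            end = min(self._pos + amt, self._size)
        resp = self._client.get_object(
            Bucket=self._bucket,
            Key=self._key,
            Range="bytes={}-{}".format(self._pos, end - 1),  # inclusive end
        )
        data = resp["Body"].read()
        self._pos += len(data)
        return data
```

Every read() is a separate request, so small reads are slow; a real wrapper would probably add buffering on top.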

ghost · 2022-03-09T19:07:05Z

If anyone is hitting the error "'bytes' object has no attribute 'seek'", I solved it with the following:

  import io

  # s3 is an existing boto3 S3 client
  obj = s3.get_object(Bucket='Mybucket', Key='MyObjKey')
  body = obj['Body']
  file_like_obj = io.BytesIO(body.read())  # seekable in-memory copy

awlamb closed this as completed Mar 25, 2016

awlamb reopened this Mar 25, 2016

kyleknap added the documentation This is a problem with documentation. label Mar 25, 2016

awlamb closed this as completed Oct 27, 2016

pradoz mentioned this issue Jul 25, 2022

S3 - Accept Range argument for download_file / download_fileobj methods #3339

Closed

2 tasks

wpfl-dbt mentioned this issue Jul 1, 2024

Remove S3 dependency from chunking process i-dot-ai/redbox#695

Merged

3 tasks
