I tried to use open_many(list_of_s3paths), but it does not pass any other keyword parameters (such as fs_options) through to the open() function. Is there a specific reason this is not implemented, or was it simply overlooked? Does it seem feasible to call open_many on chunks stored on S3 and have vaex handle the required caching, etc.?
For example, in my attempt to use it, the first file path in the list is s3://us-east-1-audit-engine-jobs/US/AZ/AZ_Maricopa_20201103/cvr_bif/chunks/CvrExport_0!cvr_bif!chunk_0.csv
But vaex tries to open it as a local file:
with open(file, 'rb') as file:
OSError: [Errno 22] Invalid argument: 's3://us-east-1-audit-engine-jobs/US/AZ/AZ_Maricopa_20201103/cvr_bif/chunks/CvrExport_0!cvr_bif!chunk_0.csv'
Is this because it is not feasible? If so, I will have to either:
copy all chunks to the local file system, which may be infeasible on a small cloud instance, or
concatenate all chunks into one big CSV file on S3.
I was hoping I could give vaex a list of chunks on S3 and let it handle the caching and loading of the chunks.
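One possible workaround, sketched under assumptions rather than taken from vaex itself: open each chunk separately with `vaex.open`, forwarding the same `fs_options` to every call, and join the pieces with `vaex.concat`. This assumes the installed vaex version's `vaex.open` accepts `fs_options` for S3 CSV paths. The helper name `open_many_with_fs_options` and its `open_fn`/`concat_fn` parameters are hypothetical, added only so the logic can be exercised without vaex or S3 access. It does not provide the caching asked about above; it simply opens every chunk up front.

```python
from typing import Any, Callable, Iterable, Optional


def open_many_with_fs_options(
    paths: Iterable[str],
    fs_options: Optional[dict] = None,
    open_fn: Optional[Callable[..., Any]] = None,
    concat_fn: Optional[Callable[[list], Any]] = None,
) -> Any:
    """Hypothetical helper: open every path with the same fs_options, then concatenate.

    Assumption: vaex.open(path, fs_options=...) works for the given S3 paths
    in the installed vaex version. open_fn/concat_fn exist only for testing.
    """
    if open_fn is None or concat_fn is None:
        import vaex  # imported lazily so the helper can be tested without vaex

        open_fn = open_fn or vaex.open
        concat_fn = concat_fn or vaex.concat

    options = dict(fs_options or {})
    # Forward the same filesystem options to every chunk, which is the part
    # open_many does not do in the setup described in the question.
    dfs = [open_fn(path, fs_options=options) for path in paths]
    if not dfs:
        raise ValueError("no paths given")
    return concat_fn(dfs)
```

Usage would look like `df = open_many_with_fs_options(list_of_s3paths, fs_options=my_fs_options)`, where `my_fs_options` is whatever options already work for a single `vaex.open` call on one of the chunks.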