Currently, this package only directly supports uploads of files from a directory structure.
However, this is limiting for many projects because it may be significantly faster to asynchronously produce results and export them to simple object stores such as Amazon S3.
Furthermore, many of the tasks that execute are likely database-intensive rather than CPU-intensive. Requiring EC2 nodes or other services that write to disk is likely an expensive solution when results can easily be unloaded from databases into object stores asynchronously.
Proposals:
Define interfaces for importing files from object stores such as Amazon S3 buckets and Google Cloud Storage
Support a load-table solution in which results are imported into load tables in the database in a thread-safe manner:
Upload CSV objects, then copy them into the main table one at a time so that race conditions do not lock up tables
Support creating manifests that can be transferred. For example, results are generated by some analytics package, and a JSON file is created that lists the bucket/object store and file references, as well as the result model spec
Support a simple table backend (in lieu of a message queue/broker) that stores and logs the state of each results insert
Make a simple Plumber API that lets you initiate an upload from a given manifest (entries are hashed to prevent multiple requests for identical uploads)
Cleanup/garbage collection step: delete objects from the object store once their inserts succeed
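As a rough illustration of the object store interface and the cleanup step proposed above, here is a minimal Python sketch. Everything in it is hypothetical: `ObjectStore`, `InMemoryStore`, `cleanup` and the `"inserted"` status string are illustrative names, not part of this package. A real S3 backend could wrap the boto3 client calls `list_objects_v2`, `get_object` and `delete_object`; the dict-backed store here just keeps the example self-contained.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """Hypothetical minimal interface that S3- or GCS-backed stores could implement."""

    @abstractmethod
    def list_keys(self, prefix: str = "") -> list[str]:
        """Return the keys of all objects whose key starts with `prefix`."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the raw bytes stored under `key`."""

    @abstractmethod
    def delete(self, key: str) -> None:
        """Remove the object stored under `key`."""


class InMemoryStore(ObjectStore):
    """Dict-backed stand-in for a real bucket, useful for local testing."""

    def __init__(self, objects: dict[str, bytes] | None = None):
        self._objects = dict(objects or {})

    def list_keys(self, prefix: str = "") -> list[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def delete(self, key: str) -> None:
        del self._objects[key]


def cleanup(store: ObjectStore, statuses: dict[str, str]) -> list[str]:
    """Delete objects whose insert is logged as 'inserted'; keep the rest for retry."""
    removed = []
    for key, status in statuses.items():
        if status == "inserted":
            store.delete(key)
            removed.append(key)
    return removed
```

For example, with `store = InMemoryStore({"results/a.csv": b"1", "results/b.csv": b"2"})`, calling `cleanup(store, {"results/a.csv": "inserted", "results/b.csv": "failed"})` removes only `results/a.csv`, so the failed object stays available for a retry.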
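The load-table idea and the status-logging table backend proposed above could look roughly like the following sketch. It uses SQLite only as a self-contained stand-in for the real results database, and the table names `results`, `results_load` and `upload_log` and their columns are assumptions for illustration. Each CSV object is staged into a load table, then staged objects are copied into the main table one at a time, each in its own short transaction, with the outcome recorded in the log table. The sketch does not itself demonstrate concurrency; the intent is that many workers could stage independently while a single copy step touches the main table.

```python
import csv
import io
import sqlite3

# Hypothetical schema: main table, load (staging) table, and a status log.
SCHEMA = """
CREATE TABLE results (id INTEGER PRIMARY KEY, value REAL);
CREATE TABLE results_load (object_key TEXT NOT NULL, id INTEGER, value REAL);
CREATE TABLE upload_log (
    object_key TEXT PRIMARY KEY,
    status TEXT NOT NULL,
    message TEXT
);
"""


def stage_csv(conn, object_key, csv_text):
    """Load one CSV object into the load table and log it as 'staged'."""
    rows = [(object_key, int(r["id"]), float(r["value"]))
            for r in csv.DictReader(io.StringIO(csv_text))]
    with conn:  # rows and log entry are committed together or rolled back together
        conn.executemany(
            "INSERT INTO results_load (object_key, id, value) VALUES (?, ?, ?)", rows)
        conn.execute(
            "INSERT INTO upload_log (object_key, status) VALUES (?, 'staged')",
            (object_key,))


def copy_staged(conn):
    """Copy staged objects into the main table one at a time, logging each outcome."""
    keys = [k for (k,) in conn.execute(
        "SELECT object_key FROM upload_log WHERE status = 'staged' ORDER BY object_key")]
    for key in keys:
        try:
            with conn:  # one short transaction per object
                conn.execute(
                    "INSERT INTO results (id, value) "
                    "SELECT id, value FROM results_load WHERE object_key = ?", (key,))
                conn.execute("DELETE FROM results_load WHERE object_key = ?", (key,))
                conn.execute(
                    "UPDATE upload_log SET status = 'inserted' WHERE object_key = ?",
                    (key,))
        except sqlite3.IntegrityError as exc:
            # The failed copy was rolled back; record why and leave staged rows in place.
            with conn:
                conn.execute(
                    "UPDATE upload_log SET status = 'failed', message = ? "
                    "WHERE object_key = ?", (str(exc), key))
```

Usage: after `conn = sqlite3.connect(":memory:")` and `conn.executescript(SCHEMA)`, stage two objects whose `id` values overlap and run `copy_staged(conn)`. The first object is inserted, while the second is rolled back and logged as `failed` without blocking the main table.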
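For the manifest and de-duplication proposals above, a sketch of the core logic might hash a canonical JSON encoding of each manifest so that identical upload requests can be detected. The manifest field names (`bucket`, `files`, `resultModelSpec`) and the helpers `make_manifest`, `manifest_hash` and `UploadRequests` are hypothetical, and this is Python rather than the R that a Plumber endpoint would use; it only illustrates the hashing idea.

```python
from __future__ import annotations

import hashlib
import json


def make_manifest(bucket: str, files: list[str], result_model_spec: str) -> dict:
    """Build a manifest dict; the field names are hypothetical."""
    return {
        "bucket": bucket,
        "files": sorted(files),  # sorted so file order does not change the hash
        "resultModelSpec": result_model_spec,
    }


def manifest_hash(manifest: dict) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, no extra whitespace)."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class UploadRequests:
    """Accepts each distinct manifest once and flags repeats as duplicates."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def submit(self, manifest: dict) -> tuple[str, bool]:
        """Return (hash, is_new); is_new is False for an identical earlier request."""
        digest = manifest_hash(manifest)
        is_new = digest not in self._seen
        self._seen.add(digest)
        return digest, is_new
```

Sorting `files` is a design choice: two manifests listing the same files in a different order count as the same upload. In a real deployment the set of seen hashes would more likely live in the database log table, so that duplicates are detected across restarts.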
Potential Issues:
Storage of keys for buckets