Skip to content

Latest commit

 

History

History
139 lines (115 loc) · 5.36 KB

README.md

File metadata and controls

139 lines (115 loc) · 5.36 KB

S3 GetObject Accelerator

Get large objects from S3 by using parallel byte-range fetches/parts without the AWS SDK to improve performance.

We measured a troughoput of 6.5 Gbit/s on an m5zn.6xlarge in eu-west-1 using this lib with this settings: {concurrency: 64}.

Installation

npm i s3-getobject-accelerator

Examples

Compact

const {createWriteStream} = require('node:fs');
const {pipeline} = require('node:stream');
const {download} = require('s3-getobject-accelerator');

pipeline(
  download({bucket: 'bucket', key: 'key', version: 'optional version'}, {partSizeInMegabytes: 8, concurrency: 4}).readStream(),
  createWriteStream('/tmp/test'),
  (err) => {
    if (err) {
      console.error('something went wrong', err);
    } else {
      console.log('done');
    }
  }
);

More verbose

Get insights into the part downloads and write to file directly without stream if it is smaller than 1 TiB:

const {createWriteStream} = require('node:fs');
const {download} = require('s3-getobject-accelerator');

const d = download({bucket: 'bucket', key: 'key', version: 'optional version'}, {partSizeInMegabytes: 8, concurrency: 4});

d.on('part:downloading', ({partNo}) => {
  console.log('start downloading part', partNo);
});
d.on('part:downloaded', ({partNo}) => {
  console.log('part downloaded, write to disk next in correct order', partNo);
});
d.on('part:writing', ({partNo}) => {
  console.log('start writing part to disk', partNo);
});
d.on('part:done', ({partNo}) => {
  console.log('part written to disk', partNo);
});

d.meta((err, metadata) => {
  if (err) {
    console.error('something went wrong', err);
  } else {
    if (metadata.lengthInBytes > 1024 * 1024 * 1024 * 1024) {
      console.error('file is larger than 1 TiB');
    } else {
      d.file('/tmp/test', (err) => {
        if (err) {
          console.error('something went wrong', err);
        } else {
          console.log('done');
        }
      });
    }
  }
});

API

download(s3source, options)

  • s3source <Object>
    • bucket <string>
    • key <string>
    • version <string> (optional)
  • options <Object>
    • partSizeInMegabytes <number> (optional, defaults to uploaded part size)
    • concurrency <number>
    • requestTimeoutInMilliseconds <number> Maxium time for a request to complete from start to finish (optional, defaults to 300,000, 0 := no timeout)
    • resolveTimeoutInMilliseconds <number> Maximum time for a DNS query to resolve (optional, defaults to 3,000, 0 := no timeout)
    • connectionTimeoutInMilliseconds <number> Maximum time for a socket to connect (optional, defaults to 3,000, 0 := no timeout)
    • readTimeoutInMilliseconds <number> Maxium time to read the response body (optional, defaults to 300,000, 0 := no timeout)
    • dataTimeoutInMilliseconds <number> Maxium time between two data events while reading the response body (optional, defaults to 3,000, 0 := no timeout)
    • writeTimeoutInMilliseconds <number> Maxium time to write the request body (optional, defaults to 300,000, 0 := no timeout)
    • v2AwsSdkCredentials <AWS.Credentials> (optional)
    • endpointHostname <string> (optional, defaults to s3.${region}.amazonaws.com)
    • agent <https.Agent> (optional)
  • Returns:

AWS credentials & region

AWS credentials are fetched in the following order:

  1. options.v2AwsSdkCredentials
  2. Environment variables
  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_SESSION_TOKEN (optional)
  1. IMDSv2

AWS region is fetched in the following order:

  1. Environment variable AWS_REGION
  2. IMDSv2

Considerations