-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
colossus s3 api #5150
base: master
Are you sure you want to change the base?
colossus s3 api #5150
Conversation
…torage providers-
…rovements made sure that every SyncTask execute internal invocation throws an error to be handled by the top level SyncTask.execute explicitly to guarantee no throw
… and add logic to getFile get file now will try to pipe the stream into the response object with appropriate header settings
every accepted object will be uploaded to he storage bucket if the connection is enabled. Basically the logic stays the same as before for asset in the pending folder it's just the destination that has changed now : either LocalVolume/uploadsDir or storageBucket
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good start.
The awsConnectionHandler implementation needs to be fixed to properly do uploading to the s3 storage.
The integration bash scripts and docker-compose files to run tests need some changes to get them working. I have made local changes on my machine to get tests working. I can open a PR to your branch if you like.
One jest unit test is failing.
Some limitations I see in the current implementation is that only a single object store can be configured at a time. I think that is fine since object stores are more easily scalable and operator should not need to frequently change which bucket to use. It might be practical from an organizational point of view perhaps to be able to specify a subfolder in the s3 bucket to use for storing the dataobjects.
ENABLE_STORAGE_PROVIDER=true | ||
|
||
## Specify the cloud storage provider to use: | ||
CLOUD_STORAGE_PROVIDER_NAME=aws |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ENABLE_STORAGE_PROVIDER
and CLOUD_STORAGE_PROVIDER_NAME
probably make more sense as command line arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also we can reduce from two params to just one, the provider name. If not empty string it means we want to use a cloud storage provider.
@@ -167,6 +172,7 @@ | |||
"ensure": "yarn format && yarn lint --fix && yarn build", | |||
"checks": "tsc --noEmit --pretty && prettier ./src --check && yarn lint", | |||
"start": "./bin/run server", | |||
"test:integration:cloudProvider": "jest --detectOpenHandles './src/services/storageProviders/tests/**/syncService.test.ts'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
running the test I got one failed test:
Mokhtars-MBP:storage-node mokhtar$ yarn test:integration:cloudProvider
yarn run v1.22.19
$ jest --detectOpenHandles './src/services/storageProviders/tests/**/syncService.test.ts'
Invalid testPattern ./src/services/storageProviders/tests/**/syncService.test.ts supplied. Running all tests instead.
Invalid testPattern ./src/services/storageProviders/tests/**/syncService.test.ts supplied. Running all tests instead.
PASS src/services/storageProviders/tests/integration/setupLocalCache.test.ts (13.784 s)
PASS src/services/storageProviders/tests/integration/acceptObject.test.ts
FAIL src/services/storageProviders/tests/integration/acceptObjectService.test.ts
● accept pending object flow test suite › accept pending object happy path › should call accept objects/maxTxBatchSize times
expect(jest.fn()).toHaveBeenCalledTimes(expected)
Expected number of calls: 2
Received number of calls: 0
70 |
71 | expect(acceptPendingDataObjectsBatch).toHaveBeenCalledTimes(2)
> 72 | expect(acceptObject).toHaveBeenCalledTimes(2)
| ^
73 | expect(addDataObjectIdToCache).toHaveBeenCalledTimes(2)
74 | })
75 | })
at Object.<anonymous> (src/services/storageProviders/tests/integration/acceptObjectService.test.ts:72:28)
Test Suites: 1 failed, 2 passed, 3 total
Tests: 1 failed, 23 passed, 24 total
docker-compose.localstack.yml
Outdated
- DEFAULT_REGION=${AWS_REGION} | ||
- AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} | ||
- AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} | ||
- LOCALSTACK_ENDPOINT=${LOCALSTACK_ENDPOINT} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at https://docs.localstack.cloud/references/configuration/
default_region is deprecated, there are no key configuration, localstack just ignores secret and key ids for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here I think you meant to configure LOCALSTACK_HOST. LOCALSTACK_ENDPOINT is not a configuration option for localstack anymore
* @returns A promise that resolves to an instance of IConnectionHandler if the provider is enabled, otherwise undefined. | ||
* @throws An error if the storage cloud provider type is invalid or unsupported. | ||
*/ | ||
export async function parseConfigOptionAndBuildConnection(): Promise<IConnectionHandler | undefined> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not need to be an async function.
* @param filePath - The file path of the file to upload. | ||
* @returns A promise that resolves when the upload is complete or rejects with an error. | ||
*/ | ||
uploadFileToRemoteBucket(filename: string, filePath: string): Promise<any> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should define how to handle case when an key already exists on the store.
tests/network-tests/start-storage.sh
Outdated
|
||
echo "Staring storage infrastructure" | ||
export LOCALSTACK_ENABLED=true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOCALSTACK_ENABLED is not passed to colossus-1 and colossus-2 in the docker-compose.yml file so they are not using the localstack, and instead public aws. So they fail to access s3 bucket.
Co-authored-by: Mokhtar Naamani <mokhtar.naamani@gmail.com>
Co-authored-by: Mokhtar Naamani <mokhtar.naamani@gmail.com>
Co-authored-by: Mokhtar Naamani <mokhtar.naamani@gmail.com>
Co-authored-by: Mokhtar Naamani <mokhtar.naamani@gmail.com>
Co-authored-by: Mokhtar Naamani <mokhtar.naamani@gmail.com>
…s-fix fix integration tests
The implementation is the same as suggested here:
#4981 (last comment by Mokhtar)
Instead of utilising a package in order to handle different types of cloud provider I have created a provider agnostic interface which integrates in the colossus logic
I believe this will allow us to easily add support for other providers in the future, by using their official sdk.
This also was useful in order to keep the changes to existing colossus logic to a minimal level (as the logic is already complex by itself).
The files
localCache
has been enhanced in order to keep track of which files are on the bucket and which files are on the local volume.Only "accepted" data objects are stored on the bucket