[FEATURE]: Create Train and Test Datasets from User-Uploaded Dataset in S3 for /training #913
Comments
Hello @dwu359! Thank you for submitting the Feature Request Form. We appreciate your contribution. 👋 We will look into it and provide a response as soon as possible. To work on this feature request, you can follow these branch setup instructions:
Feel free to make the necessary changes in this branch and submit a pull request when you're ready. Best regards,
@NMBridges you're doing this task.
@NMBridges My bad, this task should deal with reading the dataset files from S3 into training, not writing files to S3.
… should be the file to implement this endpoint in, @NMBridges.
@NMBridges Also, assume the scope of this use case is tabular data (i.e., reading a CSV from S3 and then building the train/test datasets). See the example dataset creator class in the linked file.
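A minimal sketch of the "build train/test datasets from a CSV" step, assuming the CSV has already been read into a string, has a header row, and that a seeded random holdout split is acceptable. It uses only the standard library so it runs anywhere. The function name split_csv_rows and its parameters are hypothetical and do not come from the repo; the real dataset creator class may look quite different.

```python
import csv
import io
import random


def split_csv_rows(csv_text, test_size=0.2, seed=42):
    """Hypothetical helper: parse CSV text and randomly split its data rows.

    Assumes the first row is a header. Returns (header, train_rows, test_rows),
    where each row is a list of strings as produced by csv.reader.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")

    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        raise ValueError("CSV is empty; expected at least a header row")
    rows = [row for row in reader if row]  # csv.reader yields [] for blank lines

    # Shuffle a copy with a seeded RNG so the same seed gives the same split.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    n_test = round(len(shuffled) * test_size)
    return header, shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    sample = "x1,x2,label\n1,2,0\n3,4,1\n5,6,0\n7,8,1\n9,10,0\n"
    header, train, test = split_csv_rows(sample, test_size=0.2)
    print(header, len(train), len(test))  # ['x1', 'x2', 'label'] 4 1
```

If pandas and scikit-learn are already dependencies of the training backend, pandas.read_csv followed by sklearn.model_selection.train_test_split (optionally with stratify set to the target column for classification) would likely be the more idiomatic choice.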
Feature Name
Create Train and Test Datasets from S3 for /training
Your Name
Daniel Wu
Description
As of right now, the training backend can only handle default datasets for /tabular. Allow user-uploaded datasets to be used for tabular training by implementing a dataset creator in training/dataset.py to allow the /tabular endpoint route to read a file from S3 given the filename and split it into train and test datasets.
Right now, datasets are stored in S3 in the dlp-upload-bucket in the location {uid}/{trainspace_type}/{filename}. You can upload files to the bucket with the https://em9iri9g4j.execute-api.us-west-2.amazonaws.com/ SST prod endpoint and the /datasets/user/{type}/{filename}/presigned_upload_url route.
EDIT: The above statement is not true, see below.
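Reading an uploaded file could look roughly like the sketch below. It assumes boto3 is installed with AWS credentials available in the environment, that the file is UTF-8 encoded, and it reuses the bucket name and {uid}/{trainspace_type}/{filename} layout from the description (given the EDIT note, that layout is worth confirming first). The names build_upload_key and read_uploaded_csv_text are hypothetical; the S3 client can be injected so the logic is testable without AWS.

```python
BUCKET = "dlp-upload-bucket"  # bucket name taken from the issue description


def build_upload_key(uid, trainspace_type, filename):
    """Build the object key using the {uid}/{trainspace_type}/{filename} layout."""
    for part in (uid, trainspace_type, filename):
        # Reject empty segments or embedded slashes, which would change the key layout.
        if not part or "/" in part:
            raise ValueError(f"invalid key segment: {part!r}")
    return f"{uid}/{trainspace_type}/{filename}"


def read_uploaded_csv_text(uid, trainspace_type, filename, s3_client=None, bucket=BUCKET):
    """Hypothetical helper: download the uploaded CSV and return it as UTF-8 text.

    s3_client can be injected (e.g. a stub in tests); otherwise a boto3 client is
    created using whatever AWS credentials the environment provides.
    """
    if s3_client is None:
        import boto3  # imported lazily so build_upload_key works without boto3

        s3_client = boto3.client("s3")
    key = build_upload_key(uid, trainspace_type, filename)
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # get_object returns a dict whose "Body" is a streaming object with .read().
    return response["Body"].read().decode("utf-8")
```

In the /tabular route, the returned text would then be parsed and split into train and test sets. For large uploads, streaming the body rather than reading it fully into memory may be worth considering.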
You will also need a bearer token, which can be obtained using the backend CLI. For more info, run:
cd training && poetry run python cli.py --help