[FEATURE]: Create Train and Test Datasets from User-Uploaded Dataset in S3 for /training #913

Open
dwu359 opened this issue Aug 20, 2023 · 5 comments
Assignees
Labels
backend (backend tasks), enhancement (New feature or request)

Comments

@dwu359
Contributor

dwu359 commented Aug 20, 2023

Feature Name

Create Train and Test Datasets from S3 for /training

Your Name

Daniel Wu

Description

The training backend currently handles only the default datasets for /tabular. To support user-uploaded datasets for tabular training, implement a dataset creator in training/dataset.py so that the /tabular endpoint route can read a file from S3, given its filename, and split it into train and test datasets.

Datasets are currently stored in S3, in the dlp-upload-bucket bucket, under the key {uid}/{trainspace_type}/{filename}.
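
A minimal sketch of the read side, assuming boto3 is the S3 client library and that credentials are already configured in the environment. The function names `build_s3_key` and `read_dataset_bytes` are hypothetical and not taken from the repository; only the bucket name and the key layout come from the description above.

```python
BUCKET = "dlp-upload-bucket"  # bucket name as given in the issue description


def build_s3_key(uid: str, trainspace_type: str, filename: str) -> str:
    """Build the object key using the {uid}/{trainspace_type}/{filename} layout."""
    return f"{uid}/{trainspace_type}/{filename}"


def read_dataset_bytes(uid, trainspace_type, filename, s3_client=None) -> bytes:
    """Download the raw file contents for one user-uploaded dataset.

    Hypothetical helper: `s3_client` can be injected (for example in tests);
    otherwise a default boto3 client is created.
    """
    if s3_client is None:
        import boto3  # imported lazily so the key helper works without boto3

        s3_client = boto3.client("s3")
    response = s3_client.get_object(
        Bucket=BUCKET,
        Key=build_s3_key(uid, trainspace_type, filename),
    )
    # get_object returns the payload as a streaming body under "Body"
    return response["Body"].read()
```

Accepting the client as a parameter keeps the helper testable without network access; the values passed for `uid` and `trainspace_type` depend on how the backend identifies the user and trainspace.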

You can upload files to the bucket through the SST prod endpoint https://em9iri9g4j.execute-api.us-west-2.amazonaws.com/ using the /datasets/user/{type}/{filename}/presigned_upload_url route.
EDIT: The above statement is not true; see below.

You will also need a bearer token, which can be obtained using the backend CLI. For more info, run cd training && poetry run python cli.py --help.

@dwu359 dwu359 added the enhancement New feature or request label Aug 20, 2023
@github-actions
Contributor

Hello @dwu359! Thank you for submitting the Feature Request Form. We appreciate your contribution. 👋

We will look into it and provide a response as soon as possible.

To work on this feature request, you can follow these branch setup instructions:

  1. Check out the main development branch (nextjs):
```
 git checkout nextjs
```
  2. Pull the latest changes from the remote main branch:
```
 git pull origin nextjs
```
  3. Create a new branch specific to this feature request using the issue number:
```
 git checkout -b feature-913
```

Feel free to make the necessary changes in this branch and submit a pull request when you're ready.

Best regards,
Deep Learning Playground (DLP) Team

@dwu359 dwu359 added the backend backend tasks label Aug 21, 2023
@karkir0003
Member

@NMBridges you're doing this task

@dwu359
Contributor Author

dwu359 commented Sep 16, 2023

@NMBridges My bad, this task should deal with reading the dataset files from S3 into training, not writing files to S3.

@karkir0003
Member

@NMBridges Also, assume the scope of this use case is tabular only (reading a CSV from S3 and then building the train/test datasets). See the example dataset creator class in the linked file.
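
A rough sketch of the CSV-to-train/test step described above, using only the Python standard library so it stays self-contained. The function name `split_csv_bytes` and its parameters are hypothetical; the real dataset creator in training/dataset.py may well use pandas, scikit-learn, or torch datasets instead, and may split differently.

```python
import csv
import io
import random


def split_csv_bytes(data: bytes, test_size: float = 0.2, seed: int = 0):
    """Parse CSV bytes and return (header, train_rows, test_rows).

    Assumes UTF-8 text with a header row; rows are shuffled with a fixed
    seed so the split is reproducible.
    """
    reader = csv.reader(io.StringIO(data.decode("utf-8"), newline=""))
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines

    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_size))

    # The first n_test shuffled rows form the test set, the rest the train set
    return header, rows[n_test:], rows[:n_test]
```

The input here would be the raw bytes of the object read from S3; converting the row lists into tensors or a framework-specific dataset class is left to the actual dataset creator.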

@karkir0003 karkir0003 moved this from Todo to Review in DLP Project Board Sep 17, 2023
@noah-iversen noah-iversen moved this from Review to Todo in DLP Project Board Feb 10, 2024
Projects
Status: Todo
Development

No branches or pull requests

3 participants