Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add v1 enpoint to set Response status #2756

Conversation

gabrielmbmb
Copy link
Member

Description

I've added a new endpoint PUT /api/v1/responses/{response_id}/status that allows setting the status of a Response.

To do so, I've added a new column to the Response model/table called status, which right now has 3 possible values: pending (which is the default value), submitted, and discarded. I've also updated the alembic migrations files to include this new column, which is not nullable (it will have a default value always), and also an index will be created for it, as I suppose that we would like to filter by it and having an index will allow querying using a filter on this column faster.

Regarding the endpoint definition, I've decided to just add a new PUT endpoint to update the status of the Response instead of using the PUT /v1/response/{response_id} endpoint, because I think:

  1. It makes the API's intent more explicit and easier to understand.
  2. It has better separation of concerns as we can more easily optimize and scale this specific operation independently from the update response operation. Also, if in the future is needed to do something after status has been set, it will be easier this way than having an if condition in the update response endpoint that checks if the status has been updated, and if so then do something specific for when the status has changed.

As an alternative to this new endpoint, we could have:

  1. Created two endpoints (PUT /api/v1/responses/{response_id}/submit and PUT /api/v1/responses/{response_id}/discard), one for submitting and one for discarding, but I think this is not a good idea, because if a new possible status is introduced, then a new endpoint has to be created and maintained. The logic of both endpoints will be exactly the same, so we would have two endpoints almost identical. Also, this approach would require the client to send a PUT request to a different endpoint depending on the status, and this can make the API harder to use.
  2. Create one endpoint very similar to the one implemented but passing the status via URL path PUT /api/v1/responses/{response_id}/status/{status}. This approach has the inconvenience that the client would have to send a PUT request to a different endpoint each time the status has to be changed.

Closes #2733

Type of change

(Please delete options that are not relevant. Remember to title the PR according to the type of change)

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (change restructuring the codebase without changing functionality)
  • Improvement (change adding some improvement to an existing functionality)
  • Documentation update

How Has This Been Tested

  • I've added unit tests for all the possible scenarios when calling this endpoint.

Checklist

  • I have merged the original branch into my forked branch
  • I added relevant documentation
  • follows the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

@gabrielmbmb
Copy link
Member Author

gabrielmbmb commented Apr 23, 2023

A review @frascuchon and @jfcalvo would be much appreciated! I would like to also hear your opinion. Also, I've some questions:

  • Do we need to save the id of the user that set the status for a Response? Right now only the user that created the response or an admin can do it, so I guess is not that important.
  • As only the user that created the response and the admin can set the status for a Response, I guess that the ResponsePolicyV1.update is enough, but in the future if users can set the status for a response that has not been created by them, then I guess that a new policy would be needed.

@jfcalvo jfcalvo self-requested a review April 24, 2023 07:34
@jfcalvo
Copy link
Member

jfcalvo commented Apr 24, 2023

Hi @gabrielmbmb, thanks for the PR. I will take a look to it.

Regarding your questions:

Do we need to save the id of the user that set the status for a Response? Right now only the user that created the response or an admin can do it, so I guess is not that important.

Although is not within the immediate scope of the issue, if you have the time and desire to work on it, you're welcome to do so.

As only the user that created the response and the admin can set the status for a Response, I guess that the ResponsePolicyV1.update is enough, but in the future if users can set the status for a response that has not been created by them, then I guess that a new policy would be needed.

I think it's correct to use ResponsePolicyV1.update for now.

Copy link
Member

@jfcalvo jfcalvo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added some questions that you can answer at your convenience and also included some suggestions to help improve the solution. Please feel free to respond to the questions whenever you have time.

@@ -36,6 +36,7 @@ def upgrade() -> None:
sa.Column("values", sa.JSON, nullable=False),
sa.Column("record_id", sa.Uuid, sa.ForeignKey("records.id", ondelete="CASCADE"), nullable=False, index=True),
sa.Column("user_id", sa.Uuid, sa.ForeignKey("users.id", ondelete="SET NULL"), index=True),
sa.Column("status", sa.String, nullable=False, index=True),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain your rationale for modifying this database migration instead of opting to create a new one?

Copy link
Member Author

@gabrielmbmb gabrielmbmb Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in the current latest version 1.6.0, only three tables have been created and managed by alembic workspaces, users, and workspaces_users.

In the branch feat/2615-instructions-task, migrations files have been created for some other tables. I've opted to modify the migration file instead of generating a new one because I've assumed that as these changes have not been used yet in any production environment, it is better to modify the migration file directly to add this new column, instead of creating a new migration file for just adding one column. One more file, but a little bit more noise.

Of course, if the migrations files in feat/2615-instruction-task were already applied in any production DB, then the logical thing to do would have been to generate a new migration script to add this new column, so alembic would have detected that there is a revision that has not been applied and that it needs to apply to the DB.

Having that said, for me it was not super clear how are you planning to handle the migrations with alembic. Maybe it's a good idea to have one single alembic revision/script to update the DB per the released version of the package.

Also, I've realized that alembic revision --autogenerate command cannot be used to generate migrations automatically because

target_metadata = None
the target_metadata has not been set to
class Base(DeclarativeBase):
Base.metadata.

Another doubt that I have, is that I've seen that the attributes nullable, index, etc are being set in the migrations files, but not in the mapped_column function of sqlalchemy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I've realized that alembic revision --autogenerate command cannot be used to generate migrations automatically

In your opinion, what are the benefits of utilizing automatic migrations?

Another doubt that I have, is that I've seen that the attributes nullable, index, etc are being set in the migrations files, but not in the mapped_column function of sqlalchemy.

In the same vein as the previous question, what are your views on the advantages of including these attributes in models?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that I can answer both questions at the same time 😁

So if we include these attributes in the models, then alembic will be available to automatically pick up them and it will generate a statement in the migration script with all the columns constraints, index, etc.

I think that it's really beneficial to use alembic automatic migrations, because alembic is able to automatically detect the updates you have done in your model class, or in other words, it is able to generate a diff between the schema generated with all the previous revisions and the one you're about to generate, and it will only create statements for the things that need to be updated.

So in my opinion, it is better to let alembic generate the migration file automatically with the SQLAlchemy model metadata than to let a human generate it manually and increase the chances of forgetting to include something in the migration script.

src/argilla/server/models.py Outdated Show resolved Hide resolved
tests/server/api/v1/test_responses.py Show resolved Hide resolved
tests/server/api/v1/test_responses.py Outdated Show resolved Hide resolved

class Response(BaseModel):
id: UUID
values: Dict[str, Any]
record_id: UUID
user_id: UUID
status: ResponseStatus
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In src/argilla/server/schemas/v1/datasets.py, there exists a distinct Response schema. Would you suggest including the status field in this schema as well?

Copy link
Member Author

@gabrielmbmb gabrielmbmb Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I didn't realize there was this Response schema. Yes, we should include the status field here too, otherwise when calling the listing records /api/v1/datasets/{dataset_id}/records the status for each response won't be there, and the frontend would have to send an additional request to get the status for getting the responses status, which is not ideal.

I think it makes sense to have a ResponseBase class in src/argilla/server/schemas/v1/responses.py, this way we avoid code duplication.

  • We can create a Responseclass inheriting from ResponseBase class with additional attributes that have to be exposed when listing responses (like the record_id) in src/argilla/server/schemas/v1/responses.py.
  • We can create a Response class inheriting from ResponseBase with no additional attributes that can be used when listing records in src/argilla/server/schemas/v1/datasets.py.

@gabrielmbmb gabrielmbmb marked this pull request as ready for review April 25, 2023 08:34
@gabrielmbmb
Copy link
Member Author

hey @jfcalvo, I've changed the status of the PR to ready for review. Is there anything else missing that I should add?

@alvarobartt alvarobartt linked an issue Apr 25, 2023 that may be closed by this pull request
@jfcalvo
Copy link
Member

jfcalvo commented Apr 25, 2023

hey @jfcalvo, I've changed the status of the PR to ready for review. Is there anything else missing that I should add?

Thank you, @gabrielmbmb, for your answers. The PR looks fine, and @frascuchon should be reaching out to you soon for the next steps.

@frascuchon frascuchon deleted the branch argilla-io:feat/2615-instructions-task May 24, 2023 22:36
@frascuchon frascuchon closed this May 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow discarding/submitting responses for a record
3 participants