Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Release v1.32.0 - #minor #517

Merged
merged 55 commits into from
Feb 12, 2025
Merged

New Release v1.32.0 - #minor #517

merged 55 commits into from
Feb 12, 2025

Conversation

boecklic
Copy link
Contributor

This release contains the latest and last functional improvements for stac v1 along with support for the new authentication mechanism.

msom and others added 30 commits January 28, 2025 09:37
Do not login when testing search endpoint
Since we can't easily determine the size of external assets, we write -1
in the `file_size` field of the Asset model. This is to distinguish it
from the assets that are due for uploading
Add filters that help find assets that are either in progress of
uploading or that are missing on s3
Change the implementation of the file size update management command so
that it uses the s3 backend to determine the file size of an object.

This is needed because the file path can actually differ from the asset
name, as is the case with for instance
https://data.geo.admin.ch/api/stac/v0.9/collections/ch.swisstopo.bahnen-winter/items/bahnen-winter/assets/additional-files_bahnen-winter_2056.zip
For the payload validation with a nested validator, we need to use the dedicated get_initial() because initial_data is not set.
This should trigger an upsert but because model Item has `unique_together = (('collection', 'name'),)` we get the following error

django.db.utils.IntegrityError: duplicate key value violates unique constraint "stac_api_item_collection_id_name_78fbc154_uniq"
DETAIL:  Key (collection_id, name)=(1, item-1) already exists.

Unfortunately, bulk_create only lets you define a flat list of unique_fields, so more complex unique constraints cannot be taken into account. Probably this needs a workaround.
The `update_conflicts` parameter of `bulk_create()` does not work with compound unique constraints (`unique_together` in model's `Meta`). So we do it manually ourselves:

1. Check for existing element
2. Update their fields with the values from the payload
3. Call `bulk_update()` on those
4. For the new elements: Call `bulk_create()`

`bulk_update()` requires proper model instances, so I had to somehow convert the `QuerySet` for the existing item into an actual Item. I could not find something less ugly than calling `in_bulk()` and extracting the instance.
After a discussion we decided to keep things simple and tailor the endpoint to the most common use case: Insertion. Should updating at the same time, so upsert, be an actual use case, we can still adapt to it.
There is no guarantee Item.objects.bulk_create() returns the created objects in the same order, so using a dictionary is safer.
This probably needs further changes because we don't want `assets` to be writable in every case.
With the addition of the bulk upload in /collections/{collection_id}/items there is now a POST endpoint.
There is a `Serializer.get_initial()` that would do something similar but not quite. If we use `get_initial()` , the test for the new POST endpoint works but other existing tests fail, like e.g. `ItemsUpdateEndpointTestCase.test_item_endpoint_patch_extra_payload`.

Not fully clear at this point why. Also, it's not clear if this renders the `validate_json_payload` useless for the new serializer (`ItemListSerializer`).

What is clear: `initial_data` is not set for AssetBaseSerializer when calling `ItemListSerializer.is_valid()`. So `validate_json_payload()` fails.

The serializers are connected like this:

   ItemLIstSerializer
   --> ItemSerializer
   --> AssetsForItemSerializer
   --> AssetBaseSerializer
With the addition of the bulk upload in /collections/{collection_id}/items there is now a POST endpoint.
Contrary to the initial plan to make the POST /collections/{collection_id}/items endpoint a place to both create and update ("upsert") items, it is now reduced to only create new items. So we can address the "abstract-method" warning by the linter by just expressing that we don't intend to use that functionality.
There is no guarantee to have always ID "1" for the collection, so we get the ID dynamically instead of hardcoding it.
Originally, 200 was returned for an empty list because technically no Item is created for an empty list.

To make the understanding and handling of the endpoint easier, however, the endpoint now returns 201 just as if an Item had been created.
Originally this was under /collections/{collectionId] but we realized that it actually makes a lot more sense that the POST mirrors the GET of  /collections/{collectionId}/items, So POST is moved there as well.

Also, the docs for the responses are updated to match the implementation.
asteiner-swisstopo and others added 25 commits February 10, 2025 14:58
Just check that it exists and that it is not empty. We actually do something with it only later. To avoid changing the interface later, we already add the header parameter now.
Just check that it exists and that it is not empty. We actually do something with it only later. To avoid changing the interface later, we already add the header parameter now.
Otherwise for bulkCreateItems the assets do not show up in the example payload.

The endpoints making use of itemAssets:

/collections
    GET getCollections
/collections/{collectionId}
    GET describeCollection
    PUT updateCollection
    PATCH partialUpdateCollection
/collections/{collectionId}/items/{featureId}
    GET getFeature
    PUT putFeature
    PATCH patchFeature
    PUT putFeature
/collections/{collectionId}/items
    GET getFeatures
    POST bulkCreateItems
/search
    GET getSearchSTAC
    POST postSearchSTAC

However, this is not visible in the specs because all writing endpoints have hardcoded examples instead of relying on their components. Also, we agreed that being able to write assets also elsewhere is not necessarily a problem.
As there is no update, we can simplify this to just do the same as for the assets.
This has been introduced in PR #511. There is still a failing test:

```
test_items_endpoint_post_returns_400_if_item_exists_already
```

The endpoint now returns a HTML instead of a JSON. The response.content says

> The request's session was deleted before the request completed. The user may have logged out in a concurrent request, for example.

To be investigated.
Removed accidentally on rebasing.
Custom file upload page that uses the multipart upload api to upload files of
up to 5GB directly to S3.
When assets are created via admin GUI there is no file path set. If the file is
then uploaded via API using the multipart upload, the file path will never be
set on the asset. This aligns the admin GUI behaviour with that of creating an
asset via API PUT. It has the downside that newly created assets already have
a file path that is a link to a non-existent file.
Split javascript and css into own file. Locally load store jquery and cryptoJS.
CSS: remove background color from info and error box to improve a11y.
JS: use let over var. Replace jquery ajax with fetch API. Fix deprecated
FileReader method. Fix formatting.

Only show upload large file view for assets that are not external.
Check that variables are displayed in template response.
We want to tweak those but we don't need to hardcode the exact values for all
environments immediately. Another change will update the values in DEV, INT
then PROD. Afterwards we can consider setting these values by default in STAC
itself.

This also removes the specific values we set for the dev build.
```
app/config/settings_prod.py:355:0: C0301: Line too long (101/100) (line-too-long)
```
PB-1440: allow setting SESSION_* settings via environment variables.
@github-actions github-actions bot changed the title new release New Release v1.32.0 - #minor Feb 12, 2025
Copy link
Contributor

@asteiner-swisstopo asteiner-swisstopo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@boecklic boecklic merged commit 667194f into master Feb 12, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants