Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds Archipelago as a new space option in the storage service. #672

Merged
merged 6 commits into from
Jan 16, 2024

Conversation

KidsSeeGhosts
Copy link
Contributor

@KidsSeeGhosts KidsSeeGhosts commented Oct 31, 2023

Archipelago is an Open Source Digital Objects Repository based on Drupal. This PR adds the ability to store AIPs from Archivematica in Archipelago using JSON API requests. Archipelago is added as a space option for AIP Storage and DIP Storage in the archivematica storage service.

The archipelago url, username and password are entered when making the space. When Archipelago is selected as the storage destination for an AIP during transfer the following happens:

The DC (Dublin Core) fields and title are extracted from the mets metadata file.
The zip file is sent to archipelago first, so that if it fails, we don't have an incomplete/empty object on archipelago.
If zip file is uploaded successfully, the AIP object is created on archipelago using the title and DC fields to populate the object's metadata in archipelago. All metadata on archipelago is stored in a special strawberry field.

Screenshots:

Create Space in Storage Service, input archipelago url, username and password.
Pasted Graphic 1

Create Location for AIP Storage in the Space.
Pasted Graphic 2

Select new Location as Store AIP location in Dashboard:
Pasted Graphic 3

AIP proccessed in Archipelago with metadata:
image

Example of AIP uploaded to Arcipelago using the storage service location option:

image

@KidsSeeGhosts KidsSeeGhosts marked this pull request as ready for review October 31, 2023 12:15
@Dhwaniartefact
Copy link
Contributor

Dhwaniartefact commented Nov 2, 2023

@KidsSeeGhosts : Thanks for the contribution!

We will surely check it locally. Meanwhile can you please sign Contributor Agreement form and send it to agreement@artefactual.com ? More information on why we require this is in our in progress contribution guidelines on the wiki.

Copy link

@rentonsa rentonsa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proof of concept- I was looking to achieve what this PR sets out to: to upload an AIP into Archipelago. I was not attempting to do anything beyond that- if I saw it go through it was enough to confirm Ruairí's PR.

I used a local Archivematica installation on (VirtualBox) Ubuntu 22.0.4 (I installed as root- I had a number of issues getting through the make build process as a local user; I also fell foul of the ElasticSearch container error which comes if the correct vms variable is not set). I also used a local Archipelago installation on the same VM- this I did not set up using root!

Once logged in through test/test, I set up the Archipelago space for AIP storage. Initially, as I was running into an Archipelago installation on the same machine, I set it up at localhost:8001, which due to the docker architecture was incorrect- the correct IP was the machine name (10.0.2.15:8001).

This is the first time I've used this version of Archivematica, so hadn't been questioned during a transfer before. I assume these questions are override-able, but I did nothing to embellish: for example I skipped Transcription and Normalisation, and I couldn't see how to add any metadata to the test I ran (using objects from within the standard testing content).

image

The actual upload went in fine to Archipelago, once the IP was correct.

image

Regarding documentation, I think we maybe just need to clarify in the PR that the screenshot with Store AIP/Store AIP Location is coming from the main dashboard/Administration- I was looking in Storage Service without any luck. We could also share the screenshot Ruairí shared with me offline to show how to set up the AIP content type in Archipelago, which is needed prior to the transfer.

Other than that I think it is all good, and am very pleased to see it works.

@Dhwaniartefact
Copy link
Contributor

@KidsSeeGhosts : Can you please make the target branch of this PR to our development branch qa/0.x? Thank you!

@KidsSeeGhosts KidsSeeGhosts changed the base branch from stable/0.21.x to qa/0.x November 24, 2023 17:09
@KidsSeeGhosts
Copy link
Contributor Author

@KidsSeeGhosts : Can you please make the target branch of this PR to our development branch qa/0.x? Thank you!

Yes, done.

@Dhwaniartefact
Copy link
Contributor

@KidsSeeGhosts : Based on the results of the tests, it appears that the linting test failed. Would it be possible for you to fix the linting issue ? There is a pre-commit setup you can create. It should be done pre-commit install inside the git repo.

Also are you planning to create a test for this feature ?

Thanks.

@Dhwaniartefact
Copy link
Contributor

@KidsSeeGhosts : I appreciate your work fixing the lint test. I would appreciate it if you could create user documentation for the Archipelago configuration. Please share images of Archipelago integration if possible.

Thanks!

@KidsSeeGhosts
Copy link
Contributor Author

KidsSeeGhosts commented Dec 5, 2023

@KidsSeeGhosts : I appreciate your work fixing the lint test. I would appreciate it if you could create user documentation for the Archipelago configuration. Please share images of Archipelago integration if possible.

Thanks!

There's an image of the AIP in Archipelago in the PR description, I've just added another image of the archipelago integration as well. I've just added docstrings for documentation. Happy to write a test but I'm having difficulty running the existing django unit tests. Is there documentation on setting up and running the existing storage service tests with pytest?

Copy link

@uoelddt uoelddt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to approve! All good.

@Dhwaniartefact
Copy link
Contributor

@KidsSeeGhosts :

There's an image of the AIP in Archipelago in the PR description, I've just added another image of the archipelago integration as well. I've just added docstrings for documentation.

Thank you for adding docstrings. We expect documentation to mention configuring Archipelago as storage service space. Moreover, it would be great if you could provide some details on how to setup the AIP content type in Archipelago, which is needed before we begin transferring content.

Happy to write a test but I'm having difficulty running the existing django unit tests. Is there documentation on setting up and running the existing storage service tests with pytest?

The existing storage service tests can be run in two different ways.

  1. You can execute "make test-storage-service" into the hack directory from the top directory "Archivematica". Tests will be run using Makefile commands for storage services.

  2. Run ./integration/run.sh from the storage service repository. By doing so, all the tests in the storage service will be verified.

Please feel free to ask me if you have any doubts.
Thank you!

@Dhwaniartefact
Copy link
Contributor

@KidsSeeGhosts : Hi, I've seen your commits and run the workflow. It seems that you are not yet finished with the tests.

Please let me know once you finished.

Thanks!

@KidsSeeGhosts
Copy link
Contributor Author

Hi @Dhwaniartefact , I finished writing the unit tests archipelago_test.py the other day, thanks for the advice on running the existing tests. Below is my documentation for configuring the space option. Please let me know if any other changes required :)

Configuring Archipelago as a storage service space:
Archipelago is currently supported for AIP Storage locations. Archipelago is built on the content manaagement system Drupal, which accepts JSON API calls to create new entities. The storage service is able to send AIPs directly to Archipelago this way, creating new entities as an 'AIP' content type.

In Archipelago:
There are plans by Archipelago's developers to add AIP as a default object type for the platform. For now, we can create the AIP content type ourselves in a few steps.

  • In the Manage administrative menu, navigate to Structure > Content types (admin/structure/types). The Content types page appears showing all the available types of content.

  • Click Add content type. Name the content type AIP.

  • Add a single strawberry field only to the content type, and this will complete setup to allow transfers from Archivematica.

  • Ensure your user has write permissions via JSON API.

For more info on creating the content type in Archipelago, see https://www.drupal.org/docs/user_guide/en/structure-content-type.html.

In Storage Service:
AIPs can be sent to an Archipelago instance via JSON API calls. The Archipelago instance URL is required, as well as an Archipelago username and password with write permissions on the platform.

  • When creating the space, the staging path should be set to the default /var/archievmatica/storage_service.
  • When creating the AIP storage location in the space, set the relative path to /.

During the transfer to Archipelago, the AIP file is uploaded first. If file uploads successfully, a new AIP type entity is created in Archipelago. If the AIP file doesn't uploaded successfully, the new AIP entity is not created in Archipelago. The successful entity contains the uploaded file, and the corresponding Dublin Core metadata extracted from the mets.xml file associated with the AIP.

@sromkey
Copy link
Contributor

sromkey commented Dec 12, 2023

Hi @KidsSeeGhosts - any chance you could add the docs as a PR to the Storage Service docs? https://github.com/artefactual/archivematica-storage-service-docs Thanks!

@KidsSeeGhosts
Copy link
Contributor Author

https://github.com/artefactual/archivematica-storage-service-docs

Hi @sromkey, I've tired pushing a branch there over and over just now and keep getting 403. I think my git config is ok locally token wise, does that doc repo allow new branches to be made to contribute? Can't make a PR. Thanks.

@replaceafill
Copy link
Member

replaceafill commented Dec 12, 2023

I've tired pushing a branch there over and over just now and keep getting 403. I think my git config is ok locally token wise, does that doc repo allow new branches to be made to contribute? Can't make a PR.

@KidsSeeGhosts Could you try forking the archivematica-storage-service-docs repository, adding a branch with your changes there and creating the PR from it?

@Dhwaniartefact
Copy link
Contributor

@KidsSeeGhosts : Thanks for creating test. As a result of the holiday season in December, we will not be able to contact you until January'2024.

Thank you!

@KidsSeeGhosts
Copy link
Contributor Author

KidsSeeGhosts commented Dec 12, 2023

I've tired pushing a branch there over and over just now and keep getting 403. I think my git config is ok locally token wise, does that doc repo allow new branches to be made to contribute? Can't make a PR.

@KidsSeeGhosts Could you try forking the archivematica-storage-service-docs repository, adding a branch with your changes there and creating the PR from it?

Thanks, I made PR using fork here:
artefactual/archivematica-storage-service-docs#54

Made the PR for the documentation if it could be possibly be looked at before the holiday.

@KidsSeeGhosts
Copy link
Contributor Author

Hi @Dhwaniartefact, I have just returned from holiday break. Before the break I made the documentation PR linked above if you're back to have a look. Thanks!

@Dhwaniartefact
Copy link
Contributor

Hi @KidsSeeGhosts, Hope you had a great Christmas! Also, I just returned from holidays and noticed some commits on this PR. Currently, I am going through it and will keep you posted if any updates are made.

Thank you !

@Dhwaniartefact
Copy link
Contributor

Ensure your user has write permissions via JSON API.

@KidsSeeGhosts : Hi, Can you please share how to ensure user's write permissions via JSON API on Archipelago since I am not able to get it.

Thanks!

@KidsSeeGhosts
Copy link
Contributor Author

KidsSeeGhosts commented Jan 4, 2024

Ensure your user has write permissions via JSON API.

@KidsSeeGhosts : Hi, Can you please share how to ensure user's write permissions via JSON API on Archipelago since I am not able to get it.

Thanks!

@Dhwaniartefact I've added a bit more info to the doc to clarify how to do this on Archipelago:

Ensure your user has write permissions via JSON API. To do this, on Archipelago under Administration > Configuration > JSON:API, tick the box "Accept all JSON:API create, read, update, and delete operations." to allow JSON API writes to be made to archipelago. Specific permissions for the AIP content type can also be set under Administration > Struct > Content types > AIP > Manage permissions, where you can select if only authenticated users or anonymous users can create AIPs.

Commit here: artefactual/archivematica-storage-service-docs@f628431

@Dhwaniartefact
Copy link
Contributor

@KidsSeeGhosts : We have verified that we are able to store the AIPs in Archipelago from Archivematica.

However, we discovered that it cannot perform other features such as retrieving, re-ingesting, deleting and integrating the API of the storage service with Archivematica. Are you planning to mention those limitations in the documentation?

Thanks!

@KidsSeeGhosts
Copy link
Contributor Author

KidsSeeGhosts commented Jan 5, 2024

@Dhwaniartefact: I've added a clarifying line to the documentation as below:

Currently Archipelago integration is limited to AIP storage only. Retrieving, re-ingesting, deleting AIPs and integrating the API of the storage service with Archivematica is not yet implemented.
Documentation commit here

Putting the limitations in the doc is a good idea, I then would plan to implement the other functions in a future PR. Thanks :)

@KidsSeeGhosts
Copy link
Contributor Author

@Dhwaniartefact is PR ok to be merged?

@Dhwaniartefact
Copy link
Contributor

@KidsSeeGhosts : Here, we have noticed that you have included "DIP Storage" in the location purpose, yet it is somehow left out of both the tests and the documentation. Are you planning to work on "DIP Storage"?

Thanks!

@KidsSeeGhosts
Copy link
Contributor Author

@Dhwaniartefact As this specific PR is just to add Archipelago as an AIP storage option, I have removed reference to DIP in the PR. I'd make a separate PR for DIP Storage after this one is merged.

@Dhwaniartefact
Copy link
Contributor

@KidsSeeGhosts : Additionally, I noticed that this function is not covered in the test. Do you plan to create a test for that?

Thanks!

archipelago_user = models.CharField(
max_length=64,
verbose_name=_("Archipelago username"),
help_text=_("Archiipelago username for authentication"),
Copy link
Contributor

@Dhwaniartefact Dhwaniartefact Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
help_text=_("Archiipelago username for authentication"),
help_text=_("Archipelago username for authentication"),

Spelling change

(
"archipelago_user",
models.CharField(
help_text="Archiipelago username for authentication",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
help_text="Archiipelago username for authentication",
help_text="Archipelago username for authentication",

Spelling needs to be changed.

"mets": "http://www.loc.gov/METS/",
"dc": "http://purl.org/dc/elements/1.1/",
"dcterms": "http://purl.org/dc/terms/",
}
Copy link
Contributor

@Dhwaniartefact Dhwaniartefact Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Namespaces are defined here. We would suggest if you could have import it here.

@KidsSeeGhosts
Copy link
Contributor Author

KidsSeeGhosts commented Jan 15, 2024

@Dhwaniartefact I've pushed to import namespace from utils, corrected 2 typos. Regarding testing _get_mets_el, I didn't write a test for it as it was a reused function from the dspace model here. I'm not opposed to writing a test for it though if preferred.

Copy link
Contributor

@Dhwaniartefact Dhwaniartefact left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution!

@Dhwaniartefact Dhwaniartefact merged commit 6f67c13 into artefactual:qa/0.x Jan 16, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants