vjrj edited this page Oct 2, 2024 · 7 revisions

Some info about deleting data.

Removing records of a data resource from pipelines

There is a Jenkins job for this, and an Airflow DAG.

If you use HDFS, see this Jenkins job, which deletes the data resource (dr) from HDFS so that it is not indexed the next time. You should also delete the dr in your collectory.

ALA now uses EMR and Airflow, so the equivalent job is this.

Removing records of a resource from legacy Cassandra & Solr (biocache-store)

Suppose you want to remove the occurrences of a new data resource, dr0, that you created for later ingestion. Easy:

biocache delete-records -dr dr0
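
If several data resources need the same treatment, the command above can be wrapped in a small loop. This is only a sketch: `delete_drs` and the `DRY_RUN` variable are hypothetical helpers, not part of biocache, and the only biocache call used is the `delete-records -dr` form shown above.

```shell
#!/bin/sh
# Run "biocache delete-records -dr <uid>" for each data resource UID given.
# DRY_RUN is a hypothetical switch: with DRY_RUN=1 (the default here) the
# commands are only printed, so the list can be reviewed first.
delete_drs() {
  for dr in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      printf 'biocache delete-records -dr %s\n' "$dr"
    else
      biocache delete-records -dr "$dr"
    fi
  done
}
```

For example, `delete_drs dr0 dr1` prints the two commands, and `DRY_RUN=0 delete_drs dr0 dr1` runs them.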

Images bulk deletion (with image-service >= 1.0.0)

First, run a search that matches what you want to delete, and download the results as CSV.

Then you can delete images through the API with commands like:

curl -X DELETE https://images.your.l-a.site/ws/image/5ea4b6ed-7567-4a37-b0eb-5c46daf582e0 -H "apiKey: XXXXX"
{"success":true,"message":"Image scheduled for deletion."}

So, to delete all the images listed in the images.csv downloaded from a search, you can use something like:

cat images.csv | awk -F',' '{print $15}' | sed 's/"//g' | sed '1d' | xargs -I ImageId curl -X DELETE https://images.your.l-a.site/ws/image/ImageId -H "apiKey: XXXXX"
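
Because `awk -F','` splits on every comma, a comma inside a quoted field before column 15 would shift the columns and send the wrong value to the API. A more defensive variant of the one-liner above might look like the sketch below. It keeps only values shaped like the UUID in the example and prints the requests before sending anything. The function names, `IMAGES_URL`, `API_KEY` and `DRY_RUN` are hypothetical; column 15 is taken from the one-liner and may differ in your export.

```shell
#!/bin/sh
# Assumptions (not from the original page): IMAGES_URL and API_KEY are
# placeholders to replace; column 15 holds the image ID, as in the one-liner.
IMAGES_URL="${IMAGES_URL:-https://images.your.l-a.site}"
API_KEY="${API_KEY:-XXXXX}"

# Print the image IDs found in column $2 (default 15) of CSV file $1,
# skipping the header, stripping quotes, and keeping only UUID-shaped values.
extract_image_ids() {
  awk -F',' -v col="${2:-15}" 'NR > 1 { print $col }' "$1" \
    | sed 's/"//g' \
    | grep -E '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Delete every image ID read from stdin. With DRY_RUN=1 (the default here)
# only print the requests that would be sent.
delete_images() {
  while IFS= read -r id; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      printf 'DELETE %s/ws/image/%s\n' "$IMAGES_URL" "$id"
    else
      curl -sS -X DELETE "$IMAGES_URL/ws/image/$id" -H "apiKey: $API_KEY"
      printf '\n'
    fi
  done
}
```

Running `extract_image_ids images.csv | delete_images` previews the requests; setting `DRY_RUN=0` sends them once the list looks right.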

Afterwards, an additional step is needed to delete the images on disk: go to /admin/ > "Tools" > "Purge deleted images".

Images bulk deletion (previous to version 1.0.0)

The biocache-store delete-records task doesn't delete media in the image service. Furthermore, the ingest task only re-processes images that don't exist in the image service. So sometimes you need to delete the images of a resource.

To do this, authenticated via CAS and with the Admin role, you can run advanced searches (for instance, images with zero height from some data resource), select all the results, and then delete them.

Advanced search of images

So the general procedure is: first select the images, then go to the selected images near the trolley (top right). On this new page there is an "Admin Functions" button; then use "Deleted images".

Removing selected images

Solr documents deletion:

If you want to delete all the documents from a core (for instance from bie-offline), you can use something like what is shown in this screenshot: solr-documents-deletetion.png. Thanks to Jason Loomis for the tip and the screenshot.
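
One way to do this from the command line is a delete-by-query against Solr's update handler. The sketch below may differ from what the screenshot shows. It assumes Solr is reachable at http://localhost:8983/solr; `SOLR_URL`, `DRY_RUN` and `solr_delete_all` are hypothetical names.

```shell
#!/bin/sh
# Assumption: Solr listens on localhost:8983; override SOLR_URL otherwise.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Delete every document in the core named $1 with a "*:*" delete-by-query,
# committing immediately. With DRY_RUN=1 (the default here) only describe
# the request instead of sending it.
solr_delete_all() {
  core="$1"
  url="${SOLR_URL}/${core}/update?commit=true"
  body='<delete><query>*:*</query></delete>'
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf 'would POST %s to %s\n' "$body" "$url"
  else
    curl -sS "$url" -H "Content-Type: text/xml" --data-binary "$body"
  fi
}
```

For example, `solr_delete_all bie-offline` shows the request, and `DRY_RUN=0 solr_delete_all bie-offline` empties the core.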

How can I delete all data from a demo site?

Depending on how you have used your demo, you'll need to do at least the following:

  1. For your collectory service, stop Tomcat and MySQL, and clear the collectory MySQL database
  2. Stop Solr and clear the Solr index under /data
  3. Stop Cassandra and clear the Cassandra database (see this task in the cassandra ansible role)

Also check other databases and services (image service, species list service, spatial, ...).
