Get Laurenti AdultHemOrgans dataset ready for DCP2 #72
Comments
@rays22 Make sure the dataset does not contain living donors. If it does, this dataset cannot be pushed to DCP2. |
I have double checked, and the donors for this dataset are all deceased. |
Jul 1, 2020, 10:22:23 PM 609a2401-246d-470f-a89e-6dfc38f3c7a9 Valid
s3://org-hca-data-archive-upload-prod/609a2401-246d-470f-a89e-6dfc38f3c7a9/.
|
Submitted. |
Need to add ENA accessions. |
This morning I made a submission to project The submission is from upload Jul 1, 2020, 10:22:23 PM and it has status
I need to add the ENA accessions to the metadata. I have tried to upload an update submission:
, but it seems to have made only a partial update. I cannot see any updates in the Ingest UI concerning If necessary, I could restart the project/submission from scratch, but I cannot delete submission |
The BioSamples sample_accession - ENA sample (experiment_accession and secondary_sample_accession) associations look incorrect. As a result, the run accessions are not linked to the appropriate samples. |
The remaining tasks will be tracked in #60. |
I have uploaded the updated HCA spreadsheets with ENA accessions to
|
Needs update as defined in #271 |
An update on this project is possible. I believe these changes can be made in the UI, and then the project needs to be re-exported using the 'metadata only' flag. Project: AdultHemOrgans:5b5f05b7-2482-468d-b76d-8f68c04a7a47: https://data.humancellatlas.org/explore/projects/455b46e6-d8ea-4611-861e-de720a562ada?catalog=dcp3 project.publications.url: "https://www.biorxiv.org/content/10.1101/2020.01.26.919753v1" updates indicated as per - #260 |
In summary, I can find no evidence for a successful export of the updates and I am stuck. |
No, I am afraid that has not solved my problems. There are two issues. The first one is that the Ingest status did change to This morning I re-tried the export after making some minor project metadata edits in Ingest for the project (just to allow re-export by Ingest) and chose the option @MightyAx, could you kindly double-check whether there is any evidence of a metadata update in the export area that reflects the changes I made in the Ingest UI? Some extra information to help troubleshooting. This project has two previous exports:
The dates might be important for the history of how Ingest exports worked at those times. I am still blocked. |
The exporter logs contain failures for envelopeUuid: 90513d30-5f47-473e-b180-c37a974fc03b which is the first of the two submissions listed chronologically. Example:
Complete log for that instance of ingest-exporter:
|
I'll be working through this example with @aaclan-ebi later. Project UUID: 455b46e6-d8ea-4611-861e-de720a562ada Submission UUIDs: 90513d30-5f47-473e-b180-c37a974fc03b, cb156730-90b0-4b77-944c-bfc263204c61 Export Logs submissionUuid 90513d30-5f47-473e-b180-c37a974fc03b Export Logs submissionUuid cb156730-90b0-4b77-944c-bfc263204c61 |
File might actually be here: |
File is actually there but the metadata for Details here: ebi-ait/dcp-ingest-central#175 (comment) Running this detects the issue:
compared to a recently updated file you can see that
Running the following will fix the issue for this file But there are at least 20 other files for this project, and an unknown number for other projects, that also need this fix to ensure exports do not fail in the future. I'm going to open a ticket to write a script to detect and remediate any metadata files that have not been marked as export_complete and where the update time is older than March 1st. |
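For illustration only, the detection half of such a script could look roughly like the sketch below. It is not the script from the ticket: the record layout (`name`, `updated`, `metadata`), the function names, and the year used for the cutoff are all assumptions made for this example, and the flag name `export_complete` is taken from the comment above rather than checked against the bucket. How the listing is obtained from the staging bucket, and how the flag is set, is deliberately left out.

```python
from datetime import datetime, timezone


def needs_remediation(record, cutoff, flag_key="export_complete"):
    """True if the record lacks the export flag and was last updated before cutoff."""
    # 'metadata' may be missing or None on objects that never had custom metadata.
    metadata = record.get("metadata") or {}
    flagged = str(metadata.get(flag_key, "")).lower() == "true"
    return not flagged and record["updated"] < cutoff


def select_for_remediation(records, cutoff, flag_key="export_complete"):
    """Return the names of all records that would need the metadata fix."""
    return [r["name"] for r in records if needs_remediation(r, cutoff, flag_key)]


if __name__ == "__main__":
    # Placeholder cutoff: the thread only says "March 1st", the year is assumed.
    cutoff = datetime(2021, 3, 1, tzinfo=timezone.utc)
    # Hypothetical listing; real records would come from the staging bucket.
    listing = [
        {"name": "old_unflagged.json",
         "updated": datetime(2020, 7, 15, tzinfo=timezone.utc),
         "metadata": {}},
        {"name": "old_flagged.json",
         "updated": datetime(2020, 7, 15, tzinfo=timezone.utc),
         "metadata": {"export_complete": "true"}},
        {"name": "recent_unflagged.json",
         "updated": datetime(2021, 6, 1, tzinfo=timezone.utc),
         "metadata": None},
    ]
    print(select_for_remediation(listing, cutoff))
```

Keeping the selection logic separate from any bucket access means the list of affected files can be reviewed (as was done for the 192 files later in this thread) before anything is modified.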
I am sorry but that does not look like the file I am looking for. I think you might have uncovered some other exporting issues/errors. I had seen 3 project metadata files before I started the export of the latest (two) update attempts:
I can see only the same 3 files after my update and export attempts. I suspect that the metadata file
has been edited/altered by hand after the 2020-07-15 export from Ingest for some reason, because it has content that was neither in Ingest nor in the spreadsheet that I used originally. Could that manual alteration/hack be the culprit that prevents correct exporting now by causing cloud file metadata inconsistencies? |
I believe the exporter failing means that the exporter is never exporting the "updated" versions of the metadata. I have a fix for the old metadata now; let's reassess the above once that has been run and the exporter jobs complete successfully. |
@rays22 Can I update the metadata for the following 192 files so that the exporter knows they have been transferred?
|
I have no reason to object to updating the file metadata if you think that it is necessary. I admit that it is beyond my understanding of the exporter to tell whether there are any reasons/arguments against doing the updates. |
The above files have been updated: example file stat after update:
|
@MightyAx |
The metadata I am referring to is the file metadata stored against the files (whether data files or metadata files) on the Terra staging bucket. @rays22 I believe the updates have now all been processed, but I would appreciate it if you could confirm. |
I can confirm that the project description metadata have been successfully exported to the Terra bucket. I can also see exported |
Hi Ray. I'm really sorry about this, but did you also include the change to the project.publications.url to "https://www.biorxiv.org/content/10.1101/2020.01.26.919753v1" when you made the updates? I had a look at the gsutil bucket and didn't see it there. |
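A check like this can also be done programmatically once the exported project metadata file has been downloaded. The sketch below is only an illustration: it assumes the project JSON carries a top-level `publications` list whose entries have a `url` field (inferred from the `project.publications.url` path quoted above), and the function name is made up for this example.

```python
import json


def has_publication_url(project_json_text, expected_url):
    """True if any publication entry in the project JSON has the expected URL."""
    project = json.loads(project_json_text)
    # Assumed layout: {"publications": [{"url": "..."}, ...]}
    publications = project.get("publications") or []
    return any(pub.get("url") == expected_url for pub in publications)


if __name__ == "__main__":
    expected = "https://www.biorxiv.org/content/10.1101/2020.01.26.919753v1"
    # Hypothetical, minimal stand-in for an exported project metadata file.
    sample = json.dumps({"publications": [{"url": expected}]})
    print(has_publication_url(sample, expected))
```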
Thanks for spotting my oversight @Wkt8 . I have added the missing |
Dataset/group this task is for:
AdultHemOrgans: Transcriptomic characterisation of haematopoietic stem and progenitor cells from human adult bone marrow, spleen and peripheral blood
This issue is an updated version of https://github.com/HumanCellAtlas/hca-data-wrangling/issues/422
which contains the remaining tasks of https://github.com/HumanCellAtlas/hca-data-wrangling/issues/394 .
Wrangler responsible for this dataset/lab:
Ray
Description of the task:
10x_has_more_than_2_files.adoc; ensure_lane_index.adoc
Production
Acceptance criteria for the task: