
exporter problem #175

Closed
amnonkhen opened this issue Feb 12, 2021 · 4 comments
Labels
bug Something isn't working

Comments

@amnonkhen
Contributor

amnonkhen commented Feb 12, 2021

Originally reported by @rays22 on Slack.

See the summary by @clairerye:
Can I summarise what we know so far and the next steps?

  • We had an issue where an old version of the exporter was deployed in prod
  • When we started exporting, we were looking to confirm the files were in one location when they were actually in another. But we might also have some other issue with exporting, as Ray's project now has the 'archived' status
  • We have deployed the correct version of the exporter on production
  • Marion is going to 'test' export a small dataset in production to check that we have fixed the problem
  • We need to 'reset' Ray's and Will's projects (2 or more?) to valid so they can be re-exported
  • We will then export the remaining projects that are ready for the Feb data release
  • We may need to clean up files incorrectly exported to other folders in the staging area
  • We need to have a retro to dive into why this happened
@amnonkhen added the bug (Something isn't working) label Feb 12, 2021
@amnonkhen added this to the Sprint 10/2/2021 milestone Feb 12, 2021
@amnonkhen
Contributor Author

amnonkhen commented Feb 12, 2021

See related PR by @jacobwindsor: ingest-core#59

@amnonkhen
Contributor Author

amnonkhen commented Feb 15, 2021

On 15/2 @aaclan-ebi @yusra-haider @jacobwindsor @amnonkhen joined a shared debugging/investigation session.

It seems the problem has two parts:

  1. Export takes a long time because it is waiting on something (we don't know what), and wait times grow exponentially. We were not able to recreate it: when we ran the export for the same project as Ray had, it completed quickly.
  2. Project status does not update to "Exporting". We were able to trace this to a deployment from 29/1 by @aaclan-ebi.

@aaclan-ebi

aaclan-ebi commented Feb 15, 2021

Some more notes:

  1. There was an issue with Ray's export (Thu 2021-02-11). Why was not everything exported successfully according to the old specification?

    a. We lost the logs when we redeployed the new exporter version

    b. We can find the logs if we SSH into the cluster; we can check there if this happens again

  2. We tried re-exporting Ray's submission using the new version of the exporter and encountered an issue when checking the custom metadata of a file object (custom metadata in Google Cloud, not HCA metadata) for project metadata that had already been submitted before

    a. This issue only happens for additions. Ray's submission adds data to an existing project that was already submitted to Terra.

    b. The issue is caused by the migration of the DCP2 MVP exported metadata and data into their own project subdirectories, during which the custom metadata of the file objects in the bucket was lost.

    c. We fixed this by manually setting the export_completed custom metadata on the project metadata object:

    # Inspect the object's current metadata
    gsutil stat gs://broad-dsp-monster-hca-prod-ebi-storage/prod/455b46e6-d8ea-4611-861e-de720a562ada/metadata/project/455b46e6-d8ea-4611-861e-de720a562ada_2020-07-15T15:56:36.681000Z.json

    # Restore the custom export_completed flag lost in the migration
    gsutil setmeta -h "x-goog-meta-export_completed:True" gs://broad-dsp-monster-hca-prod-ebi-storage/prod/455b46e6-d8ea-4611-861e-de720a562ada/metadata/project/455b46e6-d8ea-4611-861e-de720a562ada_2020-07-15T15:56:36.681000Z.json

    d. Ray's submission was then exported successfully. All data files were added to the /data directory inside the project's directory in the Terra bucket. (Note that the issue with setting the state remains: the submission is still stuck in Archived.)

    e. We need to bulk-apply this fix to all the migrated DCP2 MVP metadata/data. This can wait until after the Feb release, since as far as we know there are no other submissions that are additions.
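The bulk fix in item 2e could be scripted along these lines. This is a minimal sketch, assuming (not confirmed in this thread) that the flag belongs on every object under each migrated project's metadata/ and data/ directories, and that the UUIDs of the migrated DCP2 MVP projects are supplied by hand, one per line on stdin. The helpers mark_exported and mark_projects are hypothetical; by default they only print the gsutil setmeta commands so they can be reviewed before anything in the bucket changes.

```shell
#!/usr/bin/env bash
# Sketch only: re-apply the export_completed custom metadata lost in the
# DCP2 MVP migration. ASSUMPTIONS: the flag belongs on every object under
# <project>/metadata/ and <project>/data/, and the migrated project UUIDs
# are supplied by hand. mark_exported/mark_projects are hypothetical helpers.
set -euo pipefail

# Staging root taken from the gsutil commands in this issue.
DEFAULT_ROOT="gs://broad-dsp-monster-hca-prod-ebi-storage/prod"

# Print (default) or run one gsutil setmeta call for a project subdirectory.
#   $1 = bucket root, $2 = project UUID, $3 = subdirectory, $4 = 1 to execute
mark_exported() {
  local root="$1" project="$2" subdir="$3" execute="${4:-0}"
  local -a cmd
  # "**" is a gsutil wildcard matching objects at any depth below subdir/;
  # quoting it keeps the local shell from expanding it.
  cmd=(gsutil -m setmeta -h "x-goog-meta-export_completed:True"
       "${root}/${project}/${subdir}/**")
  if [[ "$execute" == "1" ]]; then
    "${cmd[@]}"
  else
    echo "${cmd[*]}"
  fi
}

# Read project UUIDs from stdin (one per line, blank lines skipped) and
# mark both metadata/ and data/ for each.
#   $1 = bucket root, $2 = 1 to execute (default: dry run)
mark_projects() {
  local root="$1" execute="${2:-0}" project subdir
  while IFS= read -r project; do
    [[ -z "$project" ]] && continue
    for subdir in metadata data; do
      mark_exported "$root" "$project" "$subdir" "$execute"
    done
  done
}
```

For example, piping one project UUID into mark_projects "$DEFAULT_ROOT" prints the two setmeta commands for that project, and passing 1 as a second argument runs them. The ** wildcard is expanded by gsutil itself and -m parallelises the per-object updates, so each directory needs a single setmeta call rather than one call per object.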

@clairerye

Investigation complete.
Work continues in #194 and #191.

Development

No branches or pull requests

5 participants