Updated BDC dbGaP IDs to the latest from BDC Gen3 #343

Merged: 8 commits into develop from update-bdc-studies, Apr 22, 2024

Conversation

gaurav
Collaborator

@gaurav gaurav commented Mar 14, 2024

This PR updates the data/bdc_dbgap_ids.csv file with the latest dbGaP identifiers from the BDC Gen3 instance. It also fixes some issues with bin/get_dbgap_data_dicts.py when downloading from FTP:

  1. Previously, the script fetched a directory listing over FTP, downloaded the listed files from the corresponding HTTP server, and then requested the next directory listing over the same FTP connection. The FTP server would time out and disconnect while the HTTP downloads were running. The script now explicitly closes the FTP connection after each listing and reopens it before requesting the next one.
  2. If a download fails, we now delete the local directory for that variable, since it will be either empty or incomplete. Re-running the script then downloads any variables that have not already been downloaded.
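
The two fixes above can be sketched roughly as follows. This is a minimal illustration, not the actual code in bin/get_dbgap_data_dicts.py: the function names `list_ftp_dir` and `download_dir`, the `fetch` callback, and the 30-second timeout are all hypothetical, and the sketch assumes that a failed download's directory is removed so a later run starts it from scratch.

```python
import shutil
from ftplib import FTP
from pathlib import Path
from typing import Callable, Iterable, List


def list_ftp_dir(host: str, directory: str) -> List[str]:
    """Open a fresh FTP connection, list one directory, then close it.

    Hypothetical helper: a short-lived connection per listing means the
    server cannot time out while slow HTTP downloads run in between.
    """
    # The context manager sends QUIT and closes the socket on exit.
    with FTP(host, timeout=30) as ftp:
        ftp.login()  # anonymous login
        return ftp.nlst(directory)


def download_dir(
    filenames: Iterable[str],
    local_dir: Path,
    fetch: Callable[[str, Path], None],
) -> None:
    """Download every file into local_dir, removing local_dir on failure.

    `fetch` is a hypothetical callback that downloads one file (for
    example over HTTP) and writes it to the given destination path.
    """
    local_dir.mkdir(parents=True, exist_ok=True)
    try:
        for name in filenames:
            fetch(name, local_dir / Path(name).name)
    except Exception:
        # The directory is now empty or incomplete, so remove it and
        # let a later run download it again from the beginning.
        shutil.rmtree(local_dir, ignore_errors=True)
        raise
```

Under these assumptions, a caller would skip any directory that already exists locally, call `list_ftp_dir` for the next one, and pass the result to `download_dir`. Because a directory only survives when every file in it was fetched, its presence can serve as a simple "already downloaded" marker.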

@gaurav gaurav marked this pull request as ready for review March 14, 2024 21:51
@gaurav gaurav requested a review from YaphetKG March 14, 2024 21:51
@YaphetKG YaphetKG merged commit 01f7df8 into develop Apr 22, 2024
7 of 8 checks passed
@YaphetKG YaphetKG deleted the update-bdc-studies branch April 22, 2024 13:32
2 participants