fix ancient date format
ktmeaton committed Nov 13, 2020
1 parent 1cd4fab commit 66d3ce7
Showing 3 changed files with 57 additions and 1 deletion.
10 changes: 9 additions & 1 deletion docs/main.md
@@ -62,7 +62,9 @@ Curate metadata with a DB Browser (SQLite). Examples of modifying the BioSampleC
- Annotate with meaningful metadata.
- Add collection data, geographic location, host.

### Geocode
### Geospatial

#### Geocoding

Geographic location for samples is coded at the level of country and
province/state in the format "Country:Province". Optional sub-province level
@@ -90,6 +92,12 @@ python workflow/scripts/geocode.py "Armenia:Shirak Province"
> Shirak Province, Armenia
> 40.918594 43.8403536
#### Palladio

```bash
bash workflow/scripts/palladio.sh 2>&1 | tee results/metadata/sra/palladio_ancient.tsv
```

## Genomic Alignment

### Modern Assembly Remote
9 changes: 9 additions & 0 deletions workflow/scripts/geocode.py
@@ -4,11 +4,16 @@

args = sys.argv
place_name = args[1]
DELIM = "\t"

geolocator = Nominatim(user_agent="plague-phylogeography")
country_address = place_name.split(":")[0]
province_address = ":".join(place_name.split(":")[0:2])

country_name = country_address
province_name = place_name.split(":")[1:2][0]

"""
# Geocode at country level
location = geolocator.geocode(country_address, language="en",)
print(location.address)
@@ -19,3 +24,7 @@
location = geolocator.geocode(province_address, language="en",)
print(location.address)
print(location.latitude, location.longitude)
"""

location = geolocator.geocode(province_address, language="en",)
print(str(location.latitude) + "," + str(location.longitude))
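
A note on the final `geocode()` call: geopy's `Nominatim.geocode` returns `None` when nothing matches, so `location.latitude` raises `AttributeError` for an unknown place, and a place name without a `:` makes the `[1:2][0]` lookup raise `IndexError`. Below is a minimal defensive sketch, assuming the caller would rather receive an empty field than a traceback. The helper names `split_place` and `format_latlon` are hypothetical and not part of the repository; the geocoder is passed in as a callable so the logic can be tried without network access.

```python
from typing import Callable, Optional, Tuple


def split_place(place_name: str) -> Tuple[str, Optional[str]]:
    """Split "Country:Province[:...]" into (country, province or None)."""
    parts = place_name.split(":")
    country = parts[0]
    province = parts[1] if len(parts) > 1 and parts[1] else None
    return country, province


def format_latlon(place_name: str, geocode: Callable[[str], object]) -> str:
    """Return "lat,lon" for the province-level address, or "" if no match.

    `geocode` is any callable that takes an address string and returns an
    object with .latitude/.longitude, or None (as geopy's geocode() does).
    """
    country, province = split_place(place_name)
    # Same address the script builds: "Country:Province", or just "Country".
    address = country if province is None else country + ":" + province
    location = geocode(address)
    if location is None:
        return ""  # hypothetical choice: empty field instead of a crash
    return str(location.latitude) + "," + str(location.longitude)
```

With geopy this could be called as `format_latlon(place_name, lambda a: geolocator.geocode(a, language="en"))`, which keeps the output format of the script unchanged for places that resolve.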
39 changes: 39 additions & 0 deletions workflow/scripts/palladio.sh
@@ -0,0 +1,39 @@
#!/bin/bash

# Output Format
# 1. BioSampleAccession
# 2. Strain
# 3. Country
# 4. Province
# 5. BeginDate
# 6. EndDate
# 7. LatLon

QUERY="SELECT
BioSampleAccession,
BioSampleStrain,
BioSampleGeographicLocation,
BioSampleCollectionDate
FROM
BioSample
WHERE
BioSampleComment LIKE '%KEEP%SRA%Ancient'";

DELIM="\t"

echo -e "Accession"$DELIM"Strain"$DELIM"Country"$DELIM"Province"$DELIM"Begin"$DELIM"End"$DELIM"LatLon"

sqlite3 \
results/sqlite_db/yersinia_pestis_db.sqlite \
"$QUERY" | \
while IFS= read -r line;
do
  # sqlite3 separates columns with "|" by default; quote "$line" so that
  # whitespace inside strain or province names is preserved.
  acc=$(echo "$line" | cut -d "|" -f 1);
  strain=$(echo "$line" | cut -d "|" -f 2);
  country=$(echo "$line" | cut -d "|" -f 3 | cut -d ":" -f 1);
  province=$(echo "$line" | cut -d "|" -f 3 | cut -d ":" -f 2);
  begin=$(echo "$line" | cut -d "|" -f 4 | cut -d ":" -f 1);
  end=$(echo "$line" | cut -d "|" -f 4 | cut -d ":" -f 2);
  latlon=$(python workflow/scripts/geocode.py "$country:$province");
  echo -e "$acc$DELIM$strain$DELIM$country$DELIM$province$DELIM$begin$DELIM$end$DELIM$latlon";
done
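
The loop in `palladio.sh` relies on `sqlite3` printing columns separated by `|` and on `cut` to pull the pieces apart. The same splitting can be sketched in Python, which makes the assumed field layout explicit. `cut_field` and `split_row` are hypothetical helpers, the sample row is made up, and the `begin:end` layout of `BioSampleCollectionDate` is inferred from the `cut -d ":"` calls rather than from the database itself. Like `cut`, the sketch returns the whole field when no `:` is present.

```python
from typing import Dict


def cut_field(value: str, index: int, delim: str = ":") -> str:
    """Mimic `cut -d DELIM -f N` (1-based field index) for a single line."""
    if delim not in value:
        return value  # cut prints delimiter-free lines unchanged
    parts = value.split(delim)
    return parts[index - 1] if index <= len(parts) else ""


def split_row(line: str) -> Dict[str, str]:
    """Split one "|"-separated sqlite3 output row into the TSV columns."""
    # Pad so that a short row yields empty fields instead of an error.
    acc, strain, location, date = (line.split("|") + [""] * 4)[:4]
    return {
        "Accession": acc,
        "Strain": strain,
        "Country": cut_field(location, 1),
        "Province": cut_field(location, 2),
        "Begin": cut_field(date, 1),
        "End": cut_field(date, 2),
    }


# Made-up row for illustration only.
row = split_row("SAMN00000000|ExampleStrain|Armenia:Shirak Province|1300:1400")
print("\t".join(row.values()))
```

One consequence worth knowing: a location with no province yields the country in both columns, and a single-valued date yields the same value for `Begin` and `End`, because that is how `cut` treats a line without the delimiter.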
