[ geography request ] - add all GADM data #5654

Closed
dustymc opened this issue Feb 14, 2023 · 34 comments
Labels
Help wanted I have a question on how to use Arctos Priority - Wildfire Potential ignore this at everyone's peril, may smolder for now ...

Comments

@dustymc
Contributor

dustymc commented Feb 14, 2023

Explain what geography needs created.

All GADM not already in Arctos.

@mkoo will help find a way to address https://github.com/ArctosDB/internal/issues/222, let's just fill in the gaps.

(Please fix whatever I got wrong.)

@sharpphyl

If you need help looking up source data on Wikipedia, there are several countries we need, and we could prepare that data.

@dustymc
Contributor Author

dustymc commented Feb 21, 2023

@sharpphyl I'm not sure how much lookup help I'll need (some, I'm sure, but I've been working on my scripts, so hopefully it won't be entirely manual). I could definitely use help prioritizing.

@Jegelewicz
Member

I could definitely use help prioritizing.

I don't know that I can help here - this seems like something we should have planned to do at the time we made the switch to GADM. Any idea how much time this will take? This could help determine if we should just keep adding things as needed or take a day? week? of @dustymc's time to get it done.

Also, how often will this need re-doing? Do we know when GADM updates occur and how are we going to keep up with them?

@sharpphyl

Does this issue mean that ultimately we won't need to request the addition of subdivisions that are listed in GADM or should we go ahead and fill out the Geography Request form when needed?

We need provinces in Myanmar and Sri Lanka for catalog records we've already uploaded with incomplete higher geography - so there's no rush but ultimately we want to include the first level administrative unit once it's available.

Myanmar should be fun. Per Wikipedia "Myanmar is divided into twenty-one administrative subdivisions, which include 7 regions, 7 states, 1 union territory, 1 self-administered division, and 5 self-administered zones." Online, GADM lists 15 first-level divisions. Arctos has 11 - a mixture of states, districts, etc. We need Kayah State.

Sri Lanka is also an issue, as GADM (online) shows 25 districts as first-level subdivisions and Wikipedia says "Sri Lanka is divided into 9 provinces, which are further subdivided into 25 districts." Do we want to go with Wikipedia's 9 provinces or stick with GADM's 25 districts? We have 11 of the districts in Arctos. We need Sabaragamuwa Province, although we could use one of the two districts in this province if that's what's in Arctos.

I can work up a list of the missing administrative units and file it as an issue under Geography Request or wait until the rest of the administrative units are added per this issue. But I agree that this would be much better done by a Geography committee that can recommend how to deal with countries that don't have straight-forward administrative divisions as other collections may have different priorities and need a different approach from what our collection needs.

@dustymc
Contributor Author

dustymc commented Mar 14, 2023

ultimately we won't need to request the addition of subdivisions

Yes.

As usual I'm swamped (but I'm sorta in between waves and can breathe for the next 9 seconds or something!) and don't know how to prioritize (but I think #5331 is the biggest squeak at the moment), so I suppose country requests are still the way to go for now. I have got that decently refined (I hope!) so just "Myanmar" is plenty to get started.

don't have straight-forward administrative divisions

This is where our "do what GADM does" policy really shines: we don't have to make any decisions whatsoever; we've already made the decision. And whatever GADM does, most everybody who does anything spatial will use that anyway so we can still talk to them (if only to grumble about how weird GADM is!).

@mkoo
Member

mkoo commented Mar 14, 2023 via email

@dustymc
Contributor Author

dustymc commented Mar 15, 2023

processing

???

@mkoo
Member

mkoo commented Mar 15, 2023

Oh: processing = download the 5 GB geodb file; sort through adm_0, adm_1, adm_2; simplify and send.
BerkeleyMapper will use it for intersections and graphing, probably also subsetting results; maybe spatial selection?
Not sure if you want it too, but at least I can resend Myanmar and Sri Lanka.

@dustymc
Contributor Author

dustymc commented Mar 15, 2023

Ah, thanks. No, I've got scripts set up to pull straight from GADM.

@mkoo
Member

mkoo commented Mar 15, 2023

@dustymc can you reload Myanmar from the service? I don't know why Kayah State is missing for Phyllis, because it's there in the shapefile (as are the other missing states).

Same with reloading Sri Lanka-- if we stick with adm_1 then we should have 25 districts (all included in GADM v4.1)

This was referenced Mar 15, 2023
@dustymc
Contributor Author

dustymc commented Mar 15, 2023

from @mkoo in #6007

Can we stop doing this ondemand?

Yep, but priorities. (I think mine is currently ironing out what I can regarding identifiers then #5331 - but #5193 doesn't seem addressed but I can't do it by myself and IDK if folks have checked and like it (doesn't seem possible) or ????????? - pleasepleaseplease redirect me if I'm lost!!)

The biggest of those is #5383 (but that got weird and I think maybe now I'm also expected to clean up the nonsense that's been abandoned in the bulkloader and #5594 isn't getting a response and .... help?)

list of countries

I have scripts, you are VERY welcome to run them, but you need to arrange access to the VMs and a scary postgres password and some understanding of some very specialized 'me-tools.'

check we can run

I'm tempted to propose an issue per country (which could be immediately closed if there are no problems) but ???????? My scripts can't ENTIRELY automate that, but it would not be a difficult thing to produce either.

@Jegelewicz
Member

nonsense that's been abandoned in the bulkloader

This doesn't seem like it would be super difficult - just remove "county" or "parish" from any higher geography that includes United States?
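
A minimal sketch of that suggestion, assuming higher geography is stored as a comma-separated string such as "United States, Alabama, Autauga County". The string format and the helper name `strip_county_suffix` are assumptions for illustration, not the bulkloader's actual code.

```python
import re

# Matches a trailing " County" or " Parish" on a single geography component.
_SUFFIX = re.compile(r"\s+(County|Parish)$", re.IGNORECASE)

def strip_county_suffix(higher_geog: str) -> str:
    """Hypothetical helper: drop trailing County/Parish, US strings only."""
    parts = [p.strip() for p in higher_geog.split(",")]
    if "United States" not in parts:
        # Leave non-US higher geography untouched.
        return higher_geog
    return ", ".join(_SUFFIX.sub("", p) for p in parts)

print(strip_county_suffix("United States, Alabama, Autauga County"))
# United States, Alabama, Autauga
```

Only a trailing suffix is removed, so a name that merely contains the word "County" elsewhere would be left alone.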

@dustymc
Contributor Author

dustymc commented Mar 16, 2023

@mkoo - can we concat engtype_1 back in? That would take us back to having Bla County (and such).

redo myanmar and ping issues committee

@dustymc
Contributor Author

dustymc commented Mar 23, 2023

@mkoo bringing engtype_1 back in isn't going to work, or at the very least will be inconsistent or complicated.

I pulled a fresh copy of the US, and I get...

arctosprod@arctos>> select gid,gid_0,country from temp_gadm_0;
 gid | gid_0 |    country    
-----+-------+---------------
   1 | USA   | United States

Nice.

arctosprod@arctos>> select gid,gid_1,gid_0,country,name_1,varname_1,nl_name_1,type_1,engtype_1,cc_1,hasc_1,iso_1 from temp_gadm_1;
 gid |  gid_1   | gid_0 |    country    |        name_1        |            varname_1             | nl_name_1 |      type_1      |    engtype_1     | cc_1 | hasc_1 | iso_1 
-----+----------+-------+---------------+----------------------+----------------------------------+-----------+------------------+------------------+------+--------+-------
   1 | USA.1_1  | USA   | United States | Alabama              | AL|Ala.                          | NA        | State            | State            | NA   | US.AL  | US-AL
   2 | USA.2_1  | USA   | United States | Alaska               | AK|Alaska                        | NA        | State            | State            | NA   | US.AK  | US-AK
   3 | USA.3_1  | USA   | United States | Arizona              | AZ|Ariz.                         | NA        | State            | State            | NA   | US.AZ  | US-AZ
   4 | USA.4_1  | USA   | United States | Arkansas             | AR|Ark.                          | NA        | State            | State            | NA   | US.AR  | US-AR
   5 | USA.5_1  | USA   | United States | California           | CA|Calif.                        | NA        | State            | State            | NA   | US.CA  | US-CA
   6 | USA.6_1  | USA   | United States | Colorado             | CO|Colo.                         | NA        | State            | State            | NA   | US.CO  | US-CO
   7 | USA.7_1  | USA   | United States | Connecticut          | CT|Conn.                         | NA        | State            | State            | NA   | US.CT  | US-CT
   8 | USA.8_1  | USA   | United States | Delaware             | DE|Del.                          | NA        | State            | State            | NA   | US.DE  | US-DE
   9 | USA.9_1  | USA   | United States | District of Columbia | DC|D.C.                          | NA        | Federal District | Federal District | NA   | US.DC  | US-DC
  10 | USA.10_1 | USA   | United States | Florida              | FL|Fla.                          | NA        | State            | State            | NA   | US.FL  | US-FL

I think consistency would require "Connecticut State" - somehow I didn't see that coming! "District of Columbia Federal District" is nice though....

Here's what we were hoping for:


arctosprod@arctos>> select gid,gid_2,gid_0,country,gid_1,name_1,nl_name_1,
arctos-> name_2,
arctos-> varname_2,
arctos-> nl_name_2,
arctos-> type_2,
arctos-> engtype_2,
arctos-> cc_2,
arctos-> hasc_2 from temp_gadm_2;
 gid  |    gid_2     | gid_0 |    country    |  gid_1   |        name_1        | nl_name_1 |            name_2            |  varname_2   | nl_name_2 |      type_2      |    engtype_2     | cc_2 |  hasc_2  
------+--------------+-------+---------------+----------+----------------------+-----------+------------------------------+--------------+-----------+------------------+------------------+------+----------
    1 | USA.1.1_1    | USA   | United States | USA.1_1  | Alabama              | NA        | Autauga                      | NA           | NA        | County           | County           | NA   | US.AL.AU
    2 | USA.1.2_1    | USA   | United States | USA.1_1  | Alabama              | NA        | Baldwin                      | NA           | NA        | County           | County           | NA   | US.AL.BD
    3 | USA.1.3_1    | USA   | United States | USA.1_1  | Alabama              | NA        | Barbour                      | NA           | NA        | County           | County           | NA   | US.AL.BR
    4 | USA.1.4_1    | USA   | United States | USA.1_1  | Alabama              | NA        | Bibb                         | NA           | NA        | County           | County           | NA   | US.AL.BI
  701 | USA.15.6_1   | USA   | United States | USA.15_1 | Indiana              | NA        | Boone                        | NA           | NA        | County           | County           | NA   | US.IN.BO
 1194 | USA.21.2_1   | USA   | United States | USA.21_1 | Maryland             | NA        | Anne Arundel                 | NA           | NA        | County           | County           | NA   | US.MD.AN
    5 | USA.

I'm not sure where to go for that. I can't see how this is going to work without us being consistent - enough so that some external user (GBIF or whatever) could follow.

I also noticed the gid in there - are those stable? I think they're missing version, but maybe that's the identifier I've been looking for??
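
For reference, the structure of those GIDs can be sketched as follows. This is inferred only from the rows pasted above (for example `USA.15.6_1` with parent `USA.15_1`). What the trailing `_1` means is not established in this thread, so the sketch just carries it along as an opaque suffix. The names `Gid`, `parse_gid`, and `parent_gid` are hypothetical.

```python
import re
from typing import NamedTuple, Optional, Tuple

class Gid(NamedTuple):
    country: str             # e.g. "USA"
    path: Tuple[int, ...]    # e.g. (15, 6) for USA.15.6_1
    suffix: Optional[str]    # trailing "_1" part (meaning unconfirmed); None at level 0

# Assumed shape: CODE, then zero or more ".N" parts, then an optional "_N".
_GID = re.compile(r"^([A-Z0-9]+)((?:\.\d+)*)(?:_(\d+))?$")

def parse_gid(gid: str) -> Gid:
    m = _GID.match(gid)
    if m is None:
        raise ValueError(f"unrecognized GID: {gid!r}")
    country, dotted, suffix = m.groups()
    path = tuple(int(n) for n in dotted.split(".")[1:]) if dotted else ()
    return Gid(country, path, suffix)

def parent_gid(gid: str) -> str:
    """Derive the next-higher GID, matching the gid_1/gid_0 columns shown above."""
    g = parse_gid(gid)
    if not g.path:
        raise ValueError("level-0 GID has no parent")
    parent_path = g.path[:-1]
    if not parent_path:
        return g.country
    tail = f"_{g.suffix}" if g.suffix is not None else ""
    return f"{g.country}." + ".".join(map(str, parent_path)) + tail

print(parent_gid("USA.15.6_1"))  # USA.15_1
print(parent_gid("USA.1_1"))     # USA
```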

Also, there are a bunch of interesting "counties" in this, and if we're to follow our "do what GADM does" rule (and I think we must if this is going to make sense at scale) then I think we have to include them?? And if we're pulling "County" then we're stuck with "Water body."

   85 | USA.2.18_1   | USA   | United States | USA.2_1  | Alaska               | NA        | Northwest Arctic             | NA        | NA        | Borough          | Borough          | NA   | US.AK.NW
   86 | USA.2.19_1   | USA   | United States | USA.2_1  | Alaska               | NA        | Prince of Wales-Outer Ketchi | NA        | NA        | Census Area      | Census Area      | NA   | US.AK.PR
   87 | USA.2.20_1   | USA   | United States | USA.2_1  | Alaska               | NA        | Sitka                        | NA        | NA        | City And Borough | City and Borough | NA   | US.AK.SI
   8
 1274 | USA.23.45_1  | USA   | United States | USA.23_1 | Michigan             | NA        | Lake Michigan                | NA        | NA        | Water body       | Water body       | NA   | US.MI.WB
 3085 | USA.50.34_1  | USA   | United States | USA.50_1 | Wisconsin            | NA        | Lake Michigan                | NA        | NA        | Water body       | Water body       | NA   | US.WI.WB

 1750 | USA.29.1_1   | USA   | United States | USA.29_1 | Nevada               | NA        | Carson City                  | NA        | NA        | Independent City | Independent City | NA   | US.NV.CA
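
One quick way to see how many of these non-county types exist is to tally `engtype_2`. The sketch below uses a handful of (name_2, engtype_2) pairs copied from the rows in this thread. In practice the pairs would come from a query against `temp_gadm_2`, which is assumed here rather than shown.

```python
from collections import Counter

# (name_2, engtype_2) pairs copied from rows pasted in this thread.
rows = [
    ("Autauga", "County"),
    ("Baldwin", "County"),
    ("Northwest Arctic", "Borough"),
    ("Sitka", "City and Borough"),
    ("Lake Michigan", "Water body"),
    ("Lake Michigan", "Water body"),
    ("Carson City", "Independent City"),
]

# Count each English type, then list everything that is not a plain county.
counts = Counter(engtype for _, engtype in rows)
non_county = sorted(t for t in counts if t != "County")

print(counts["County"], non_county)
# 2 ['Borough', 'City and Borough', 'Independent City', 'Water body']
```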

I'll pull some data so we can all see what I see. Can we schedule another task force locality meeting to talk about this?

I'll go make #6051 in some painful and probably inconsistent way....

@dustymc dustymc modified the milestones: Next Task, Needs Discussion Mar 23, 2023
@dustymc dustymc added Help wanted I have a question on how to use Arctos Priority - Wildfire Potential ignore this at everyone's peril, may smolder for now ... labels Mar 23, 2023
@dustymc
Contributor Author

dustymc commented Mar 23, 2023

https://docs.google.com/spreadsheets/d/1crtMJWnEzHLjyT_GZqrfYpqck9GvhW6T41VptZGjS2Y/edit?usp=sharing is everything GADM knows about USA, minus the geometry, as it makes it into Arctos, in three tabs. The important nodes of that pathway:

wget https://geodata.ucdavis.edu/gadm/gadm4.1/shp/gadm41_USA_shp.zip

shp2pgsql gadm41_USA_0.shp .....

(And I think I figured out some of the internal mystery: I changed tools when I got a newer VM, it's bringing in more data - and letting me direct it - than the previous tool was. tl;dr: I can see stuff I previously could not!)
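
The download URL and shapefile name above follow a visible pattern. A minimal sketch of building them per country is below. It assumes the same pattern holds for other ISO3 codes and admin levels, which this thread only demonstrates for USA. The function names are hypothetical.

```python
GADM_BASE = "https://geodata.ucdavis.edu/gadm/gadm4.1/shp"

def _check_code(iso3: str) -> str:
    code = iso3.strip().upper()
    if len(code) != 3 or not code.isalnum():
        raise ValueError(f"expected a 3-character country code, got {iso3!r}")
    return code

def gadm_shp_zip_url(iso3: str) -> str:
    """URL of the zipped shapefiles, following the USA example above."""
    return f"{GADM_BASE}/gadm41_{_check_code(iso3)}_shp.zip"

def gadm_layer_filename(iso3: str, level: int) -> str:
    """Name of one admin-level shapefile inside the zip (pattern assumed)."""
    if level < 0:
        raise ValueError("level must be non-negative")
    return f"gadm41_{_check_code(iso3)}_{level}.shp"

print(gadm_shp_zip_url("usa"))
# https://geodata.ucdavis.edu/gadm/gadm4.1/shp/gadm41_USA_shp.zip
print(gadm_layer_filename("USA", 0))
# gadm41_USA_0.shp
```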

@dustymc
Contributor Author

dustymc commented Mar 23, 2023

@ArctosDB/arctos-working-group-officers @ArctosDB/geo-group help!

@Jegelewicz
Member

I'm not sure how to help? do we need a quick call?

@dustymc
Contributor Author

dustymc commented Mar 23, 2023

IDK? I was thinking scheduling a group meeting, but it'd be pretty nifty if I'm just missing something obvious and someone can tell me how to fix this here so at the very least I can add a missing county without giving myself a bunch of chances to muck it up.

Maybe this is as simple as "ranking" only GADM2 (which will make this only relevant to US and UK at the moment)?? IDK if that's predictable enough to eg pass seamlessly between Arctos and BerkeleyMapper and GBIF, but I could probably keep it straight, and that's something.

??????

@Jegelewicz
Member

"ranking" only GADM2

I thought about that - but I feel certain that the "ranks" for some GADM1 stuff will get requested eventually? If I had no input and needed to do this NOW, then "ranking" only GADM2 seems like a viable option. I can't answer the timing question though!

@mkoo
Member

mkoo commented Mar 23, 2023

That was my initial thought-- only needed for GADM2 (so only engtype_2). I don't want to imagine if we started with California State everywhere! That should get us far for now.
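
As a sketch of that rule, assuming names are built by appending the English type to the GADM name: level 1 stays bare and only level 2 gets `engtype_2` appended. The function name `arctos_label` is hypothetical, and treating the string "NA" as "no type" is an assumption based on the NA values in the pasted rows.

```python
def arctos_label(level: int, name: str, engtype: str) -> str:
    """Hypothetical naming rule: append the English type only at GADM level 2."""
    if level == 2 and engtype and engtype != "NA":
        return f"{name} {engtype}"
    return name

print(arctos_label(1, "California", "State"))          # California
print(arctos_label(2, "Autauga", "County"))            # Autauga County
print(arctos_label(2, "Lake Michigan", "Water body"))  # Lake Michigan Water body
```

Applying the same concatenation at level 1 is what would produce "Connecticut State", which is why the rule stops at level 2.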

@dustymc dustymc removed this from the Needs Discussion milestone Mar 24, 2023
@dustymc dustymc added this to the Active Development milestone Mar 24, 2023
@dustymc
Contributor Author

dustymc commented Mar 24, 2023

only needed for GADM2

Sounds like a proclamation from above to me, going back active.....

@dustymc
Contributor Author

dustymc commented Mar 24, 2023

My own proclamation: Ignore case. GADM insists https://en.wikipedia.org/wiki/DeSoto_County,_Florida is Desoto, I don't think it's right.
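
A minimal sketch of what "ignore case" could look like when comparing GADM names with existing Arctos names. Both helper names are hypothetical and this is not the actual sync script.

```python
def same_name(a: str, b: str) -> bool:
    """Compare two place names, ignoring case and surrounding whitespace."""
    return a.strip().casefold() == b.strip().casefold()

def missing_from_arctos(gadm_names, arctos_names):
    """Return GADM names with no case-insensitive match among Arctos names."""
    have = {n.strip().casefold() for n in arctos_names}
    return [n for n in gadm_names if n.strip().casefold() not in have]

print(same_name("Desoto", "DeSoto"))                           # True
print(missing_from_arctos(["Desoto", "Baldwin"], ["DeSoto"]))  # ['Baldwin']
```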

@dustymc
Contributor Author

dustymc commented Mar 24, 2023

Our first GADM2 entity that does not more or less mean "county": https://arctos.database.museum/place.cfm?action=detail&geog_auth_rec_id=10020505

@dustymc
Contributor Author

dustymc commented Mar 24, 2023

Virginia because it's weird: virginia.csv.zip

@dustymc
Contributor Author

dustymc commented Mar 24, 2023

scotland.csv.zip

@dustymc
Contributor Author

dustymc commented Mar 24, 2023

United States should now be synced with GADM, except a few cases with their own Issues.

@dustymc
Contributor Author

dustymc commented Mar 24, 2023

United Kingdom is synced with GADM or has Issues.

@Jegelewicz
Member

Our first GADM2 entity that does not more or less mean "county"

This can be cool - but it would also be nice to see United States, Lake Michigan Water body

@dustymc
Contributor Author

dustymc commented Mar 27, 2023

would also be nice

There are two ways in which that could happen.

  1. File an issue with GADM, or
  2. File an Arctos issue requesting that something which contains that be elevated to "official Arctos geography"

@sharpphyl

United Kingdom is synced with GADM or has Issues.

GADM has 112 subdivisions under England. We have 46. We could use a few more for databasing. Do I need to file an issue or are they in the works?

@dustymc
Contributor Author

dustymc commented Mar 27, 2023

England

#6062

@sharpphyl

Got it.

@genevieve-anderegg

Happy to help also

@dustymc
Contributor Author

dustymc commented Apr 4, 2023

Done? There are some issues, #6059, and I'm sure a few things that just got ignored, but Arctos should now very nearly have full global spatial coverage through GADM and IHO.

@dustymc dustymc closed this as completed Apr 4, 2023

5 participants