Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize Geodata Schema to avoid repeated lookups of same lat/lon #428

Open
lindner opened this issue Nov 21, 2024 · 0 comments
Open

Normalize Geodata Schema to avoid repeated lookups of same lat/lon #428

lindner opened this issue Nov 21, 2024 · 0 comments

Comments

@lindner
Copy link

lindner commented Nov 21, 2024

Describe the bug
The Points table has denormalized city, country and geodata, normalize this.

Version
0.16.4

To Reproduce
Import lots of Points, notice that reverse Geocoding is taking a long time. Look at schema and data to notice a lot of duplication.

Expected behavior
The schema should be adjusted use a single geodata/city/country for a given lat/lon. Alternatively the ReverseGeocode job could look for existing data from the other Points to see if data has already been fetched.

Additional context
Here are some statistics from my import:

dawarich=# SELECT latitude, longitude, count(*) AS c FROM points pp GROUP BY latitude, longitude ORDER BY c DESC LIMIT 10;

 latitude  |  longitude  |   c
-----------+-------------+-------
 37.834035 | -122.196643 | 72248
 37.833984 | -122.196687 | 21906
 37.834008 | -122.196732 |  5542
 37.834109 | -122.196643 |  4701
 37.834065 | -122.196699 |  3841
 37.834071 | -122.196710 |  3734
 37.834046 | -122.196699 |  3656
 37.834082 | -122.196732 |  3174
 37.833961 | -122.196643 |  2806
 37.832880 | -122.197131 |  2768
(10 rows)

As a workaround you can use this UPDATE sql to populate null fields where the result is already fetched. It processes 1000 item batches, so you just keep running it until you get 0 rows.

WITH t AS (SELECT p1.id, p2.city, p2.country, p2.geodata 
                     FROM points p1 
                     INNER JOIN points p2 ON p1.latitude = p2.latitude AND p1.longitude = p2.longitude 
                             AND p1.city IS NULL AND p2.city IS NOT NULL AND p1.id <> p2.id  LIMIT 1000) 
UPDATE points SET city = t.city, country = t.country, geodata = t.geodata FROM t WHERE points.id = t.id;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant