Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow query for totalTiles in MBTiles -> PMTiles conversion #127

Closed
lseelenbinder opened this issue Jan 29, 2024 · 14 comments
Closed

Slow query for totalTiles in MBTiles -> PMTiles conversion #127

lseelenbinder opened this issue Jan 29, 2024 · 14 comments

Comments

@lseelenbinder
Copy link

lseelenbinder commented Jan 29, 2024

We're using pmtiles convert to do some large conversion jobs (including global data). Along the way, I realized that there's a select count(*) from tiles that only exists to produce accurate progress bars.

What would your preferred approach to eliminating this query (it can take 10s of minutes on large enough archives for essentially no real result)? I could see coordinating this with #117 or adding a flag something along the lines of --impreciseProgress. I'm happy to do either, but would like to know what you'd prefer.

@bdon
Copy link
Member

bdon commented Jan 30, 2024

Is running in quiet mode with no logged output acceptable for your use case?

@lseelenbinder
Copy link
Author

As long as the side-effect means the query doesn't happen, it's definitely an improvement over the status quo.

Ideally we'd also have a solution that allows for some progress reporting in CI—but of course we'd face similar problems mentioned in #117 related to thousands of log lines. Ideally, I'd advocate for a more CI-friendly style of progress reporting behind a flag as a good final state.

@mem48
Copy link

mem48 commented Feb 17, 2024

I'm not sure if this is related but I'm experiencing really slow performance on converting mbtiles to pmtiles.

My dataset is a 158 GB .mbtiles containing 2m resolution raster data saved as webp images produced from rio rgbify.

I'm currently on Pass 2: writing tiles with the progress bar saying:

11% | (405094/3638545, 54 it/min) [57h56m55s:989h46m23s]

I know it is a large dataset, but surely it shouldn't take 40 days on a high-spec desktop. The PC doesn't appear to be working hard (10% CPU, 8% Memory, 1% Disk)

@bdon
Copy link
Member

bdon commented Feb 18, 2024

That sounds like a separate issue, can you upload your 158GB .mbtiles somewhere so others can reproduce?

@mem48
Copy link

mem48 commented Feb 18, 2024

@bdon A share link to my mbtiles file https://leeds365-my.sharepoint.com/:f:/g/personal/earmmor_leeds_ac_uk/EnCVP-D0VXtAvHdz14qlI7MBSeX-DSwe_BE3yagjt9yWJg?e=6E2wzK it is still uploading so you will have to wait a while. Currently uploading at 9 MB/s so will take about 5 hours to upload.
Edit: Upload complete and will be available for 30 days

@bdon
Copy link
Member

bdon commented Feb 19, 2024

@mem48 did you try running with convert --no-deduplication?

bdon added a commit that referenced this issue Feb 19, 2024
…127]

* redundant work because getting the count(*) of tiles requires a table scan.
bdon added a commit that referenced this issue Feb 19, 2024
…127] (#142)

* redundant work because getting the count(*) of tiles requires a table scan.
@bdon
Copy link
Member

bdon commented Feb 19, 2024

@mem48 your mbtiles is missing an index:

CREATE UNIQUE INDEX tile_index on tiles (zoom_level, tile_column, tile_row);

@bdon
Copy link
Member

bdon commented Feb 19, 2024

@lseelenbinder are your mbtiles missing the index too? it's not a MUST via the spec but the convert as a whole will be slow without it

@bdon
Copy link
Member

bdon commented Feb 19, 2024

@mem48 on my laptop converting your file with the index added and --no-deduplication took 27 minutes

@lseelenbinder
Copy link
Author

lseelenbinder commented Feb 19, 2024

@lseelenbinder are your mbtiles missing the index too? it's not a MUST via the spec but the convert as a whole will be slow without it

Yes, there's an index, though the mbtiles uses the tiles view with a mapping + data table to deduplicate, so it's a bit slower than a simple setup. It looks like #142 fully fixes this for our purposes though.

@mem48
Copy link

mem48 commented Feb 19, 2024

Thanks @bdon Adding the index fixed my problem and only took 45 minutes even with deduplication. Is it worth a check and warning in go-pmtiles given the enormous effect on performance?

In case anybody else wants to reproduce my fix on Ubunut 22 was:

sudo apt-get install sqlite3 libsqlite3-dev
sqlite3 DSM_England_2m.mbtiles
sqlite> CREATE UNIQUE INDEX tile_index on tiles (zoom_level, tile_column, tile_row);
sqlite> .exit
./pmtiles convert DSM_England_2m.mbtiles DSM_England_2m.pmtiles

Adding an index with rio rgbify is possible but requires using the --co NAME=VALUE argument according to ChatGPT
--co SPATIAL_INDEX=YES or --co INIT_WITH_SQL="CREATE UNIQUE INDEX tile_index on tiles (zoom_level, tile_column, tile_row);"
should work but I have not tested this yet.

@bdon
Copy link
Member

bdon commented Feb 19, 2024

can you find a way to detect the missing index in SQL?

@bdon
Copy link
Member

bdon commented Feb 25, 2024

1.18 removes the query for count because there's not much point in executing the same thing twice (table scan) for two passes. Looks like this is improved if there's a unique index on tiles per the MBTiles spec.

Related to the missing index, we can't make any assumptions about whether the tiles table is a view or a table, either we can find some portable internal SQLite heuristic, or we point out in the docs that your tiles table should be indexed.

@bdon bdon closed this as completed Feb 25, 2024
@voncannon
Copy link

voncannon commented Mar 3, 2024

Appreciate these changes... I don't have timings from before, but it is way way faster working with my last conversion of 300 GB.

Only took 1 hour (macOS M2 Max) writing on an NVMe

(base) bvc@bvc gstore % ls -lh usgstopo.mbtiles
-rw-r--r--@ 1 bvc  staff   293G Feb 29 22:32 usgstopo.mbtiles

(base) bvc@bvc gstore % time ./pmtiles convert usgstopo.mbtiles usgstopo.pmtiles 
2024/02/29 22:41:06 convert.go:260: Pass 1: Assembling TileID set
2024/02/29 22:41:10 convert.go:291: Pass 2: writing tiles
100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (20332169/20332169, 5866 it/s)            
2024/02/29 23:38:56 convert.go:345: # of addressed tiles:  20332169
2024/02/29 23:38:56 convert.go:346: # of tile entries (after RLE):  19768407
2024/02/29 23:38:56 convert.go:347: # of tile contents:  19594193
2024/02/29 23:38:59 convert.go:362: Root dir bytes:  9385
2024/02/29 23:38:59 convert.go:363: Leaves dir bytes:  42486391
2024/02/29 23:38:59 convert.go:364: Num leaf dirs:  3501
2024/02/29 23:38:59 convert.go:365: Total dir bytes:  42495776
2024/02/29 23:38:59 convert.go:366: Average leaf dir bytes:  12135
2024/02/29 23:38:59 convert.go:367: Average bytes per addressed tile: 2.09
2024/02/29 23:41:09 convert.go:340: Finished in  1h0m3.023499209s
./pmtiles convert usgstopo.mbtiles usgstopo.pmtiles  572.24s user 611.80s system 32% cpu 1:00:03.89 total

(base) bvc@bvc gstore % ls -lh usgstopo.pmtiles
-rw-r--r--  1 bvc  staff   278G Feb 29 23:41 usgstopo.pmtiles

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants