Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore mechanisms for storing, indexing, and retrieving raster data #25

Open
5 of 7 tasks
oubiwann opened this issue Jan 16, 2020 · 6 comments
Open
5 of 7 tasks

Comments

@oubiwann
Copy link
Member

oubiwann commented Jan 16, 2020

Turns out the papers I found were pretty ... non-great. Time to do some more reading.

More readings:

Then sketches (as comments in this ticket or mock-ups in a spreadsheet):

  • explore world indices
  • explore layer indices
  • divide image into tiles
  • explore tile indices
  • explore possible tile data storage (e.g., raw bytes, .tiff sections, bitmaps)
@oubiwann
Copy link
Member Author

@oubiwann
Copy link
Member Author

oubiwann commented Jan 16, 2020

Per the spreadsheet linked above, if we take an image of size 1536 x 1024, this can be broken down into the following:

x y x divisions y divisions tile counts
1536 1024 1 1 1
768 512 2 2 4
384 256 4 4 16
192 128 8 8 64
96 64 16 16 256
48 32 32 32 1024
24 16 64 64 4096
12 8 128 128 16384

tile count relationship to Hilbert curve order:

tile counts hilbert curve order 2^(2 * (order+1))
1 0 1
4 1 4
16 2 16
64 3 64
256 4 256
1024 5 1024
4096 6 4096
16384 7 16384

We're only doing 7 successive tile divisions here (we'd never store the source image, tile 0, so we don't count that), which means a few things:

  1. we only need 3 bits (to represent the max value of "7") to index all tiles, but
  2. we're not making room for a future where we want to support larger source images, and
  3. we can't resolve pixels any finer than in a 12x8 grid

But if we expanded this at both the top and the bottom, we'd leave ourselves enough room for future growth:

x y x divisions y divisions tile counts hilbert curve order
12288 8192 1 1 1 0
6144 4096 2 2 4 1
3072 2048 4 4 16 2
1536 1024 8 8 64 3
768 512 16 16 256 4
384 256 32 32 1024 5
192 128 64 64 4096 6
96 64 128 128 16384 7
48 32 256 256 65536 8
24 16 512 512 262144 9
12 8 1024 1024 1048576 10
6 4 2048 2048 4194304 11
3 2 4096 4096 16777216 12
1.5 1 8192 8192 67108864 13
0.75 0.5 16384 16384 268435456 14

For this level of future-proofing, we'd need 28 bits (to represent the max value of 2^14-1 * 2^14-1), and this would let us index down to the level of individual pixels. Not that we'd ever tile images to that level, but indexing to that level should let us query the database to find which tile has a given pixel coordinate.

@oubiwann
Copy link
Member Author

oubiwann commented Jan 16, 2020

We'll want to try splitting up images into tiles, and then perform queries on pixels inside the tiles to see what the performance profile looks like ... if the tiles are created using RoaringBitmap, that project may already have some data we can use to select optimal tile size ...

So, we're going to need two levels of query here:

  1. get a tile that contains a given point (database query; do we have enough info for this?)
  2. get a pixel out of a given tile (RoaringBitmap query)

If we do this, then ingest will be comprised of:

  1. Splitting an image into an appropriate number of tiles
  2. Reading the split image tile raster data
  3. Populating a RoaringBitmap with the tile raster data
  4. Save the bitmap to the database

@oubiwann
Copy link
Member Author

oubiwann commented Jan 16, 2020

This takes care of the tiling; we'd also want to allocate bits for:

  • which layer (RGB, precipitation, temperature, biomes, altitude, etc.) should be queried
  • which world (hexagram30 will support multiple worlds: either for different games, or for games that will have multiple worlds)

Google's s2 library also allocates a "stop" bit; might be good to borrow that, too.

So, what about:

  • 21 bits - 2^20 worlds (~1 million worlds)
  • 11 bits - 2^10 layers (each world could have ~1000 map layers)
  • 28 bits - indexed Hilbert curve down to (just below) the pixel level
  • 1 bit - stop

Total of 61 bits, pretty close to the s2 lib's 64 bits, but with a fairly different indexing profile and usage ...

@oubiwann
Copy link
Member Author

Would be interesting to see what the storage required would be to support a game that maxed out worlds and layers :-)

@oubiwann
Copy link
Member Author

oubiwann commented Jan 16, 2020

Next steps: walk through some examples of a series of indices for different tiles. Let's walk through the logic of taking a current-resolution image (1600x1022) and splitting it up at different levels:

  • order 1, divided into 4 equal parts (tiles)
  • order 4, divided into 256 tiles
  • order 6, divided into 4096 tiles

Pick a point in the middle of a tile for each of them:

  • Examine the Hilbert curve number for it
  • Examine the quadtree for it
  • Assign an arbitrary world number and layer number
  • Then assemble the entire index number, both in binary and decimal
  • What does querying for a point look like?
  • Can we query for a point lower in resolution than the given order?

Pick a point on the border of a tile, and repeat the same above. Does it still work?

After this, we'll look at querying inside a tile ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant