-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GeoJSON to binary arrays improvements #703
Conversation
@ibgreen any feedback? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kylebarron This looks great!
if (feature.properties) { | ||
for (const key in feature.properties) { | ||
const val = feature.properties[key]; | ||
numericProps[key] = numericProps[key] ? isNumeric(val) : numericProps[key]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic may well be correct, but I get confused here. Do we want this to be true only if all the values in all features are numeric? If so maybe the following is easier to follow?
// initalize
for (const key in feature.properties) {propIsNumericKey[key] = true;}
...
// each iteration, check if still considered numeric
if (propIsNumericKey[key]) {
propIsNumericKey[key] = Number.isFinite(val);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want this to be true only if all the values in all features are numeric?
I think so, yes.
If so maybe the following is easier to follow?
That snippet is initializing based on the properties in the first feature? The problem with that is that not all features are required to have all keys. With MVT in particular, if I'm not mistaken, a property could be defined in only a single feature within a layer.
So if we want numericKey
to be true
if the values for that prop in all features are numeric and exist, then initializing based on the properties in the first feature are fine. Otherwise, to initialize based on all properties I think it would be necessary to loop over all features just to do the initialization.
I'm currently interpreting numericKey
should be true
if the values for that prop in all features are either numeric or do not exist. So I need to keep track of 1) have I seen this key in a previous feature, 2) has it been non-numeric in any previous feature.
It might be good to add a comment like
// If property has been numeric in all previous features in which the
// property existed, check if numeric in this feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. This can get pretty sophisticated, so you should draw the line here for now and then we can come back and make further improvements if we feel so inclined.
For later:
- An approach used in binary columnar tables is to optionally generate a second
null
array to track which objects are not present (e.g. aUint8Array
). - We could detect the type of the array:
- For each element check if integer. If row is all integer, generate
Uint32Array
- Track max/min values of all values
- if (-32000, 32000), generate
Uint16Array
- If (-128, 127) generate
Uint8Array
- etc
- For each element check if integer. If row is all integer, generate
Maybe create a follow up tasks with such ideas for a later date?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can get pretty sophisticated, so you should draw the line here for now and then we can come back and make further improvements if we feel so inclined.
Good idea, it's clear this can be a black hole.
One more idea I thought of that can be a future task: should numericKeys
depend on the geometry type?
For example, if I have a set of GeoJSON features where a numeric property x
only exists on Point
features, and I have 10 Point
features but 1000 Polygon
features, the current code instantiates a Float32Array
for every Polygon vertex, when every vertex is missing data.
I don't think this would necessarily be uncommon either; I have a mountain_peak
layer of points in my MVTs with a numeric elevation
value, but no other layer has that property, so there would be # line positions + # polygon positions
extra empty data just for that property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that is a good observation, that can be another optimization.
Right now I think we should be concerned with the "combinatorial explosion" of options + input dependent outputs, specifically relating to the testability and ease-of-use of it all.
Less optionality is probably good for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right. It's easy to go on a feature-adding spree. If there's anything you think should be taken out let me know.
Now I'm pivoting to adding more tests... My initial approach would be to add tests to firstPass
and secondPass
separately, to more easily test edge cases. Do you know if it's possible to write tests for a non-exported
function in JS? Or is it ok to export the functions for use in tests? I saw one approach on the web like
export const _testing = [firstPass, secondPass]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I sometimes add a TEST_EXPORTS
: const TEST_EXPORTS = {firstPass, secondPass}
Also by using an object you can then do:
import {TEST_EXPORTS} from ...;
const {firstPass, secondPass} = TEST_EXPORTS;
@kylebarron Just mark the PR as "ready for review" and ping me and I will land. |
@ibgreen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kylebarron This is a spectacular PR. I added a few comments, but will land this now.
I feel that collecting any remaining ideas that came up here in the tracker issue would be good.
Have you tried to build the website and made sure that your new docs show up? I could help with that.
`geojsonToBinary` returns an object containing typed arrays sorted by geometry | ||
type. `positions` corresponds to 2D or 3D coordinates; `objectIds` returns the | ||
index of the vertex in the initial `features` array. | ||
type. `positions` is a flat array of coordinates; `globalFeatureIds` references |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For easier reading and a cleaner look, use a markdown table, or at least a bulleted list?
// Array of x, y or x, y, z positions | ||
positions: {value: Float32Array, size: coordLength}, | ||
positions: {value: PositionDataType, size: coordLength}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are in the process of introducing typescript defintions for loaders.gl 2.2. With minimal extra effort, we could actually turn what you wrote here into typescript types in a geojson-to-binary.d.ts
. Look for other .d.ts
files to see how it is done.
| PositionDataType | `Float32Array` or `Float64Array` | `Float32Array` | Data type used for positions arrays | | ||
| Option | Type | Default | Description | | ||
| ---------------- | -------- | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||
| PositionDataType | `class` | `Float32Array` | Data type used for positions arrays. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reminds me: It is best if the option values are representable in JSON, i.e. numbers, strings, etc. So you could support "float32"
and "float64"
, or positionPrecision: 32 | 64
The reason is that loaders.gl is integrated with deck.gl, and deck.gl exposes loader options in its API, and for instance pydeck and non-JS language bindings go through the deck.gl JSON API intermediary.
} | ||
|
||
export const TEST_EXPORTS = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I normally put this last in file
|
||
// Test value equality, missing third dimension imputed as 0 | ||
t.deepEqual(points.positions.value, [100, 0, 1, 100, 0, 0, 101, 1, 0]); | ||
t.deepEqual(lines.positions.value, [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use // prettier-ignore
just before line 125 to prevent auto formatting?
@ibgreen
0
is left for the missing valuespoints/lines/polygons.numericProps
object. It was helpful to make it its own object, since I loop over the keys ofnumericProps
a couple times.numericProps = []
.properties
arrays by geometry type. If the user passesnumericProps = []
, all properties are kept as JSON.featureIndex
andglobalFeatureIndex
arrays are created.positions
and numeric feature properties? I.e. if non-zero, fill this as the default value forpositions
when it's initialized, and then I think the third coordinate won't get overwritten when filling a 2D point.Closes #702