Add optional layer metadata at instantiation #952
Also fixed bug where sampling query generation needed results of count queries
phase (not only its tasks) must be executed after the tasks of previous phases
Notes: This allows obtaining additional metadata in map instantiation. When aggregation occurs, the metadata is about the original, unaggregated data source. The metadata can be requested adding a
The requested metadata is returned in the response for each layer ( I've preserved the existing behaviour for |
I'm reviewing this because of a major problem: at the point where metadata is being computed now only the aggregated query (in case of aggregation) is available. Also the previously existing stat Tests for metadata in the presence of aggregation must be added. |
Well, regarding the existing stat So, now we should be free to change the behaviour of this for aggregation and return the estimate count for the pre-aggregation query in this default stat. |
All stats are computed now pre-aggregation Code to help compute post-aggregation stats remains for testing
Also change aggregated stats to not filter a single tile
Remove usage of PhasedExecution
This achieves better query execution granularity and removes questionable usage of the shared results object. It introduces a couple of behavior changes:
* estimatedFeatureCount doesn't ignore errors now
* sample always uses estimatedFeatureCount, even if the actual count is also computed.
Hey @dgaubert can you review this again? There was an interesting problem with the tests (well, "interesting" wasn't the word I had in mind while I was pulling my hair out trying to figure it out). The test "layergroup can hold substitution tokens" was failing, but only on Travis (not even when using the Docker tests locally). I finally was able to reproduce it locally by setting Now, the problem, which has existed for a long time and has only been revealed now, is this: we always compute the row count estimate stat. But this has been failing if the SQL query contains Mapnik tokens (because we make no substitution before executing it). I've fixed the substitution problem, but I haven't looked at the |
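To illustrate the failure mode described above: a query that still contains Mapnik tokens such as `!bbox!` is not valid SQL until the tokens are replaced. The following is a minimal sketch of that substitution step, not the code from this PR; `substituteTokens` and the placeholder values are hypothetical.

```javascript
// Hypothetical helper: replace Mapnik-style tokens (!name!) with fixed
// placeholder values so a stats query can run outside the renderer.
// The placeholder values are assumptions chosen only for illustration.
const PLACEHOLDERS = {
    bbox: 'ST_MakeEnvelope(-20037508.5, -20037508.5, 20037508.5, 20037508.5, 3857)',
    scale_denominator: '0',
    pixel_width: '1',
    pixel_height: '1'
};

function substituteTokens(sql, values = PLACEHOLDERS) {
    // Unknown tokens are left untouched rather than guessed at.
    return sql.replace(/!(\w+)!/g, (match, name) =>
        Object.prototype.hasOwnProperty.call(values, name) ? values[name] : match
    );
}

const layerSql = 'SELECT * FROM t WHERE the_geom_webmercator && !bbox!';
const runnableSql = substituteTokens(layerSql);
```

With something like this in place, the row count estimate can be requested for `runnableSql` instead of the raw layer query.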
Keep the current production behavior of ignoring errors when computing this stat and returning -1. This is done so as not to introduce any instability in production at the moment.
I've reverted the behaviour in case of error when computing @oleurud do you think it's worth making that conditional on the environment (so that in development, staging, etc. the errors aren't ignored)? ( |
Maybe an environment configuration parameter will be the best option (easy and fast to enable/disable) |
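One way the environment-dependent behaviour could look is sketched below. This is only an illustration of the idea discussed here: `ignoreStatErrors` is an assumed configuration flag and `runCountEstimate` is a hypothetical stand-in for whatever runs the estimate query.

```javascript
// Hypothetical sketch: `ignoreStatErrors` is an assumed configuration flag,
// `runCountEstimate` an assumed function returning a promise of the estimate.
function estimatedFeatureCount(runCountEstimate, { ignoreStatErrors = true } = {}) {
    return runCountEstimate()
        .then(rows => ({ estimatedFeatureCount: rows }))
        .catch(err => {
            if (ignoreStatErrors) {
                // Production behaviour described in this thread: report -1.
                return { estimatedFeatureCount: -1 };
            }
            // Development or staging: surface the failure instead.
            throw err;
        });
}
```

Defaulting the flag to `true` keeps the existing production behaviour unless an environment opts out.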
The sampling probability is now being computed using an estimate of the table row count
This could lead to probabilities that are too high (samples that are too large) if the estimate is not accurate. To avoid potential problems with large samples we've added a LIMIT to the sampling queries.
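The reasoning in this commit message can be sketched as follows. This is not the query builder from the PR; `sampleQuery`, the subquery alias, and the use of `random()` are assumptions made for the example.

```javascript
// Hypothetical sketch of a sampling query driven by an estimated row count.
function sampleQuery(sql, sampleSize, estimatedRows) {
    // Probability of keeping each row. Clamp to 1, and fall back to 1 when
    // the estimate is zero or negative (for example the -1 error value).
    const probability = estimatedRows > 0 ? Math.min(1, sampleSize / estimatedRows) : 1;
    // The LIMIT bounds the result even when the estimate is far too low,
    // which is the case where the probability comes out too high.
    return `SELECT * FROM (${sql}) AS _sample_source ` +
        `WHERE random() < ${probability} LIMIT ${sampleSize}`;
}

const query = sampleQuery('SELECT * FROM t', 100, 1000);
```

If the table really holds a million rows but the estimate says 1000, the probability of 0.1 would select about 100,000 rows; the LIMIT caps the sample at 100.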
Looks good, but let me check again this afternoon (i need to read some things)
if (field.type === 'number') {
    return ['min', 'max', 'avg', 'sum'];
}
if (field.type === 'date') { // TODO other types too?
It could be an else if?
Yep, I think I've lately omitted quite a few elses because of returns inside conditions.
🤔 do you think the explicit else is preferable for clarity?
Forget it, you are right.
// columns are returned as an object { columnName1: { type1: ...}, ..}
// for consistency with SQL API
function formatResultFields(dbConnection, flds) {
It could be prettier (not required, but it would be easier to understand):

    function formatResultFields(dbConnection, fields = []) {
        const nfields = {};
        // for...of iterates over the field objects; for...in would yield indices
        for (const field of fields) {
            const cname = dbConnection.typeName(field.dataTypeID);
            let tname;
            if (!cname) {
                tname = 'unknown(' + field.dataTypeID + ')';
            } else {
                tname = fieldType(cname);
            }
            nfields[field.name] = { type: tname };
        }
        return nfields;
    }
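For anyone who wants to try the shape of this suggestion in isolation, here is a self-contained sketch. The `dbConnection` object and the `fieldType` mapping are stand-ins invented for the example (the real ones live in the codebase); the loop uses `for...of` so that each `field` is a descriptor object rather than an array index.

```javascript
// Stand-ins (assumptions): a tiny OID-to-name table instead of a real
// connection, and a simplified fieldType mapping.
const dbConnection = {
    typeName: oid => ({ 23: 'int4', 25: 'text' })[oid]
};
const fieldType = cname => (cname === 'int4' ? 'number' : 'string');

function formatResultFields(dbConnection, fields = []) {
    const nfields = {};
    for (const field of fields) {
        const cname = dbConnection.typeName(field.dataTypeID);
        nfields[field.name] = {
            // Fall back to a descriptive placeholder for unknown type OIDs.
            type: cname ? fieldType(cname) : `unknown(${field.dataTypeID})`
        };
    }
    return nfields;
}

const columns = formatResultFields(dbConnection, [
    { name: 'cartodb_id', dataTypeID: 23 },
    { name: 'name', dataTypeID: 25 },
    { name: 'mystery', dataTypeID: 99999 }
]);
```

Calling it with no `fields` argument returns an empty object thanks to the default parameter.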
You caught me copy-pasting from SQL API!! 😊
// TODO: compute _sample with _featureCount when available

Promise.all([
Looks good :)
        ({ estimatedFeatureCount }) => _sample(ctx, estimatedFeatureCount)
            .then(s => mergeResults([s, { estimatedFeatureCount }]))
    ),
    _featureCount(ctx),
A question, to understand it: if any of _featureCount, _aggrFeatureCount, _geometryType or _columns fails, we will return an error as the response. I know that the user must request it explicitly, but I am not sure whether a metadata failure should make the whole request fail. What is your point of view?
It is not a reason to stop the PR, and I am not asking you to change it. It just raises doubts.
IMHO, if the user requests any specific metadata and an error happens, we should return that error because we weren't able to process the request completely.
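The two positions in this exchange map onto two ways of combining promises. The sketch below contrasts them; the stat functions are hypothetical stand-ins, and `collectStrict` and `collectLenient` are names invented for the example.

```javascript
// Stand-in stat functions (hypothetical): one succeeds, one fails.
const featureCount = () => Promise.resolve({ featureCount: 10 });
const geometryType = () => Promise.reject(new Error('query failed'));

// Strict: Promise.all rejects as soon as any stat rejects, so the whole
// request fails when a requested stat cannot be computed.
function collectStrict(stats) {
    return Promise.all(stats.map(stat => stat()))
        .then(parts => Object.assign({}, ...parts));
}

// Lenient alternative: each stat recovers on its own and contributes
// nothing, so the remaining stats are still returned.
function collectLenient(stats) {
    return Promise.all(stats.map(stat => stat().catch(() => ({}))))
        .then(parts => Object.assign({}, ...parts));
}
```

The strict form matches the view stated here: a request for specific metadata either succeeds completely or reports the error.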
You have my blessings ;)
Thanks for a very nice issue (and the comments on the PR) allowing everyone to understand very well the use case, the solution, and the caveats
        threshold: 1
    },
    metadata: {
        aggrFeatureCount: 10
10 is the zoom value??
Exactly
}

testClient.getLayergroup(function(err, layergroup) {
    assert.ok(!err);
Use assert.ifError(). It reports the actual error, whereas assert.ok(!err) hides it behind a generic assertion failure.
});

testClient.getLayergroup(function(err, layergroup) {
    assert.ok(!err);
Use assert.ifError()
});

testClient.getLayergroup(function(err, layergroup) {
    assert.ok(!err);
Use assert.ifError()
});

testClient.getLayergroup(function(err, layergroup) {
    assert.ok(!err);
Use assert.ifError()
});

testClient.getLayergroup(function(err, layergroup) {
    assert.ok(!err);
Use assert.ifError()
});

testClient.getLayergroup(function(err, layergroup) {
    assert.ok(!err);
Use assert.ifError()
Promise.all([
    _estimatedFeatureCount(ctx).then(
        ({ estimatedFeatureCount }) => _sample(ctx, estimatedFeatureCount)
            .then(s => mergeResults([s, { estimatedFeatureCount }]))
🙏 rename s to sample
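For context, the chain under review can be sketched end to end as below, with the rename applied. All four helper functions are stand-ins written for this example: the real `_estimatedFeatureCount`, `_sample`, `_featureCount` and `mergeResults` run queries and may behave differently, and `layerStats` is a hypothetical wrapper name.

```javascript
// Stand-ins (assumptions) for the stat functions named in the diff.
const _estimatedFeatureCount = ctx => Promise.resolve({ estimatedFeatureCount: ctx.rows });
const _sample = (ctx, count) =>
    Promise.resolve({ sample: { size: Math.min(count, ctx.sampleSize) } });
const _featureCount = ctx => Promise.resolve({ featureCount: ctx.rows });
// Assumed behaviour: shallow-merge the partial result objects.
const mergeResults = results => Object.assign({}, ...results);

function layerStats(ctx) {
    return Promise.all([
        // The sample depends on the estimate, so these two run in sequence...
        _estimatedFeatureCount(ctx).then(
            ({ estimatedFeatureCount }) => _sample(ctx, estimatedFeatureCount)
                .then(sample => mergeResults([sample, { estimatedFeatureCount }]))
        ),
        // ...while independent stats run alongside them.
        _featureCount(ctx)
    ]).then(mergeResults);
}
```

Only the dependent pair is chained; everything else stays in the `Promise.all` array, so adding another independent stat is a one-line change.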
@@ -11,17 +34,278 @@ MapnikLayerStats.prototype.is = function (type) {
    return this._types[type] ? this._types[type] : false;
};

function queryPromise(dbConnection, query, adaptResults, errorHandler) {
Isn't this a bit overcomplicated?
We can simply return a promise:

    function queryPromise(dbConnection, query) {
        return new Promise((resolve, reject) => {
            dbConnection.query(query, (err, res) => err ? reject(err) : resolve(res));
        });
    }

And then, for each specific metadata function:

    function _estimatedFeatureCount (ctx) {
        return queryPromise(ctx.dbConnection, _getSQL(ctx, queryUtils.getQueryRowEstimation))
            .then(res => ({ estimatedFeatureCount: res.rows[0].rows }))
            .catch(() => ({ estimatedFeatureCount: -1 }));
    }

Note: Remember, .then and .catch return a promise as well.
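To try this suggestion without a database, the sketch below pairs the wrapper with a fake connection. `fakeConnection` is invented for the example and only mimics a callback-style `query(sql, callback)` method; the `estimatedFeatureCount` function here takes the connection and the SQL directly instead of a `ctx` object, which is an assumption about how the refactor might look.

```javascript
// Wrap a callback-style query in a promise.
function queryPromise(dbConnection, query) {
    return new Promise((resolve, reject) => {
        dbConnection.query(query, (err, res) => (err ? reject(err) : resolve(res)));
    });
}

// Hypothetical stand-in for a connection: answers asynchronously with the
// given rows, or fails with the given Error.
function fakeConnection(rowsOrError) {
    return {
        query(sql, callback) {
            setImmediate(() => {
                if (rowsOrError instanceof Error) {
                    callback(rowsOrError);
                } else {
                    callback(null, { rows: rowsOrError });
                }
            });
        }
    };
}

function estimatedFeatureCount(dbConnection, sql) {
    return queryPromise(dbConnection, sql)
        .then(res => ({ estimatedFeatureCount: res.rows[0].rows }))
        // Mirrors the suggestion: any failure is reported as -1.
        .catch(() => ({ estimatedFeatureCount: -1 }));
}
```

Because `.then` and `.catch` each return a new promise, the adapter and the error fallback compose without extra parameters on `queryPromise`.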
Indeed, we can avoid the ctx object and pass only the arguments that each metadata function needs.
I have fixed the other issues you mentioned (except the queryPromise refactor), as they were very simple.
Regarding queryPromise, I think you're absolutely right, but I don't want to delay the deploy today, so I'll open a separate PR to do it.
Fair enough!
Please, don't miss that refactor; you'll end up loving promises and their composability!
💋
I've left some comments that I would like you to take into account.
This is experimental to be used by CartoVL. It shouldn't be public/documented at the moment.
Fixes #948