Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHPLIB-1374 Add tests on all stages #56

Merged
merged 6 commits into from
Jan 31, 2024
Merged

Conversation

GromNaN
Copy link
Member

@GromNaN GromNaN commented Jan 24, 2024

Fix PHPLIB-1374

https://www.mongodb.com/docs/manual/reference/operator/aggregation-pipeline/

  • $addFields already done
  • $bucket
  • $bucketAuto
  • $changeStream
  • $changeStreamSplitLargeEvent
  • $collStats
  • $count
  • $densify
  • $documents
  • $facet
  • $fill
  • $geoNear
  • $graphLookup
  • $group
  • $indexStats
  • $limit
  • $listSampledQueries
  • $listSearchIndexes
  • $listSessions
  • $lookup
  • $match
  • $merge
  • $out
  • $planCacheStats
  • $project
  • $redact
  • $replaceRoot
  • $replaceWith
  • $sample
  • $search (no example)
  • $searchMeta (no example)
  • $set
  • $setWindowFields
  • $skip
  • $sort
  • $sortByCount
  • $unionWith
  • $unset
  • $unwind

generator/config/stage/currentOp.yaml Show resolved Hide resolved
Comment on lines +20 to +38
$collStats:
latencyStats:
histograms: true
Copy link
Member Author

@GromNaN GromNaN Jan 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config contains nested object definition. That's why I kept a generic "config" object argument. DX could be improved with advanced type in phpdoc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM. I don't think this stage will be used very often.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it will seldom be used, if at all.

But it seems a bit arbitrary to use config here, as it isn't anything found in the $collStats API. Was it really to much to model the various options like you did with currentOp.yaml?

The options themselves don't seem terribly complex, so I don't see a big benefit to not modeling them unless we're concerned about the server API changing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I added the 4 parameters.

Comment on lines +54 to +56
range:
bounds: 'full'
step: 200
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "range" config could have a factory and use and enum for "bounds".

generator/config/stage/densify.yaml Show resolved Hide resolved
time: 1
output:
price:
method: 'linear'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We an create a factory for the output subtype.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. This would look better:

Stage::fill(
    sortBy: object(date: 1),
    output: object(
        score: Fill::locf,
        price: Fill::linear,
    )
);

Copy link
Member

@jmikola jmikola Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alcaeus: That's basically what you did for ODM already, yes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GromNaN: Will you be creating a ticket to track addition of an output factory?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +66 to +67
whenMatched: 'replace'
whenNotMatched: 'insert'
Copy link
Member Author

@GromNaN GromNaN Jan 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 enums should be created for this values: WhenMatched<replace|keepExisting|merge|fail> (+ Pipeline allowed) and WhenNotMatched<insert|discard|fail>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just about to leave a comment on # WhenMatched in the option definition above.

Is this todo item ticketed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be ticketed under PHPLIB-1381?

@@ -3,26 +3,38 @@ name: $out
link: 'https://www.mongodb.com/docs/manual/reference/operator/aggregation/out/'
type:
- stage
encode: object
encode: single
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to change this type, because an object with only { coll } is not accepted. The value must be a string or an object with { coll, db }

We could have a factory that return a string or an object, that could be used in $merge also.

Copy link
Member

@jmikola jmikola Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GromNaN: This reminds me of another operator where you were always encoding the long form (array I think) instead of a conditional short form. Perhaps that was $type? Although I'm not sure if it's worth addressing that since it isn't technically required like it is for $out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In another comment, @alcaeus asked about creating custom encoder logic. Is that feasible with the current implementation, or is it more complicated because the entire OutStage file is generated (and thus would require addressing this through the generator code itself).

IIRC, the original intention of this project was to have an ongoing dependency on the generator, so I understand if there's presently no way to maintain stages and operators through PHP along (in an alternate reality where the generator was only used to create the initial implementation).

Asking the above more for my own curiosity.

I understand the current change to use encode: single here is the most flexible and probably sets you up to later use string|OutCollection for the parameter.

Comment on lines +24 to +29
# The $sum expression is always build as an array, even if the value is an array field name
# $sum: '$homework'
$sum: ['$homework']
totalQuiz:
# $sum: '$quiz'
$sum: ['$quiz']
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked all this variations from the doc examples.

cumulativeQuantityForState:
$sum: '$quantity'
window:
documents: ['unbounded', 'current']
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need an enum for unbound / current...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be ticketed under PHPLIB-1381?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already have this list of enums in expression.php. I don't feel the need to create a ticket for each.

Copy link
Member

@jmikola jmikola Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted. I missed those @todo comments.

Feel free to resolve any threads where I was asking about tickets, then -- unless there's some other open question.

Comment on lines 31 to 41
array_map(
fn ($year) => Stage::unionWith(
coll: 'sales_' . $year,
pipeline: new Pipeline(
Stage::set(
_id: (string) $year,
),
),
),
range(2018, 2020),
),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Experimentation to use array_map. But array unpacking can't be used in the middle of 2 other positional arguments. So I had to unpack the result of an array_merge.

Copy link
Member Author

@GromNaN GromNaN Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Pipeline class could expand lists of provided as argument.

new Pipeline(
    Stage::set(/* ... */),
    array_map(/* ... */),
    Stage::sort(/* ... */),
);

See #57

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the time being, I'd format it manually the way it was done in the second test. I know that's a bit more effort to write, the ...array_merge() logic with array_map is a bit fancy to understand.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, reverted.

@GromNaN GromNaN marked this pull request as ready for review January 25, 2024 14:31
Comment on lines -23 to -28
name: timeseries
type:
- object
optional: true
description: |
If set, the aggregation stage will use these options to create or replace a time-series collection in the given namespace.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this argument, since it needs to be inside the coll object. But I could not find an example where it is used. I don't remember where I found it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've no idea what this was referring to, as $out only talks about not being able to output to a time-series collection. I don't see any reference to it as an option.

Copy link
Member Author

@GromNaN GromNaN Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@jmikola jmikola Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This looks like a docs omission, so I've opened DOCSP-36120. I'll defer to you if you'd like to re-introduce the timeseries option or defer it for a later time.

I removed this argument, since it needs to be inside the coll object.

Looking at the example you shared above, timeseries is a sibling of coll, not a child. So I see no reason it cannot be modeled in the YAML definition.

default: 'Other'
output:
count:
$sum: 1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most examples use: $sum: 1 where $count: {} could be used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is left over from times before $count was introduced.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where $count: {} could be used.

Just to confirm, you're referring to the $count aggregation operator, correct? The syntax is { $count: <string> }, so I wasn't sure if you were just abbreviating or referring to something else.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmikola the docs you linked to are for the $count aggregation stage. MongoDB 5.0 introduced a $count aggregation accumulator (with the same name to make the confusion perfect) that serves as a shortcut for { $sum: 1 }: $count (aggregation accumulator)

),
Stage::match(
active: false,
transaction: Query::exists(true),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have true as default value. So that it can be written: Query::exists(),

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good idea 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{
$pipeline = new Pipeline(
Stage::sort(
object(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The $sort operator will need some changes to be variadic object PHPLIB-1269. But this sort specs are shared with other operators as property. So we also need a SortSpec factory that returns an object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that in a number of other examples. Ideally, we would be able to write this like so:

Stage::sort(
    age: -1,
    posts: 1,
);

IIRC there were some other instances where we pass sort specifications, I'll dig into some pipelines to find those occurrences to see if this applies there as well.

generator/config/expression/sum.yaml Show resolved Hide resolved
$pipeline = new Pipeline(
Stage::facet(
...[
'categorizedByYears(Auto)' => new Pipeline(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted that this field is the reason for using the splat operator. I think it's fine to leave this in as an example when using this strategy can be helpful instead of fixing the example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also a case where I think a comment explaining the necessity of splat is helpful (even if we repeat it throughout tests whenever this comes up).

Comment on lines +45 to +46
new UTCDateTime(new DateTimeImmutable('2021-05-18T00:00:00.000Z')),
new UTCDateTime(new DateTimeImmutable('2021-05-18T08:00:00.000Z')),
Copy link
Member

@alcaeus alcaeus Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted that UTCDateTime does not supporting creating a date from a time string. I created PHPC-2352 to track this.

time: 1
output:
price:
method: 'linear'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. This would look better:

Stage::fill(
    sortBy: object(date: 1),
    output: object(
        score: Fill::locf,
        price: Fill::linear,
    )
);

Comment on lines +29 to +34
Stage::out(
object(
db: 'reporting',
coll: 'authors',
),
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike this signature, as passing an object seems overly complex. This is one of those cases where a separate (not auto-generated) encoder would work better, so we can essentially have the following signature:

function out(string $coll, ?string $db = null)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to #56 (comment)

tests/Builder/Stage/ProjectStageTest.php Outdated Show resolved Hide resolved
tests/Builder/Stage/SortStageTest.php Show resolved Hide resolved
Comment on lines 31 to 41
array_map(
fn ($year) => Stage::unionWith(
coll: 'sales_' . $year,
pipeline: new Pipeline(
Stage::set(
_id: (string) $year,
),
),
),
range(2018, 2020),
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the time being, I'd format it manually the way it was done in the second test. I know that's a bit more effort to write, the ...array_merge() logic with array_map is a bit fancy to understand.

Copy link
Member

@jmikola jmikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Publishing the first round of feedback.

default: 'Other'
output:
count:
$sum: 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where $count: {} could be used.

Just to confirm, you're referring to the $count aggregation operator, correct? The syntax is { $count: <string> }, so I wasn't sure if you were just abbreviating or referring to something else.

Comment on lines +20 to +38
$collStats:
latencyStats:
histograms: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it will seldom be used, if at all.

But it seems a bit arbitrary to use config here, as it isn't anything found in the $collStats API. Was it really to much to model the various options like you did with currentOp.yaml?

The options themselves don't seem terribly complex, so I don't see a big benefit to not modeling them unless we're concerned about the server API changing.

-
$densify:
field: 'timestamp'
range:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to model the range option? I see you have object # Range above.

tests/Builder/Stage/DocumentsStageTest.php Show resolved Hide resolved
generator/config/stage/facet.yaml Show resolved Hide resolved
@@ -12,7 +12,7 @@ arguments:
name: into
type:
- string
#- OutCollection
- object # OutCollection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an outstanding ticket to model OutCollection? This may relate to my earlier comment about modeling or typing collection and database names, since libraries may want to hook into that.

generator/config/stage/merge.yaml Show resolved Hide resolved
Comment on lines +66 to +67
whenMatched: 'replace'
whenNotMatched: 'insert'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just about to leave a comment on # WhenMatched in the option definition above.

Is this todo item ticketed?

db: 'reporting',
coll: 'orgArchive',
),
on: ['dept', 'fiscal_year'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm beating a dead horse here, but this left me wondering if some users might try to use fieldPath() methods here. Would that yield an error during static analysis?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The StringFieldPath is not accepted where string is expected.
image

But phpdoc types are not expressive enough to restrict an array of FieldPath.

image

Also @alcaeus if we want to be able to alias field names, we will need objects similar to FieldPath instead of only raw strings here.

PHPLIB-1382

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we want to be able to alias field names, we will need objects similar to FieldPath instead of only raw strings here.

Yes, I was thinking the same when I saw the code. We can add that when we look at integrating the builder into ODM itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GromNaN feel free to resolve this.

generator/config/stage/merge.yaml Outdated Show resolved Hide resolved
Copy link
Member

@jmikola jmikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finishing with the second half of my review. Aside from individual, open comment threads, OutStage seems to be an outstanding thing to address.

@@ -3,26 +3,38 @@ name: $out
link: 'https://www.mongodb.com/docs/manual/reference/operator/aggregation/out/'
type:
- stage
encode: object
encode: single
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In another comment, @alcaeus asked about creating custom encoder logic. Is that feasible with the current implementation, or is it more complicated because the entire OutStage file is generated (and thus would require addressing this through the generator code itself).

IIRC, the original intention of this project was to have an ongoing dependency on the generator, so I understand if there's presently no way to maintain stages and operators through PHP along (in an alternate reality where the generator was only used to create the initial implementation).

Asking the above more for my own curiosity.

I understand the current change to use encode: single here is the most flexible and probably sets you up to later use string|OutCollection for the parameter.

Comment on lines -23 to -28
name: timeseries
type:
- object
optional: true
description: |
If set, the aggregation stage will use these options to create or replace a time-series collection in the given namespace.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've no idea what this was referring to, as $out only talks about not being able to output to a time-series collection. I don't see any reference to it as an option.

Comment on lines +49 to +50
...['author.first' => 0],
...['lastModified' => 0],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are you using the splat operator twice, once for each positional argument? Wouldn't the correct syntax be: ...['author.first' => 0, 'lastModified' => 0]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both work. I used this syntax to keep 1 field per line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL. There is seemingly no mention of this within PHP docs (just read through Function Arguments). Tested it myself: https://3v4l.org/sJ5KB#v8.3.2

-
$set:
totalHomework:
# The $sum expression is always build as an array, even if the value is an array field name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# The $sum expression is always build as an array, even if the value is an array field name
# The $sum expression is always built as an array, even if the value is an array field name

Alternatively: "The $sum expression always builds as..."

Copy link
Member

@jmikola jmikola Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the type of the referenced field factor in here? I assume ['$homework'] and $homework would still be computed the same and follows the $sum: Array Operand rules. In other words, the array wrapping has no effect other than visually being the long form API.

generator/config/stage/set.yaml Show resolved Hide resolved
generator/config/stage/unionWith.yaml Show resolved Hide resolved
generator/config/stage/unionWith.yaml Outdated Show resolved Hide resolved
$unset:
- 'isbn'
- 'author.first'
- 'copies.warehouse'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thought as #56 (comment) (re: fieldPath() usage), which you noted would be addressed by PHPLIB-1382.

generator/config/stage/merge.yaml Show resolved Hide resolved
Comment on lines +66 to +67
whenMatched: 'replace'
whenNotMatched: 'insert'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be ticketed under PHPLIB-1381?

Copy link
Member

@jmikola jmikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GromNaN GromNaN merged commit ca173b2 into mongodb:0.1 Jan 31, 2024
5 checks passed
@GromNaN GromNaN deleted the PHPLIB-1374 branch January 31, 2024 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants