-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Listable resource iteration through Iterator #543
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach; simple and to the point.
The WithLimit
example you include would eventually return nil, nil
even when the inner function keeps returning an error. If used as intended it would be fine, corner case.
One concern here is the use of the outer context for inner calls. Imagine iterating over a very large collection. The last pages still use the context (possibly with timeout) specified at the last call. What do you think of passing this context to the iterator? Then the timeout could apply to any single call as opposed to the end-to-end iteration.
.codegen/api.go.tmpl
Outdated
{{.Comment "// " 80}} | ||
func (a *{{.Service.Name}}API) {{.PascalName}}Iter(ctx context.Context{{if .Request}}, request {{.Request.PascalName}}{{end}}) func () (*{{.Pagination.Entity.PascalName}}, error) { | ||
var err error | ||
finished := false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mix of :=
and var foo
. Use of the same style is better, IMO.
Looking at the generated code, I was confused what finished
meant because it doesn't break out. A more apt name would be along the lines of isLastPage
, because results
can still be non-empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
err
is uninitialized (gets its zero value) but finished
is "initialized" to false
. I thought it was more clear to set this explicitly this way rather than write var finished bool
. Are you saying it's better to write
var err error
var finished bool
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of:
var finished = false
(also see other comment on name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was having trouble naming that parameter. isLastPage
is great, I'll use that.
Codecov ReportAttention:
📢 Thoughts on this report? Let us know!. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about ‘HasNext() bool’ + ‘Next() (T, error)’? This follows the idiomatic iteration from other SDKs. Client code is fluent enough then - for it.HasNext() { cluster, err := it.Next() if err != nil { return err } … } - technically, HasNext can be (bool, error), but we can hold on the error in the iteration state until Next() (T, error) gets called.
The similar way is used in azure and google SDKs. See azure key vault integration in deco tooling.
passing a closure makes the code less readable.
Limiting is to be done in the user code, if necessary. not sure if we need it in the SDK level.
Perfect solution would require constructing list request type without limit/offset/token/page fields and using that one.
re: backwards compatibility - XxxAll() ([]T, error) must stay and internally call these new methods. “Deprecated …” comment must be added to those.
Comparing to other clouds: Azure: no resource-level iterator, but page-level iterator. Iterators have a type and define methods: (e.g. https://pkg.go.dev/github.com/azure/azure-sdk-for-go@v63.2.0+incompatible/services/databricks/mgmt/2018-04-01/databricks#VirtualNetworkPeeringListPage) GCP: Iterators are a type and define methods: (e.g. https://pkg.go.dev/cloud.google.com/go/dataproc/apiv1#AutoscalingPolicyIterator.Next) AWS: Returned result of a list operation is a page. Users are expected to iterate through the page on their own. They must check IsTruncated to see if there is a next page, and they must copy the next page token to the next request manually. The response includes the current page token as well. (e.g. https://pkg.go.dev/github.com/aws/aws-sdk-go/service/s3#ListBucketAnalyticsConfigurationsInput) For us, it is not possible to write a HasNext() method or something like that without actually fetching the next field. Using a typed response, rather than a closure, gives us more flexibility to add support for HasNext() in the future. Exposing the underlying pages does mean there is slightly more work for the user to iterate through the result, but it also makes it very clear when a request is actually being made to get the next batch of responses. Any approach needs to support all pagination methods we have today, which limits our options given that not all endpoints support page tokens. type ListXResponse {
// .. same fields as before, but also including other fields needed for pagination (not exported)
nextPageToken string
nextOffset string
}
func (*ListXResponse) NextPage(context.Context) (*ListXResponse, error)
// Later, when this can be determined by the response of a list operation
func (*ListXResponse) HasNextPage() bool
...
resp, err := w.Queries.List(ctx)
for len(resp.Values) == 0 {
for _, q := range resp.Values {
// do something
}
resp, err = resp.NextPage(ctx)
} This is backwards-compatible with the current API and simply captures pagination state as unexported fields of the response. |
@mgyucht why
i'd want to block it for as long as possible. pagination implementations will evolve over time: jobs offset/limit -> cursor transition is the most recent example. what is in the page, how many items are in the page should be hidden from the average user. some APIs have no pagination now, but will eventually get it in the near future. hence, consuming iterators/lists must have a consistent UX. so I'll rephrase (reformat) my question again: why can't we queries, err := w.Queries.List(ctx) // makes request, fills iterator state, catches failure. see Java impl
if err != nil {
panic(err)
}
for queries.HasMore() {
query, err := queries.Next()
if err != nil {
panic(err)
}
// do something
} |
.codegen/api.go.tmpl
Outdated
@@ -160,74 +161,86 @@ func (a *{{.Service.Name}}API) {{.PascalName}}AndWait(ctx context.Context{{if .R | |||
} | |||
{{end}}{{if .Pagination}} | |||
{{.Comment "// " 80}} | |||
// | |||
// This method is generated by Databricks SDK Code Generator. | |||
func (a *{{.Service.Name}}API) {{.PascalName}}All(ctx context.Context{{if .Request}}, request {{.Request.PascalName}}{{end}}) ([]{{.Pagination.Entity.PascalName}}, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we must keep {{.PascalName}}All(...) ([]{{.Pagination.Entity.PascalName}}, error)
method, as there's plenty of codebases that rely on this signature. our interface stability promise doesn't allow changing function signatures. at least without deprecating - see https://stackoverflow.com/questions/7849663/how-do-you-mark-code-as-deprecated-in-go for more info.
{{.PascalName}}All(...) ([]{{.Pagination.Entity.PascalName}}, error)
must stay until v1.0.0
.codegen/api.go.tmpl
Outdated
var totalCount {{template "type" .Pagination.Limit.Entity}} = 0 | ||
{{ end -}} | ||
ctx = useragent.InContext(ctx, "sdk-feature", "pagination") | ||
func (a *{{.Service.Name}}API) {{.PascalName}}Iter({{if .Request}}request {{.Request.PascalName}}{{end}}) func (context.Context) (*{{.Pagination.Entity.PascalName}}, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not seen this pattern (returning a callable function) anywhere in the go ecosystem, and it doesn't feel natural - there's nothing about a loop in the calling code.
How do we cross-compile examples of listing across different languages then? please add a jobs test without NoTranspile
suffix and iterate over a couple of jobs, then regenerate the examples in go SDK. you'll have to modify openapi/code/roll
package for that.
internal/jobs_test.go
Outdated
Limit: 10, | ||
}) | ||
|
||
it := w.Jobs.ListIter(jobs.ListJobsRequest{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need Iter
suffix, if all SDKs already consistently return an iterator without suffixing by convention?
.codegen/api.go.tmpl
Outdated
{{if .Pagination.NeedsOffsetDedupe -}} | ||
// deduplicate items that may have been added during iteration | ||
seen := map[{{template "type" .Pagination.Entity.IdentifierField.Entity}}]bool{} | ||
{{end}}{{if eq .Pagination.Increment 1 -}} | ||
request.{{.Pagination.Offset.PascalName}} = 1 // start iterating from the first page | ||
{{end}}for { | ||
response, err := a.impl.{{.PascalName}}(ctx{{if .Request}}, request{{end}}) | ||
{{end}}return func(ctx context.Context) (*{{.Pagination.Entity.PascalName}}, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how would it look like if we port Java SDK's iteration implementation, moving the logic to a common class?
.codegen/api.go.tmpl
Outdated
request = *response.NextPage | ||
} | ||
{{- else if .Pagination.Token -}} | ||
request.{{.Pagination.Token.PollField.PascalName}} = response.{{.Pagination.Token.Bind.PascalName}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we're giving pagination a refactoring, then we should add a special request type for listing methods, that removes token
, offset
, and limit
fields - so that the request has only the usable fields.
internal/jobs_test.go
Outdated
job, err := it(ctx) | ||
require.NoError(t, err) | ||
if job == nil { | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
require.*
halts the test execution, so this statement is unreachable
You need to specify a context when making an API call. What context would you use? The one used to make the original list request itself? That may be unnecessarily restrictive. Only when the request itself includes a next token will we know for certain that there are no more items to be returned by this list call. Java/Python don't have a context object to worry about.
The implementation of
Could you explain this a bit more? This is part of the public API, after all. Note that several APIs allow users to specify page size (like job listing). The only question is whether we should be abstracting over the fact that there may be pages.
I was trying to understand this but wasn't following. From my reading, anemic domain model means that code that acts on objects is decoupled from the object itself and is embedded in the program architecture. In this case, though, the
I agree, and I think this does provide a consistent UX. For non-paginated APIs, the |
b2552e7
to
ea63120
Compare
listing/listing.go
Outdated
getNextPage func(context.Context, Req) (Resp, error), | ||
getItems func(Resp) []T, | ||
getNextReq func(Resp) *Req, | ||
) *iteratorImpl[Req, Resp, T] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not return Iterator[T]
, this being a public function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's kind of up to users if they want to define interfaces, right? We define the interface so that we can provide this to NewDedupeIterator, but if we add more public methods to this base type, users should in theory be able to use them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a unexported type though, so for documentation sake I guess it would be better to document it as returning an Iterator[T]
. As is, it will show up as unexported type
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I was thinking about this a bit more, and I think we may want to export this type as well and have the List implementations return the actual concrete type. None of the fields will be exported, so users won't be able to modify iterator state, but e.g. if we define more methods on either of those structures that don't make sense for the overall iterator (e.g. imagine a "GetCached" on DedupeIteratorImpl that returns the cached element or nil), we couldn't expose that without changing the Iterator interface or using the concrete type.
.codegen/api.go.tmpl
Outdated
request.{{.Pagination.Offset.PascalName}} = 1 // start iterating from the first page | ||
{{end}}for { | ||
response, err := a.impl.{{.PascalName}}(ctx{{if .Request}}, request{{end}}) | ||
limit := request.{{.Pagination.Limit.PascalName}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is a bit weird now. Limits should be applied at the iterator level, as these control the per-page limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to preserve the current behavior, do you think I did something wrong?
Eventually, I would like to remove all pagination parameters from request structures so that users don't misuse them or get confused by them anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it persist existing behavior. Just saying there is ambiguity in this approach; i.e. if I specify limit, does it apply to the number of elements per page, or the total number of elements returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, it's very confusing. I think we made a change earlier to ensure that limit applied to the total returned number of elements when using the SDK, but that diverges from the REST API behavior (as well as from other SDK's behaviors). I do eventually want to remove these parameters from the list request structure, as well as all pagination-based parameters (for example, when using the SDK, why should a user be able to specify the token to use?).
* Implemented Iterator support for paginated endpoints or endpoints returning a list ([#543](#543)). The Iterator interface allows users to get the next resource in the iterator and to check whether another resource is available in the iterator. Iterators for paginated endpoints fetch pages lazily, allowing users to only fetch the pages needed for their use case. * Removed `photon` and `graviton` selectors in `compute.SparkVersionRequ…` ([#622](#622)). Going forward, photon is determined by the `RuntimeEngine` field in `compute.CreateCluster`, and graviton is chosen depending on the `aws_instance_type` field in `compute.CreateCluster`. API Changes: * Added `Attributes`, `Count`, `ExcludedAttributes`, `Filter`, `SortBy`, `SortOrder`, and `StartIndex` fields to [iam.GetAccountUserRequest](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#GetAccountUserRequest) and [iam.GetUserRequest](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#GetUserRequest). * Added `Schemas` field to [iam.Group](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#Group), [iam.ListGroupsResponse](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#ListGroupsResponse), [iam.ListServicePrincipalResponse](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#ListServicePrincipalResponse), [iam.ListUsersResponse](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#ListUsersResponse), [iam.ServicePrincipal](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#ServicePrincipal), and [iam.User](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#User). * Added [iam.GetSortOrder](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#GetSortOrder). * Added [iam.GroupSchema](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#GroupSchema). * Added [iam.ListResponseSchema](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#ListResponseSchema). * Added [iam.ServicePrincipalSchema](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#ServicePrincipalSchema). * Added [iam.UserSchema](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#UserSchema). * Added `WebhookNotifications` field for [jobs.SubmitTask](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/jobs#SubmitTask). * Added [w.Apps](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/serving#AppsAPI) workspace-level service and related methods. * Added [a.AccountNetworkPolicy](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/settings#AccountNetworkPolicyAPI) account-level service and related methods. Internal SDK Changes: * Update to actions/checkout@v4 ([#650](#650)). * Skip unshallow step in test workflow ([#649](#649)). * Add integration tests for `Jobs`: `ListRuns` ([#645](#645)). * Only log pkg.Load after checking whether the operation is tagged ([#655](#655)). * Prefix library paths with the target directory to construct absolute paths ([#656](#656)). * Fixed HasRequiredNonBodyField method ([#660](#660)). * Added CanSetRequiredFieldsFromJson method for OpenAPI generator ([#661](#661)). * Add integration tests for `ServicePrincipals`: `Patch` ([#662](#662)). * Add integration tests for `Users`: `Patch`, `Update` ([#663](#663)). * Enforce running `az login --service-principal` on nightly runs ([#659](#659)). * Add integration tests for `Connections`: `Create`, `Delete`, `Get`, `List`, `Update` ([#653](#653)). OpenAPI SHA: 5903bb39137fd76ac384b2044e425f9c56840e00, Date: 2023-10-23
* Implemented Iterator support for paginated endpoints or endpoints returning a list ([#543](#543)). The Iterator interface allows users to get the next resource in the iterator and to check whether another resource is available in the iterator. Iterators for paginated endpoints fetch pages lazily, allowing users to only fetch the pages needed for their use case. * Removed `photon` and `graviton` selectors in `compute.SparkVersionRequ…` ([#622](#622)). Going forward, photon is determined by the `RuntimeEngine` field in `compute.CreateCluster`, and graviton is chosen depending on the `aws_instance_type` field in `compute.CreateCluster`. API Changes: * Added `Attributes`, `Count`, `ExcludedAttributes`, `Filter`, `SortBy`, `SortOrder`, and `StartIndex` fields to [iam.GetAccountUserRequest](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#GetAccountUserRequest) and [iam.GetUserRequest](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#GetUserRequest). * Added `Schemas` field to [iam.Group](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#Group), [iam.ListGroupsResponse](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#ListGroupsResponse), [iam.ListServicePrincipalResponse](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#ListServicePrincipalResponse), [iam.ListUsersResponse](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#ListUsersResponse), [iam.ServicePrincipal](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#ServicePrincipal), and [iam.User](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#User). * Added [iam.GetSortOrder](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#GetSortOrder). * Added [iam.GroupSchema](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#GroupSchema). * Added [iam.ListResponseSchema](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#ListResponseSchema). * Added [iam.ServicePrincipalSchema](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#ServicePrincipalSchema). * Added [iam.UserSchema](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/iam#UserSchema). * Added `WebhookNotifications` field for [jobs.SubmitTask](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/jobs#SubmitTask). * Added [w.Apps](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/serving#AppsAPI) workspace-level service and related methods. * Added [a.AccountNetworkPolicy](https://pkg.go.dev/github.com/databricks/databricks-sdk-go/service/settings#AccountNetworkPolicyAPI) account-level service and related methods. Internal SDK Changes: * Update to actions/checkout@v4 ([#650](#650)). * Skip unshallow step in test workflow ([#649](#649)). * Add integration tests for `Jobs`: `ListRuns` ([#645](#645)). * Only log pkg.Load after checking whether the operation is tagged ([#655](#655)). * Prefix library paths with the target directory to construct absolute paths ([#656](#656)). * Fixed HasRequiredNonBodyField method ([#660](#660)). * Added CanSetRequiredFieldsFromJson method for OpenAPI generator ([#661](#661)). * Add integration tests for `ServicePrincipals`: `Patch` ([#662](#662)). * Add integration tests for `Users`: `Patch`, `Update` ([#663](#663)). * Enforce running `az login --service-principal` on nightly runs ([#659](#659)). * Add integration tests for `Connections`: `Create`, `Delete`, `Get`, `List`, `Update` ([#653](#653)). OpenAPI SHA: 5903bb39137fd76ac384b2044e425f9c56840e00, Date: 2023-10-23
Changes
A prototype of iterable functions for listable resources. The current Go SDK diverges from the Java and Python SDKs in how list operations are handled. The Go SDK synchronously exhaustively lists the resource (based on the given request) and returns a slice of all entries, and "limit" fields are interpreted as a global limit on the number of resources to return (i.e. the length of the returned slice is never longer than limit). In Python and Java, resource listing is done on demand by returning an iterator/generator abstraction from the list API immediately and only listing resources when requested. In these SDKs, the limit field is passed to the API and is only interpreted as the page size, so the number of returned resources is not bounded (though the number returned per REST API call is).
Besides this, in Go, there is not currently a common iterator abstraction. Even in the standard library, there are several different conventions used for iterator-like operations, such as
(Scanner) Scan()
(returning nil when scanning is complete). Without a common iterator interface, range expressions are not supported out of the box. See golang/go#54245 and golang/go#56413 for in-depth discussions about a proposed generic iterator interface in Go.In light of all of this, this PR is a proposal for unifying the approaches taken between the Go and Python/Java SDKs to return an
Iterator
interface from listable APIs. This interface exposesNext(context.Context) (T, error)
andHasNext(context.Context)
methods that make it simple to iterate through the list of resources. This can be used for all endpoints using resource listing, including paginated and non-paginated endpoints alike. This reflects the implementation of the Java SDK and Java iterators, which also havenext()
andhasNext()
methods.Next
returns the next object in the list, or if there are no more objects in the list, it returns the zero value andlisting.ErrNoMoreValues
.HasNext
returns true if and only if there are more items to process, with one caveat: for some APIs (specifically offset-based APIs that don't include any sentinel value), it is not possible to determine whether there is a next item in the list response without actually making an API call. Because of this,HasNext
must take acontext.Context
parameter. In case the API call fails,HasNext
will returntrue
, and the error can be retrieved by callingNext
.If an error occurs while making an API call, that error is cached and no further results are returned or fetched. In this case, the user is expected to call the List API once more to get a fresh iterator.
Other possible idioms
runners.CallersFrames
returns a pointer to a named type*Frames
with a methodNext() (*Frames, bool)
. The boolean parameter indicates if there is another element available after the returned element. This mechanism does not support returning errors, though the error could be added as a third parameter.bufio.Scanner
provides 3 methods:Scan() bool
indicating whether the scan has stopped (i.e. has reached EOF),Text() string
the element returned by the last call toScan()
, andErr() error
the error that occurred during scanning (unless the end of the file was reached). This is a slightly more cumbersome API requiring calls to all three methods separately in each loop, butScan()
can be used in a range expression.database/sql.Rows
provides aNext() bool
to advance the result set to the next row and aScan(dest ...any) error
which scans the row into provided variables, returning an error if one occurred. This is suitable for a very generic endpoint (such as listing the results of a SQL query), but for most REST API calls where the type of the result is well known, it is less natural than simply returning a value.What to do with the existing List API?
This raises a question of how to handle the existing *All methods. We can leave these *All methods in the SDK for convenience and because we have no need to break them now. However, in some sense, they increase the complexity of this project by having two ways to use an iterator, the first being
listing.ToSlice(w.Service.List(ctx))
orlisting.ToSliceN(w.Service.List(ctx), n)
, the second beingw.Service.ListAll(ctx)
. TheListAll
methods have an additional quirk that they use the Limit field in the request as the global limit for responses, which is different from our how other SDKs work. I think we can leave this here safely for now, but when the time comes for v1.0.0, we can remove these methods and encourage users to use the helper methods from thelisting
package directly. I don't feel the urgency to even mark these as deprecated at the moment, so we'll leave them as is.Tests
iteratorImpl
anddedupeIteratorImpl
are unit tested.