-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Go SDK]: MongoDB IO connector #24663
Conversation
Codecov Report
@@ Coverage Diff @@
## master #24663 +/- ##
==========================================
- Coverage 73.35% 73.19% -0.16%
==========================================
Files 719 725 +6
Lines 97137 97643 +506
==========================================
+ Hits 71252 71472 +220
- Misses 24537 24818 +281
- Partials 1348 1353 +5
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
R: @lostluck |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
These are great questions (and another great PR) In the mean time, you may be interested in learning that on the Beam Dev List there's a discussion of a properly documented standard for IOs in Beam, in progress in a Google Doc. While there's no specific content for Go, there are certain Beam structure requirements we should ensure for longevity, that we'd port from relevant Java and Python advice. |
No worries, please take your time. Thanks for sharing, I had a quick look and will have a closer read |
Thanks for your patience! This has finally bubbled to the top of my todo list, and I'll be reviewing it sometime today. |
Thanks! And no worries, I took some holidays myself so I haven't had the opportunity to look much more into it either. I did realize that I had missed to register the element types in the integration test though, and when doing so, I get the tests working with the Flink and Spark runners (resolving the unaddressable byte array issue). I'm using
and
and running the tests with
I'll make a new commit with the type registration. |
Run Go PostCommit |
Run Go Flink ValidatesRunner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some replies to your comments, but as written, this is excellent. I think I only have one note that requires a quick change?
bundled = beam.ParDo(s, newBucketAutoFn(uri, database, collection, option), imp) | ||
} else { | ||
bundled = beam.ParDo(s, newSplitVectorFn(uri, database, collection, option), imp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume these are dividing the Reads into separate chunks for smaller reads?
You could improve scalability and possibly get dynamic scaling (on Dataflow) if these are made into the SDF format, generally by pairing the element (in this case, "the concept of a specific Mongo DB Database Collection") with a restriction of some kind (as you've got it.) The key bit is that restrictions technically don't need to be subdividable or numeric, it's just easier that way if that can map cleanly.
Definitely not a blocker for this PR though. https://beam.apache.org/documentation/programming-guide/#splittable-dofns has all the information on them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, correct. I would like to give the splittable implementation a try but will need to read up a bit more on how it works so I will open a separate issue for it.
One question though: the splits in this case will be ID ranges/filters and to retrieve those from Mongo we need a context. The operation may also result in an error. Since the SplitRestriction
function doesn't have a context parameter and error return type (until #20607 is fixed), some kind of workaround would be needed.
I see textio.Read
uses a pre DoFn sizeFn
to resolve the context/error access for getting the file size used to create the initial restriction. However in mongodbio.Read
the context is needed for getting the splits (what bucketAutoFn
and splitVectorFn
do now).
Some less ideal approaches I can think of are attaching a context to the DoFn struct or using a context.Background. And to panic instead of returning an error if something goes wrong.
Do you have a better approach in mind or what would be your recommendation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a context.Background and panicking are probably the best options currently (outside of fixing #20607) Definitely the background one vs storing the context in struct.
Panics are recovered from, turned into bundle failures, and the execution state garbage collected.
Ideally, it wouldn't panic, but instead simply return an unsplit restriction if it's not hugely critical, or if the error is retriable, retried there.
Issue #20607 does have the appropriate instructions, and links to the code to look at. It's mostly a matter of adding the optional parameters to a given method, and repeatedly running the execution and fixing the problems along the way (while adding test cases as one goes...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thanks!
Run Go Spark ValidatesRunner |
Run Go Samza ValidatesRunner |
Thank you for reviewing and for taking the time to answer my questions! As mentioned I'll attempt the splittable DoFn implementation in a separate branch so feel free to merge this one in if you're happy with the changes. |
Thank you for another excellent contribution! |
* DebeziumIO schema transform (#24629) * prototyping a debezium-based schema transform for continuous ingestion of change streams from databases * Writing a significant integration test for DebeziumIO SchemaTransformProvider * Improvements to debuggability, performance on expansion * fix tests. mysql integration test remaining * fixing and improving mysql test * Fix JdbcIO issue * add integration tests and reduce size for now * fixup * passing tests * Address comments * add comment * jamm on -javaagent instead of classpath * [Website] update links with 404 status code * [Website] update links from absolute to relative in md files * Bump actions/setup-java from 3.6.0 to 3.8.0 (#24559) Bumps [actions/setup-java](https://github.com/actions/setup-java) from 3.6.0 to 3.8.0. - [Release notes](https://github.com/actions/setup-java/releases) - [Commits](https://github.com/actions/setup-java/compare/v3.6.0...v3.8.0) --- updated-dependencies: - dependency-name: actions/setup-java dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Fix some typos (#24736) * Fix some typos * Fix write config * Adding Setup python to Dataflow job in Java Tests (#24712) * Adding Setup python to Dataflow job * Adding Setup python to Dataflow job * Using standard setup-self-hosted composite action for installations Co-authored-by: Elias Segundo <elias.segundo@luisrazo.local> Co-authored-by: Danny McCormick <dannymccormick@google.com> * Bump github.com/aws/aws-sdk-go-v2/feature/s3/manager in /sdks (#24740) Bumps [github.com/aws/aws-sdk-go-v2/feature/s3/manager](https://github.com/aws/aws-sdk-go-v2) from 1.11.44 to 1.11.46. - [Release notes](https://github.com/aws/aws-sdk-go-v2/releases) - [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/CHANGELOG.md) - [Commits](https://github.com/aws/aws-sdk-go-v2/compare/feature/s3/manager/v1.11.44...feature/s3/manager/v1.11.46) --- updated-dependencies: - dependency-name: github.com/aws/aws-sdk-go-v2/feature/s3/manager dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Refactor focusing to contextLine (#24674) * Refactor focusing to contextLine (#24613) * Rename a widget (#24613) * Minor reordering (#24613) * Fix after review (#24613) * Support SqlTypes Date in AvroUtils (#24756) * Fix SingleStoreIO performance test job (#24753) * Fix a small wrong url link in notebook (#24746) * Fix POM of beam-sdks-java-core (closes #24675) * Exclude MultimapState related tests in Dataflow legacy worker validates runner tests (#24758) * [Website] delete 404 links #24745 (#24744) * [Release Tasks Migration] - Git Tag Workflow (#24418) * worflow for git tag final release * modify workflow and descriptions * modify git tag command * add git config * update CI doc * Change Point Analysis (#23931) Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com> Fixes https://github.com/apache/beam/issues/23930 * change examples to licensed ones with mentions of autors and licenses (#24762) Co-authored-by: Philippe Moussalli <philippe@moussalli@ml6.eu> * Implement PubsubWriteSchemaTransformProvider (#24443) * Implement PubsubWriteSchemaTransformProvider * Patch error message for format * Patch method comment * Fix identifiers * spotlessApply * [CdapIO] Complete examples for CDAP Hubspot plugins (#24568) * Add examples for Cdap Hubspot plugins * Fix dependencies * Move Cdap Hubspot examples to the separate gradle module * Move common classes to Examples Cdap module * Fix readme * Refactoring * Fix typo * Fix Permission denied error build beam locally (#24766) * fix typo: UnmarshallWindowFn should be UnmarshalWindowFn (#24771) * Remove slow review label after processing commands to avoid race (#24765) * [#24292] Create Avro extension for Java SDK * Address the review comments * Apply #24454 changes * [BigQueryIO] Don't update schema of destination table when no schema is provided (#24700) * skip schema update when no schema is provided * test for using create_if_needed disposition * spotless * [Playground] Python multifile examples (#24751) * proto * gen * models * with multifile * models * examples * buf.gen.yaml+gen * buf.gen.yaml * gen * files_content * backend_run_code * infra_datastore * fix * GetPrecompiledObjectCode * ignore_incomplete * build * user_score.py * rename * no java * user_score * fix Co-authored-by: Alexey Inkin <leha@inkin.ru> * [CdapIO] Complete examples for CDAP Salesforce plugins (#24567) * Add examples for Cdap Salesforce plugins * Move Cdap Salesforce examples to the separate gradle module * Move common classes to Examples Cdap module * Fix readme * Rename package-info file and rename gradle tasks * Fix checkstyle * Make go vet and go staticcheck output visible in CI (#24786) * Make go vet and go staticcheck output visible in CI * Intentionally fail staticcheck * Actually violate staticcheck * Actually violate staticcheck * Fix staticcheck errors * [Go SDK] Fix multimap support for the direct runner (#24775) Updates the `NewKeyedIterable()` implementation to correctly parse a multimap side input from an in memory ReStream. * [#24515] Ensure that Go pipelines on Dataflow are not allowed to opt out of runner v2. (#24767) * [#24515] Ensure that Go pipelines on Dataflow are not allowed to opt out of runner v2. This is towards #24515 * Drop the unused worker jar and update breaking changes. * Update Java portable container for Dataflow (#24791) * [BigQuery] Extend timestamp precision to microseconds when writing with Beam Rows (Storage API) (#24784) * [Website] update check links, catch prod & staging links * [Spark Dataset runner] Trigger evaluation using write noop rather than using foreach (closes #24797) (#24565) * Install sklearn < 1.20.0 for sklearn postcommit tests (#24788) * Install sklearn < 1.20.0 * fixup syntax * [Spark Dataset runner] Skip unconsumed additional outputs of ParDo.MultiOutput to avoid caching if not necessary (resolves #24710) (#24711) * [Go SDK]: MongoDB IO connector (#24663) * Better naming and inference for PythonCallable. (#24735) This is particularly valuable for cross-language calls. * [Spark Dataset runner] Reuse SparkSession when testing using class rule. (#24812) * [#24515] Ensure that portable Java pipelines on Dataflow are not able to opt out of runner v2. (#24805) Towards #24515 * 24802 fix missing test resources (#24803) * Remove use of test resource files * Patch method comment * Add Python xlang KafkaIO performance test (#24633) * Add Python xlang KafkaIO performance test * Use SDF read for performance test in both Java and Python * Fix kafka run out of resource upon increase number of records for streaming test * Fix assert expected and actual misplace * Fix Java test fail whenever a write bundle fail * Fix test workaound (see PR comment) * Shard python precommit (#24204) * WIP: Shard python precommit * Naming * Formatting * Allow empty test suites * Fix no posargs case * Fix paths * Split out integration tests * Do common tasks in IT suite * Fix name base * More splitting * Fix quotes * try ignore-glob * Add groovy files per subfolder * name typo * Filter dataframes tests * Logging * Switch to correct directory syntax * Merge master + fix paths * Consolidate back to reasonable set * Add jobs to README * Try full quotes * Add in missing tests * Move arg lines down * Add gradle files for typescript containers and update the version info. Container are now built with ./gradlew ":sdks:typescript:container:docker" just like all other SDKs and they will be pushed as part of the release process. * Refactor integration tests to use internal containers package (#24823) * Various cleanups to the typescript SDK: * Better errors for bad serialization imports. * Semver-correct version for fake worker package. * Package source into temporary directory. * Allow "latest" as a xlang beam jar version for non-released SDKs. * Less verbose logging. * Fix issue with typescript sibling sdks. * [#24789][Go SDK] Fix Minor race conditions (#24808) * Add timeout to cross-language tests. * npm install before build * Format go file. * [#24789] Spot fix fullvalue wrapping for SDF. (#24826) * [#24789] Spot fix fullvalue wrapping for SDF. * Fix indenting on diagrams * Reinstate comment * Add thread prefix for StreamingDataflowWorker work executor * Fix Cassandra read bug when user query has no where clause (fixes #24829) (#24830) * Fix Cassandra read bug when user query has no where clause * update changelog * fix format violation * fix remaining format violation * add unity test for read with unfiltered query * fix unfiltered query unit test count validation * use constant for num rows assertion on unfiltered query unit test Co-authored-by: Yi Hu <huuyyi@gmail.com> * fix cassandra query bugfix description Co-authored-by: Yi Hu <huuyyi@gmail.com> Co-authored-by: Lucas Marques <lucas.pmarques@b2wdigital.com> Co-authored-by: Yi Hu <huuyyi@gmail.com> * Retry create database in InfluxDbIOIT (#24800) * Retry create database in InfluxDbIOIT * Rename InfluxDb performance test to post commit * This test is actually an integration test on direct runner * Revert #23680 relax job node * Update .test-infra/jenkins/README * BEAM-13261 added max connections setting * Support SqlTypes Date in AvroUtils (sync) * Revert "Shard python precommit (#24204)" (#24855) This reverts commit 598324a8847e8bd4ea00c9526777e9e6e51229af. * [Spark Dataset runner] Fix initialization of metrics accumulator on driver (fixes #24809) (#24810) * Improve error message (#24843) * improve error message * improve error message * Playground Example CI fix for 24807 (#24819) * remove deleted files from git diff output * fixing ci example checker * remove diff filtering * Bump github.com/tetratelabs/wazero from 1.0.0-pre.4 to 1.0.0-pre.7 in /sdks (#24858) * Bump github.com/tetratelabs/wazero in /sdks Bumps [github.com/tetratelabs/wazero](https://github.com/tetratelabs/wazero) from 1.0.0-pre.4 to 1.0.0-pre.7. - [Release notes](https://github.com/tetratelabs/wazero/releases) - [Commits](https://github.com/tetratelabs/wazero/compare/v1.0.0-pre.4...v1.0.0-pre.7) --- updated-dependencies: - dependency-name: github.com/tetratelabs/wazero dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> * Remove ctx param from wazero fns Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jack McCluskey <thejackmccluskey@gmail.com> * Fix beam_CleanUpGCPResources job (#24770) * Fail the job at the end * Update README.md * Set DELETE_BEFORE_DAY in accordance with release period (#24867) * adding parameters flag to support schema based transforms (#24864) * Fix SQLIO Performance Test (#24824) * Fix SQLIO Performance Test * Update jenkins README * Reduce redundant Get API call when creating bq temp dataset (#24831) * Reduce redundant Get API call when creating bq temp dataset * Move logging to get_or_create_dataset and set appropriate level * Disable sonarqube cron job (#24769) * Saving empty output and log precompiled objects for consistency (#24877) * save empty ppc objects * adding missing argument * Playground runners scala update (#24873) * adding maven repo for sbt * externalize grpc timeout and document variables * fix rat check * fixing typo * minor change to re-trigger whitespace check * typo fixed * [#24801] ValueState.read() should specify returning null since it will if unset (#24869) * [#24801] ValueState.read() should allow returning null Also clarify documentation. Fixes #24801 * Address spotbugs and @Nullable on ValueState in sdks/java/io/google-cloud-platform * Add links to colab/github repo in notebook (#24886) * Add links to colab/github repo in notebook * Align links * Fix 2 missed notebooks * Remove trailing space breaking website blocks (#24875) * Remove trailing space breaking website blocks * Convert remaining references * Bump google.golang.org/api from 0.105.0 to 0.106.0 in /sdks (#24891) Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.105.0 to 0.106.0. - [Release notes](https://github.com/googleapis/google-api-go-client/releases) - [Changelog](https://github.com/googleapis/google-api-go-client/blob/main/CHANGES.md) - [Commits](https://github.com/googleapis/google-api-go-client/compare/v0.105.0...v0.106.0) --- updated-dependencies: - dependency-name: google.golang.org/api dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [#21391]Increase unit testing coverage in the exec package (#24772) * add more test in sdks/go/pkg/beam/core/runtime/exec/translate_test.go * add test for translate.go unmarshalPort() * add test on function TestNewUserStateAdapter * add test for translate.go newBuilder(desc *fnpb.ProcessBundleDescriptor) * add test for translate.go UnmarshalPlan(desc *fnpb.ProcessBundleDescriptor) * better naming and error message * Update sdks/go/pkg/beam/core/runtime/exec/translate_test.go Co-authored-by: Ritesh Ghorse <riteshghorse@gmail.com> * use errors.New() when format string is not used Co-authored-by: v-zhenglinli <v-zhenglinli@microsoft.com> Co-authored-by: Ritesh Ghorse <riteshghorse@gmail.com> * Load categories from BEAM_EXAMPLE_CATEGORIES in checker.py (#24900) * Reduce lock contention for metrics hashName hasher (#24881) * Add metrics parallel bench Utilize parallel benchmark for metrics to measure scalability on multiple cores. * Reduce locking for hashName Use a pool of hashers rather than a singleton. For workloads that create Counters on the fly this results in a 4x or so overall speedup in adding incrementing Counters. * Bump golang.org/x/text from 0.5.0 to 0.6.0 in /sdks (#24889) Bumps [golang.org/x/text](https://github.com/golang/text) from 0.5.0 to 0.6.0. - [Release notes](https://github.com/golang/text/releases) - [Commits](https://github.com/golang/text/compare/v0.5.0...v0.6.0) --- updated-dependencies: - dependency-name: golang.org/x/text dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump golang.org/x/sys from 0.3.0 to 0.4.0 in /sdks (#24890) Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.3.0 to 0.4.0. - [Release notes](https://github.com/golang/sys/releases) - [Commits](https://github.com/golang/sys/compare/v0.3.0...v0.4.0) --- updated-dependencies: - dependency-name: golang.org/x/sys dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump github.com/aws/aws-sdk-go from 1.30.19 to 1.33.0 in /sdks (#24896) Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.30.19 to 1.33.0. - [Release notes](https://github.com/aws/aws-sdk-go/releases) - [Changelog](https://github.com/aws/aws-sdk-go/blob/v1.33.0/CHANGELOG.md) - [Commits](https://github.com/aws/aws-sdk-go/compare/v1.30.19...v1.33.0) --- updated-dependencies: - dependency-name: github.com/aws/aws-sdk-go dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump golang.org/x/net from 0.4.0 to 0.5.0 in /sdks (#24892) Bumps [golang.org/x/net](https://github.com/golang/net) from 0.4.0 to 0.5.0. - [Release notes](https://github.com/golang/net/releases) - [Commits](https://github.com/golang/net/compare/v0.4.0...v0.5.0) --- updated-dependencies: - dependency-name: golang.org/x/net dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Print logical identifier in FieldType.toString (#24888) * Fix PR number in 2.43.0 changelog (#24904) * Update CHANGES.md * Update beam-2.43.0.md * Fix poorly formatted metadata (#24905) This was caused by a bad edit in https://github.com/apache/beam/commit/12f107249b7617d84c0552df4977de4f3801e557# * Add static override for JDK TLS disabled/legacy algorithms in Java container (#24619) * Adding jdk.tls security property options to java containers * Update CHANGES.md remove whitespace line endings * Fix invalid copy task configuration * Adding license to TLS properties files * Added bugfix description and link to CHANGES.md for #24623 * Renaming to match global implications for security properties override as TLS configs are not the only properties that can be changed * Add license * Add license * Adding TLS-enabled check to SdkHarnessEnvironment tests * Update sdks/java/container/java11/option-java11-security.json Co-authored-by: Kiley Sok <kileysok@gmail.com> * Adjusting version fallback specification for Java * Making suggested fixes * Fixing indentation * Adding SSLContext check to TLS availability test * Making suggested improvements to test * Removing exception imports no longer needed * Remove whitespace and erroneous context null initialization * Fix typo * Update sdks/java/core/src/test/java/org/apache/beam/sdk/SdkHarnessEnvironmentTest.java Co-authored-by: Lukasz Cwik <lcwik@google.com> * Fix spotless java precommit formatting error Co-authored-by: Kiley Sok <kileysok@gmail.com> Co-authored-by: Lukasz Cwik <lcwik@google.com> * Remove shuffle mode=appliance in tests (#24879) * Modify windmill DirectStreamObserver to call isReady only every 10 messages by default. This provides more output buffering which ensures that output is not throttled on synchronization when message sizes exceed 32KB grpc isready limit. * Sync script for grafana dashboard - github actions postcommit workflows (#24073) * Create job-nexmark-dataflow.yml * Update test-properties.json * Update job-nexmark-dataflow.yml * Update job-nexmark-dataflow.yml * Update test-properties.json * Update job-nexmark-dataflow.yml * Update test-properties.json * Update job-nexmark-dataflow.yml * Create job-nexmark-flink.yml * Create job-nexmark-spark.yml * Update job-nexmark-spark.yml * Update test-properties.json * Update job-nexmark-flink.yml * Update job-nexmark-spark.yml * Update job-nexmark-spark.yml * Create job-python-examples.yml * Update job-python-examples.yml * Create script to sync GA workflows with Grafana Dashboard * Sync using a cloud function * Tested with cloud function * Sync files * Delete new line * add comments * Add comments * Create Workflow Object, fix typos, modify database table, added exceptions, added multiple pages support, , fix style * Fix filename in database * Add retries, create a retrie function, delete comment, fix comment typo * Create table string now is using a for * Add comment related with the executions number * Update the number of workflows runs to 30 * Update number of executions to 100 * Integration test to load the default example of the default SDK and change the example (#24731) * Integration test to load the default example of the default SDK and change the example (#24730) (#24729) * Fix formatting and README (#24730) * Support collection v1.17.0 (#24730) * LoadingIndicator on chaning examples, remove duplicating licenses (#24730) * Add a missing license header (#24730) * Integration test for changing SDK and running code (#24779) (#382) * Integration test for changing SDK and running code (#24779) * Rename an integration test (#24779) * Use enum to switch SDK in integration test (#24779) * Find SDK in a dropdown by key (#24779) * Add a TODO (#24779) * Fix exports (#24779) * Issue24779 integration changing sdk from 24370 (#387) * Integration test for changing SDK and running code (#24779) * Rename an integration test (#24779) * Use enum to switch SDK in integration test (#24779) * Find SDK in a dropdown by key (#24779) * Add a TODO (#24779) * Fix exports (#24779) * Integration tests miscellaneous UI (#383) * miscellaneous ui integration tests * reverted pubspec.lock * gradle tasks ordered alhpabetically * integration tests refactoring * clean code * integration tests miscellaneous ui fix pr * rename method * added layout adaptivity * A minor cleanup (#24779) Co-authored-by: Dmitry Repin <mr.malarg@gmail.com> Co-authored-by: Dmitry Repin <mr.malarg@gmail.com> * issue23923 emulated data indicator (#24708) * issue23939 emulated data indicator * issue23923 removed assets constants, other small fixes * minor changes * removed redundant args * removed more redundant datasets * reverted pubspec * reverted lock * Bump github.com/aws/aws-sdk-go-v2/service/s3 in /sdks (#24917) Bumps [github.com/aws/aws-sdk-go-v2/service/s3](https://github.com/aws/aws-sdk-go-v2) from 1.29.6 to 1.30.0. - [Release notes](https://github.com/aws/aws-sdk-go-v2/releases) - [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/CHANGELOG.md) - [Commits](https://github.com/aws/aws-sdk-go-v2/compare/service/s3/v1.29.6...service/s3/v1.30.0) --- updated-dependencies: - dependency-name: github.com/aws/aws-sdk-go-v2/service/s3 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * issue23918 extract java symbols (#24588) * issue23918 extract java symbols * issue23918 java symbols mapper uses jackson for writing a file * issue23918 java symbols mapper uses jackson for writing a file * issue23918 jackson replaces on yamlbeans * issue23918 add java symbols to playground * rat * rat * rat * minor fix * newlines added * issue23918 extract symbols for scala + minor fixes * issue23918 add build task when build java * issue 23918 java.g.yaml output alphabetically * empty line * issue23918 refactoring * issue23918 fix comparator * issue23918 tests for java sdk * issue23918 deleted settings.gradle * fix * issue 23918 fix pr * new lines at the end * load symbols optimization * issue23918 reverted mocks * fix tests * issue_23918 removed wget and added classes compilation to test * issue23918 support java 8 Co-authored-by: alexeyinkin <leha@inkin.ru> * Attempt deserialize all non-standard portable logical types from proto (#24910) * Attempt deserialize all non-standard logical types from proto * Fixes portable and not-yet-standard logical type get deserialized to UnknownLogicalType * TFT Criteo benchmarks (#24382) * Add criteo Cloud ML Beam pipeline * Test for Criteo small dataset * Add tft criteo code * Add pytest marker and requirements file * Update GCS path for test data * Add shuffle param * add gradle task * Fix uuid * Fixup flag * Add requirements file * Update tests * Revert "add gradle task" This reverts commit 63236ca410d0d2ae075d17af24b3f7d963aa1a86. * Add gradle task for TFT tests * Fix up lint * Add skip to unittest * Exclude tests from pydoc generation * Update timeout * Addressing comments * Add phrase trigger * Fixup job * Fix up groovy file * Update .test-infra/jenkins/job_CloudMLBenchmarkTests_Python.groovy * Fixup groovy file * update Java version from 11 to 17 (#24839) * [#21384] Add unit tests to the sql package (#24811) * [#24927] Ensure the SDK harness handles JCL/log4j/log4j2 messages. (#24928) * [#24927] Ensure the SDK harness handles JCL/log4j/log4j2 messages. This routes JCL/log4j/log4j2 messages to SLF4 which is then routed to JUL and finally over the Fn Logging API to the runner. Fixes #24927 * Fix up overrides * Improve Error Prone configuration Pass the required `--add-exports=` flags when running on JDK 17, even if `java17Home` is unset. * Bump github.com/aws/aws-sdk-go-v2/feature/s3/manager in /sdks (#24915) Bumps [github.com/aws/aws-sdk-go-v2/feature/s3/manager](https://github.com/aws/aws-sdk-go-v2) from 1.11.46 to 1.11.47. - [Release notes](https://github.com/aws/aws-sdk-go-v2/releases) - [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/CHANGELOG.md) - [Commits](https://github.com/aws/aws-sdk-go-v2/compare/feature/s3/manager/v1.11.46...feature/s3/manager/v1.11.47) --- updated-dependencies: - dependency-name: github.com/aws/aws-sdk-go-v2/feature/s3/manager dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update comment * update Beam GCP BOM (#24929) * update BOM * remove hard-coded pubsub lite version * [#24931][Go SDK] Make element checkpoints independant (#24932) * Update Kafka watermark based on policy when records poll empty (#24205) * Update watermark based on policy when records poll empty, to ensure watermark can advance even on empty partitions * add integration test to validate watermark advancement * filter out non-sdf kafka versions for testint * Clean up kafka build, rework kafka test case * apply spotles * shrink wait duration * update wait duration * Grafana dashboard - Github actions postcommit workflows mapping (#24176) * grafana dashboard postcommit GA and workflow to sync jobs * grafana dashboard postcommit to map GA workflows * update GA state in dashboard * apply changes into grafana dashboard required in script * Shard Python PreCommit (#24866) * WIP: Support passing in special pytest args * Shard precommit * Syntax + easier testing * Fix syntax * Skip -> ignore * Fix loop * Fix loop * Remove local testing mechanism and setup to interact with Jenkins * Strip off any quotes * Strip off any quotes * Strip off any quotes * Strip off any quotes * Strip off any quotes * Simplify arg parsing * Improve comment * Fail on no tests run * reapply single executor to storage write API writes (#24950) * [#21368] Clean-up and use the FixedExecutorProvider (#24952) This is a minor clean-up for https://github.com/apache/beam/pull/24950 to re-use existing implementation from gax-java * Add doLast on Playground integration tests (#24954) (#24955) * JDBC SchemaTransform implementation. Need to break out into a separat… (#24918) * JDBC SchemaTransform implementation. Need to break out into a separate PR to submit. * Update configuration to accommodate SDKs that might send empty strings or zeros in place of nulls. * Move write options validation test to correct class. Update classname and driver check to use Strings.isNullOrEmpty * Add a little documentation blurb to the providers * Fix periods that Github complains about but locally doesn't throw an error. * fix pinning bug in storage-api writes * [WebSite] Add new Go quickstart (#24885) Co-authored-by: Veronica Wasson <VeronicaWasson@users.noreply.github.com> * Add missing dependencies to maven archetype * Bump google.golang.org/grpc from 1.51.0 to 1.52.0 in /sdks (#24969) Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.51.0 to 1.52.0. - [Release notes](https://github.com/grpc/grpc-go/releases) - [Commits](https://github.com/grpc/grpc-go/compare/v1.51.0...v1.52.0) --- updated-dependencies: - dependency-name: google.golang.org/grpc dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump github.com/containerd/containerd from 1.6.8 to 1.6.12 in /sdks (#24945) Bumps [github.com/containerd/containerd](https://github.com/containerd/containerd) from 1.6.8 to 1.6.12. - [Release notes](https://github.com/containerd/containerd/releases) - [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md) - [Commits](https://github.com/containerd/containerd/compare/v1.6.8...v1.6.12) --- updated-dependencies: - dependency-name: github.com/containerd/containerd dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [Playground] Add an option for Datastore namespace value for ci_cd.py script (#24818) * Add option to pass an arbitrary Datastore namespace value into ci_cd.py script * Pass Google Cloud Datastore project value into ci_cd.py script via an argument instead of using an environment variable directly * Update README.md to reflect changes to ci_cd.py script arguments * Multifile examples on frontend (#24859) (#24865) * Multifile examples on frontend (#24859) * Make a wrapper local (#24859) * Revert sdk.g.dart deletion, minor fixes (#24859) * Fix after review (#24859) * Return only Playground examples in GetCatalog() (#24816) * [Website] add credit karma case-study card (#24970) * [Website] add credit karma case-study card * Case study updates * Case study updates * add link Co-authored-by: Alex Kosolapov <alex.kosolapov@gmail.com> * [Playground] Fix for failing CI/CD in playground (#24984) * Fix test running for CI/CD scripts * Require --datastore-project in ci_Cd.py only for CD step * Improve error message for unencodeable types (#24844) * Add single job to run python coverage (#24988) * Add single job to run python coverage * Fix nameBase * Bump cloud.google.com/go/bigquery from 1.44.0 to 1.45.0 in /sdks (#24913) Bumps [cloud.google.com/go/bigquery](https://github.com/googleapis/google-cloud-go) from 1.44.0 to 1.45.0. - [Release notes](https://github.com/googleapis/google-cloud-go/releases) - [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md) - [Commits](https://github.com/googleapis/google-cloud-go/compare/bigquery/v1.44.0...bigquery/v1.45.0) --- updated-dependencies: - dependency-name: cloud.google.com/go/bigquery dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update beam-master version * Bump google.golang.org/api from 0.106.0 to 0.107.0 in /sdks (#24989) Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.106.0 to 0.107.0. - [Release notes](https://github.com/googleapis/google-api-go-client/releases) - [Changelog](https://github.com/googleapis/google-api-go-client/blob/main/CHANGES.md) - [Commits](https://github.com/googleapis/google-api-go-client/compare/v0.106.0...v0.107.0) --- updated-dependencies: - dependency-name: google.golang.org/api dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [ToB] [Frontend] Authentication, unit progress, complete unit, SDK selection (#24457) * Content tree navigation (#23593) Unit content navigation (#23593) Update URL on node click (#23593) Active unit color (#23593) removeListener in unit (#23593) First unit is opened on group title click (#23593) WIP by Alexey Inkin (#23593) selectedUnitColor (#23593) Unit borderRadius (#23593) RegExp todo (#23593) added referenced collection package to remove warning (#23593) small refinement (#23593) expand on group tap, padding, openNode (#23593) group expansion bug fix (#23593) selected & unselected progress indicators (#23593) * AnimatedBuilders instead of StatefulWidgets in unit & group (#23593) * fixed _getNodeAncestors (#23593) * get sdkId (#23593) * addressing comments (#23593) * sdkId getter & StatelessExpansionTile (#23593) * expand & collapse group (#23593) * StatelessExpansionTile (#23593) * license (#23593) * ValueChanged and ValueKey in StatelessExpansionTile (#23593) * optional SDK selector in tour scaffold * AppNotifier * moved AppNotifier into _initializeState * StorageKeys * remove listener * auth, complete unit, user progress AuthNotifier draft (#23692) Comments (#23692) Comments (#23692)(1) sign in with google works (#23692) new configs (#23692) get user progress draft (#23692) comment fixes (#23692) sign in in IntroTextBody (#23692) reverted config (#23692) comment fixes (#23692) WIP before rebase (merge) (#23692) Squashed commit of the following: commit bff4919ff00ec3b5d7186efde41c884dfc4c8344 Merge: 79ba69483a ce8d618c77 Author: Alexey Romanenko <33895511+aromanenko-dev@users.noreply.github.com> Date: Thu Nov 17 10:34:02 2022 +0100 Merge pull request #24186: Uses _all to follow alias/datastreams when estimating index size commit 79ba69483a84ea0278d0b0ddb141200739607c77 Merge: 245fea9040 b7e860a762 Author: Chamikara Jayalath <chamikaramj@gmail.com> Date: Wed Nov 16 20:47:40 2022 -0800 Merge pull request #24218: Update Python wheel format for RC validation commit 245fea904014cd58d4148807463dbaa40000774c Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Wed Nov 16 18:12:33 2022 -0800 Bump loader-utils from 1.4.1 to 1.4.2 in /sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel (#24191) Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit e1de8e78deeb5d17617fda6591429eaaf8abb8a2 Author: Yi Hu <yathu@google.com> Date: Wed Nov 16 20:48:06 2022 -0500 Fix PythonLint (#24219) commit b7e860a7621771c300dcec625655f87e62591323 Author: Chamikara Jayalath <chamikara@apache.org> Date: Wed Nov 16 17:28:31 2022 -0800 updates commit c2feb09ea49dd815b69c65e531ce34128756d988 Author: Chamikara Jayalath <chamikara@apache.org> Date: Wed Nov 16 17:06:08 2022 -0800 updates commit ce8d618c77d23e20a1ddb128bb8183048597d096 Author: egalpin <egalpin@users.noreply.github.com> Date: Wed Nov 16 16:43:57 2022 -0800 Adds test for following aliases when estimating index size commit 959719d01c627328c0ca2849d2b7e2c9b322d4d1 Author: Chamikara Jayalath <chamikara@apache.org> Date: Wed Nov 16 15:16:06 2022 -0800 Temporary update Python RC validation job commit b952b41788acc20edbe5b75b2196f30dbf8fdeb0 Author: Yi Hu <yathu@google.com> Date: Wed Nov 16 14:18:12 2022 -0500 Python TextIO Performance Test (#23951) * Python TextIO Performance Test * Add filebasedio_perf_test module for unified test framework for Python file-based IOs * Fix MetricsReader publishes metrics duplicately if more than one load test declared. This is because MetricsReader.publishers was static class variable * Fix pylint * Distribute Python performance tests random time at a day instead of all at 3PM * Add information about length conversion commit 017f2cbde124af40a43be99ec88289fcf63c1c95 Merge: fef8acdbc0 88dba4f494 Author: Chamikara Jayalath <chamikaramj@gmail.com> Date: Wed Nov 16 10:39:52 2022 -0800 Merge pull request #24187: Add a reference to Java RunInference example commit fef8acdbc0ecbcc85b49144adaf8830e3bc6b2de Merge: 6e9187e67e ead245539d Author: Ahmet Altay <aaltay@gmail.com> Date: Wed Nov 16 10:24:53 2022 -0800 Merge pull request #24199 from Laksh47/issue#24196 refs: issue-24196, fix broken hyperlink commit 6e9187e67e1bd8f73997f437f0ed4c29880ed73b Author: Darkhan Nausharipov <31556582+nausharipov@users.noreply.github.com> Date: Wed Nov 16 22:33:50 2022 +0600 [Tour of Beam] [Frontend] Content tree URLs (#23776) * Content tree navigation (#23593) Unit content navigation (#23593) Update URL on node click (#23593) Active unit color (#23593) removeListener in unit (#23593) First unit is opened on group title click (#23593) WIP by Alexey Inkin (#23593) selectedUnitColor (#23593) Unit borderRadius (#23593) RegExp todo (#23593) added referenced collection package to remove warning (#23593) small refinement (#23593) expand on group tap, padding, openNode (#23593) group expansion bug fix (#23593) selected & unselected progress indicators (#23593) * AnimatedBuilders instead of StatefulWidgets in unit & group (#23593) * fixed _getNodeAncestors (#23593) * get sdkId (#23593) * addressing comments (#23593) * sdkId getter & StatelessExpansionTile (#23593) * expand & collapse group (#23593) * StatelessExpansionTile (#23593) * license (#23593) * ValueChanged and ValueKey in StatelessExpansionTile (#23593) Co-authored-by: darkhan.nausharipov <darkhan.nausharipov@kzn.akvelon.com> Co-authored-by: Alexey Inkin <alexey.inkin@akvelon.com> commit b33fac2aa533d77cfa47f88466c8cd6bd3f3e864 Author: Bruno Volpato <bvolpato@google.com> Date: Wed Nov 16 10:51:11 2022 -0500 Use only ValueProviders in SpannerConfig (#24156) commit 5f013ab6567ec75b460b2081d7f89d332320caff Author: Robert Burke <lostluck@users.noreply.github.com> Date: Wed Nov 16 07:23:10 2022 -0800 revert upgrade to go 1.19 for action unit tests (#24189) commit 9337f4dbecc929886f8559949a082a649fd9d1bb Author: Yi Hu <yathu@google.com> Date: Wed Nov 16 10:18:42 2022 -0500 Fix Python PostCommit Example CustomPTransformIT on portable (#24159) * Fix Python PostCommit Examples on portable * Fix custom_ptransform pipeline options gets modified * Specify flinkConfDir commit ead245539d01dec0f3e08699c1e1cc6777a5ef0e Author: Laksh <lakshmanansathya@gmail.com> Date: Wed Nov 16 09:32:46 2022 -0500 refs: issue-24196, fix broken hyperlink commit e83a996d4374d467d95bcfad7166905622ec615c Merge: 2fc56ec663 ffdee0b6ed Author: Jan Lukavský <je.ik@seznam.cz> Date: Wed Nov 16 15:15:31 2022 +0100 Merge pull request #24192: Re-use serializable pipeline options when already available. commit ffdee0b6edb8638c78a65ec85c727ea5dde1cb2f Author: Jozef Vilcek <jvilcek@zetaglobal.com> Date: Mon Nov 14 16:48:18 2022 +0100 Re-use serializable pipeline options when already available (#24192) commit 88dba4f494829b2b3530b767fb8c5252e0d2ba44 Author: Chamikara Jayalath <chamikaramj@gmail.com> Date: Tue Nov 15 16:21:22 2022 -0800 Add a reference to Java RunInference example commit 2fc56ec663e335cfcf37dc57d471f79b601414f4 Merge: f763186987 83f1bc19b9 Author: Kenn Knowles <kenn@apache.org> Date: Tue Nov 15 16:16:47 2022 -0800 Merge pull request #24142: Fix arguments to checkState in BatchViewOverrides commit f763186987c00ba1d26efdc35406436a1fa69a9a Merge: c2bc2135e9 0d7ca04182 Author: Ning Kang <ningkang0957@gmail.com> Date: Tue Nov 15 15:25:20 2022 -0800 Addresses #24161 Updated README of Interactive Beam commit c2bc2135e9bce715990a5d5551e2bc2dc0311da4 Author: Doug Judd <nuggetwheat@gmail.com> Date: Tue Nov 15 14:48:26 2022 -0800 Strip FGAC database role from changestreams metadata requests (#24177) Co-authored-by: Doug Judd <nuggetwheat@google.com> commit af637974f96ad1b5110d7dea3f9a26c68e19a51b Author: Jack McCluskey <34928439+jrmccluskey@users.noreply.github.com> Date: Tue Nov 15 17:16:43 2022 -0500 Add custom inference function support to the PyTorch model handler (#24062) * Initial type def and function signature * [Draft] Add custom inference fn support to Pytorch Model Handler * Formatting * Split out default * Remove Keyed version for testing * Move device optimization * Make default available for import, add to test classes * Remove incorrect default from keyed test * Keyed impl * Fix device arg * custom inference test * formatting * Add helpers to define custom inference functions using model methods * Trailing whitespace * Unit tests * Fix incorrect getattr syntax * Type typo * Fix docstring * Fix keyed helper, add basic generate route * Modify generate() to be different than forward() * formatting * Remove extra generate() def commit a014637106970a0a0e9eb7944aa5caf79fa5fd37 Author: egalpin <egalpin@users.noreply.github.com> Date: Tue Nov 15 13:57:54 2022 -0800 Uses _all to follow alias/datastreams when estimating index size Fixes #24117 commit 0d7ca041823bc2b09f76f86fdfd1d0b9508c9c88 Author: Ning Kang <ningkang0957@gmail.com> Date: Tue Nov 15 13:57:27 2022 -0800 Minor update commit e8fc759d756f4a987e41d2b9da56b906a6cd7736 Author: Ning Kang <ningkang0957@gmail.com> Date: Tue Nov 15 13:52:18 2022 -0800 Updated README of Interactive Beam Removed deprecated cache_dir runner param in favor of the cache_root global option. commit 08d5f72e5f35d41f3e9fa9fe799caea6bed1b7a7 Author: Anand Inguva <34158215+AnandInguva@users.noreply.github.com> Date: Tue Nov 15 16:34:21 2022 -0500 [Python]Support pipe operator as Union (PEP -604) (#24106) Fixes https://github.com/apache/beam/issues/21972 commit 526e7a58b62682582c27173ab21ed8667ddab766 Author: Scott Strong <scott.strong87@gmail.com> Date: Tue Nov 15 16:26:45 2022 -0500 Using Teardown context instead of deprecated finalize (#24180) * Using Teardown context instead of deprecated finalize * making function public Co-authored-by: Scott Strong <scott.strong@wunderkind.co> commit fb4d1d4dea7b26ed538a9f6aca0ed41e8c300e37 Author: Danny McCormick <dannymccormick@google.com> Date: Tue Nov 15 16:25:22 2022 -0500 Fix broken json for notebook (#24183) commit f98db2008a97f4546d036ddf0dddfee8c87eb58a Author: Robert Burke <lostluck@users.noreply.github.com> Date: Tue Nov 15 12:49:23 2022 -0800 Update automation to use Go 1.19 (#24175) Co-authored-by: lostluck <13907733+lostluck@users.noreply.github.com> commit e5f58504eef1fdeebe0402cda8a2df259169c704 Author: Brian Hulette <bhulette@google.com> Date: Tue Nov 15 12:25:13 2022 -0800 Add error reporting for BatchConverter match failure (#24022) * add error reporting for BatchConverters * Test pytorch * Finish up torch tests * yapf * yapf * Remove else commit 3037747f66f0d71d65b6c65745b4f8942c22f05a Author: Danny McCormick <dannymccormick@google.com> Date: Tue Nov 15 14:13:04 2022 -0500 Fix broken notebook (#24179) commit b2b1c739ce37690923891934ee317f799db937a2 Author: MakarkinSAkvelon <67736809+MakarkinSAkvelon@users.noreply.github.com> Date: Tue Nov 15 21:53:06 2022 +0500 [Playground] Move Playground in GKE and Infrastructure change (#23928) * changes to updated master branch * Change workflow * ingress changes * Certificate was added * Updates for cloud build backend * Update main.tf * Create main.tf * Create variables.tf * Update variables.tf * Update main.tf * Update variables.tf * Update main.tf * Create output.tf * Update output.tf * Update output.tf * Update main.tf * Update build.gradle.kts * Update output.tf * Update main.tf * Update main.tf * Update main.tf * Update variables.tf * Update main.tf * Update variables.tf * Update main.tf * Update main.tf * Update main.tf * Update main.tf * Update main.tf * Update main.tf * Update variables.tf * Update main.tf * Update variables.tf * Update main.tf * Update variables.tf * Update main.tf * Update main.tf * Update main.tf * Update main.tf * Update main.tf * Update main.tf * Update main.tf * Update main.tf * Update output.tf * Update main.tf * Update main.tf * Update output.tf * Create variables.tf * Update main.tf * Update main.tf * Delete playground/terraform/infrastructure/cluddns directory * Update main.tf * Update output.tf * Update output.tf * Update build.gradle.kts * Update build.gradle.kts * Update build.gradle.kts * Update build.gradle.kts * Update build.gradle.kts * Update build.gradle.kts * Update README.md * Update README.md * helm folder name was changed * Update README.md * Update build.gradle.kts * Update build.gradle.kts * Update build.gradle.kts * Updates to readme * Fix DNS name * HelmChart was changed * Some workflows were changed * Remove unused file * playground-examples return * add license information * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * remove "stg" folder * Update README.md * Remove unused files * DNS Removed * var name changed * remove DNSName from var file * 1 * Clear terraform * remove unused records * gradle check * grade last change * issue fix * fix * 1 * run * test * Index creation for Gradle * Add IndexCreation in gradle * Update README.md * Update README.md * Fix names for Frontend * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Redis fix * services fix * Update variables.tf * change order in gradle * Fix Config.g.dart file issue * Update README.md * Playground workflow update Co-authored-by: Sergey Makarkin <sergey.makarkin@akvelon.com> Co-authored-by: Sergey Makarkin <sergey_makarkin@quicktest1.c.apache-beam-testing.internal> Co-authored-by: ruslan-ikhsan <ruslan.ikhsanov@akvelon.com> Co-authored-by: Alex Kosolapov <alex.kosolapov@gmail.com> commit 85df5f2eb2f299e28b36be0cce7b9c19d62124da Author: Yi Hu <yathu@google.com> Date: Tue Nov 15 11:38:13 2022 -0500 Eliminate CalciteUtil.CharType logical type (#24013) * Eliminate CalciteUtils.CharType logical type * Replace CalciteUtils.CharType to String Note that CalciteUtils still omits the precision of BINARY/VARBINARY/CHAR/VARCHAR as what it originally did. Support of the precision of these calcite types involves make use of making use of the overload method RelDataTypeFactory.createSqlType(var1, var2). * Replace every reference of CalciteUtil.CharType to generic PassThroughLogicalType check * Add TODO to Support sql types with arguments * Use VariableString in LogicalTypeTestCase commit f349f41010c5b238ff6020f7de718f938eef3c5e Author: alexeyinkin <alexey.inkin@akvelon.com> Date: Tue Nov 15 20:04:01 2022 +0400 Configure flutter_code_editor options with Hugo shortcode (#23926) (#24031) * Configure flutter_code_editor options with Hugo shortcode (#23926) * Minor fixes (#23926) * Refactor after review (#23926) commit 0f4ca6363b3ce0e5de3ad36517bb406aa6391a18 Author: Rebecca Szper <98840847+rszper@users.noreply.github.com> Date: Tue Nov 15 06:10:13 2022 -0800 Editorial review of the ML notebooks. (#24125) * Editorial review of the ML notebooks. * Editorial review of the ML notebooks. * Editorial review of the ML notebooks. * Update examples/notebooks/beam-ml/custom_remote_inference.ipynb Co-authored-by: Danny McCormick <dannymccormick@google.com> * Updating based on feedback * Update examples/notebooks/beam-ml/run_inference_sklearn.ipynb Co-authored-by: Danny McCormick <dannymccormick@google.com> * Update examples/notebooks/beam-ml/run_inference_tensorflow.ipynb Co-authored-by: Danny McCormick <dannymccormick@google.com> * Update examples/notebooks/beam-ml/run_inference_tensorflow.ipynb Co-authored-by: Danny McCormick <dannymccormick@google.com> * Update examples/notebooks/beam-ml/run_inference_tensorflow.ipynb Co-authored-by: Danny McCormick <dannymccormick@google.com> * Updating based on feedback Co-authored-by: Danny McCormick <dannymccormick@google.com> commit 5bd34ede026253326ebff1a7e4f9edb5f71b4a2c Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Tue Nov 15 07:17:28 2022 -0500 Bump github.com/aws/aws-sdk-go-v2/feature/s3/manager in /sdks (#24131) Bumps [github.com/aws/aws-sdk-go-v2/feature/s3/manager](https://github.com/aws/aws-sdk-go-v2) from 1.3.2 to 1.11.39. - [Release notes](https://github.com/aws/aws-sdk-go-v2/releases) - [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/CHANGELOG.md) - [Commits](https://github.com/aws/aws-sdk-go-v2/compare/v1.3.2...feature/s3/manager/v1.11.39) --- updated-dependencies: - dependency-name: github.com/aws/aws-sdk-go-v2/feature/s3/manager dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit 2ee809fa0ca7689dd0279e186ebc02d9569a8429 Merge: e3b9bdb2e6 563c66d6fd Author: Alexey Romanenko <33895511+aromanenko-dev@users.noreply.github.com> Date: Tue Nov 15 11:01:14 2022 +0100 Merge pull request #23065: [Website] Update copy icon styles commit e3b9bdb2e607d85a4017ba7839000e92a0ad83c4 Author: Moritz Mack <mmack@talend.com> Date: Tue Nov 15 10:40:50 2022 +0100 [Dockerized Jenkins] Fix build of dockerized jenkins (fixes #24053) (#24054) commit faaac2ab6e010374cb2be0e95a5dd345836a2a2c Author: Moritz Mack <mmack@talend.com> Date: Tue Nov 15 10:38:59 2022 +0100 [Dockerized Jenkins] Update README how to use local repo (#24055) commit 689e70b5131620540faf52e2f1e2dca7a36f269d Author: Damon <damondouglas@users.noreply.github.com> Date: Mon Nov 14 17:34:29 2022 -0800 Implement embedded WebAssembly example (#24081) commit e1bf6c42950e8013f35e35fb9fee8017e01e5010 Merge: eddac84126 10337d2868 Author: Robert Bradshaw <robertwb@gmail.com> Date: Mon Nov 14 15:22:14 2022 -0800 Merge pull request #24160 Rename the test_splits flag to direct_test_splits. commit eddac841261228a2c63fa9b225c520ae0f853806 Author: Pablo <pabloem@users.noreply.github.com> Date: Mon Nov 14 15:05:05 2022 -0800 More dataset templates to clean up (#24162) commit 2adb68bd12743566cc89b596bf204d7c807eb62d Author: Pablo <pabloem@users.noreply.github.com> Date: Mon Nov 14 13:28:13 2022 -0800 Adding a quickstart to README for the TS SDK (#23509) * More of a quickstart for the TS SDK * Update sdks/typescript/README.md Co-authored-by: Danny McCormick <dannymccormick@google.com> * Update sdks/typescript/README.md Co-authored-by: Danny McCormick <dannymccormick@google.com> Co-authored-by: Danny McCormick <dannymccormick@google.com> commit 10337d28685ad5712e2ad8608977ec5c5e0e6b6b Author: Robert Bradshaw <robertwb@gmail.com> Date: Mon Nov 14 12:46:32 2022 -0800 Rename the test_splits flag to direct_test_splits. This avoids possible flag conflicts. commit 48c70cc30742b45b17a1d18ece2f0d079bee3915 Author: arne-alex <108519096+arne-alex@users.noreply.github.com> Date: Mon Nov 14 21:33:02 2022 +0100 Merge pull request #23333: Track time on Cloud Dataflow streaming data reads and export via heartbeats commit 9c83de646ab52bd0b05e3346190dd55cd68b2a8b Author: Johanna Öjeling <51084516+johannaojeling@users.noreply.github.com> Date: Mon Nov 14 21:19:44 2022 +0100 Add more tests for S3 filesystem (#24138) commit 9e9c6d797ba52b460f83131431c8e53aebbbc9ac Merge: d5d76b9745 c600444e1d Author: Ning Kang <ningkang0957@gmail.com> Date: Mon Nov 14 12:06:15 2022 -0800 Merge pull request #24029 from apache/dependabot/npm_and_yarn/sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel/loader-utils-1.4.1 Bump loader-utils from 1.4.0 to 1.4.1 in /sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel commit d5d76b974592d45de368ab641647ca5cc4ec12ec Author: Yi Hu <yathu@google.com> Date: Mon Nov 14 15:03:28 2022 -0500 Support SqlTypes Date and Timestamp (MicrosInstant) in AvroUtils (#23969) * Support SqlTypes Date and Timestamp (MicrosInstant) in AvroUtils * Add TODO about java.time migration commit 330cc2010c9f4a2d4e30318bf50a4109ec1cd392 Author: Pablo <pabloem@users.noreply.github.com> Date: Mon Nov 14 12:02:10 2022 -0800 Cleanup stale BQ datasets (#24158) * Cleanup stale BQ datasets * addressing comments commit 4a044999b8ed4bcd41f816f3a23ccb5da00c4c38 Merge: e563b9dd2f 5bd75c25de Author: Heejong Lee <heejong@gmail.com> Date: Mon Nov 14 11:16:00 2022 -0800 Merge pull request #24076 from chamikaramj/multilang_java_updates Updates Multi-lang Java quickstart commit e563b9dd2f3aa0484e6cdc08869991b5e438023e Author: Evgeny Antyshev <eantyshev@gmail.com> Date: Mon Nov 14 20:56:35 2022 +0300 [Tour Of Beam] verify that unit exists when saving progress (#24118) * AIO * Update learning/tour-of-beam/backend/integration_tests/auth_test.go Co-authored-by: Danny McCormick <dannymccormick@google.com> * nit Co-authored-by: Danny McCormick <dannymccormick@google.com> commit 774923e0dd089de870bfa5c77063ae2b28f79347 Merge: 71785de528 1ad0cbc445 Author: Kenn Knowles <kenn@apache.org> Date: Mon Nov 14 09:52:26 2022 -0800 Merge pull request #24141: Fix checkArgument format in GcsPath commit 71785de52864313c2e3b14fe72a2a63281343617 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon Nov 14 11:54:37 2022 -0500 Bump github.com/aws/aws-sdk-go-v2/config from 1.17.10 to 1.18.0 in /sdks (#24151) Bumps [github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) from 1.17.10 to 1.18.0. - [Release notes](https://github.com/aws/aws-sdk-go-v2/releases) - [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/CHANGELOG.md) - [Commits](https://github.com/aws/aws-sdk-go-v2/compare/config/v1.17.10...config/v1.18.0) --- updated-dependencies: - dependency-name: github.com/aws/aws-sdk-go-v2/config dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit 50d591d6cb3e799bee4e29dfc593c693a86e6276 Author: Bruno Volpato <bvolpato@google.com> Date: Mon Nov 14 11:50:01 2022 -0500 Change DataflowBatchWorkerHarness doWork error level to INFO (#24135) commit 5a72696bfda09fdb905ba8e58b636f8494ef955f Merge: ee0a5836d6 0633fe9634 Author: Kenn Knowles <kenn@apache.org> Date: Mon Nov 14 08:12:12 2022 -0800 Merge pull request #24149: Remove extraneous jetbrains annotation commit ee0a5836d69b776834eb3bd9b2bd02eb5252c333 Merge: d001a69e1a 137799672e Author: Kenn Knowles <kenn@apache.org> Date: Mon Nov 14 08:11:00 2022 -0800 Merge pull request #24132: Fix checkArgument format string in AvroIO commit d001a69e1a58701d6ed4fcb5e3fb7a0921301dad Author: Yi Hu <yathu@google.com> Date: Mon Nov 14 10:56:54 2022 -0500 Test Dataproc 2.1 with Flink load tests (#24129) * Test Dataproc 2.1 with Flink load tests * Minor fix flink_cluster script commit caabd9be52887ad70c8a4269395c893811ac6a84 Author: Israel Herraiz <ihr@google.com> Date: Mon Nov 14 16:03:39 2022 +0100 Make MonotonicWatermarkEstimator work like its Java SDK equivalent (#24146) * Make MonotonicWatermarkEstimator work like its Java SDK equivalent The current implementation of MonotonicWatermarkEstimator raises an exception with late messages, which makes the watermark estimator barely usable in real world scenarios. This PR fixes #20041 by making this watermark estimator work like its Java SDK equivalent (`WatermarkEstimators.MonotonicallyIncreasing`). * Update unit tests too * Make linter happy commit 451f6b3e7f58d0a3782ad942c6a1fd9f63932024 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon Nov 14 09:48:23 2022 -0500 Bump golang.org/x/net from 0.1.0 to 0.2.0 in /sdks (#24153) Bumps [golang.org/x/net](https://github.com/golang/net) from 0.1.0 to 0.2.0. - [Release notes](https://github.com/golang/net/releases) - [Commits](https://github.com/golang/net/compare/v0.1.0...v0.2.0) --- updated-dependencies: - dependency-name: golang.org/x/net dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit 2bb03d62e2d7dc2d8e39040fc9adebccbde74fde Merge: 4e39ef2041 623083cd0a Author: Alexey Romanenko <33895511+aromanenko-dev@users.noreply.github.com> Date: Mon Nov 14 15:01:13 2022 +0100 Merge pull request #24000: [Website] Change headers size from h4,h3 to h2 commit 563c66d6fd32165da14a07747f2764c17a5d24ea Author: bulat safiullin <bulat.safiullin@akvelon.com> Date: Wed Sep 7 18:28:42 2022 +0600 [Website] update pre tag copy link styles #23064 commit 4e39ef20410ee51c6040317bcd60171e64c5171f Merge: 223768f782 105ed6fedc Author: Alexey Romanenko <33895511+aromanenko-dev@users.noreply.github.com> Date: Mon Nov 14 10:55:33 2022 +0100 Merge pull request #24115: [Website] update go-dependencies.md java-dependencies.md links commit 223768f782f771f0033b8d0686d86cf4c71fad75 Merge: aa0a35dabf a9da2abee6 Author: Kenn Knowles <kenn@apache.org> Date: Sun Nov 13 18:53:13 2022 -0800 Merge pull request #24136: Fix checkArgument format string in ExecutionStateTracker commit 0633fe9634fe61df7cbc0ecac205d81124fd504a Author: Kenneth Knowles <klk@google.com> Date: Sat Nov 12 15:15:16 2022 -0800 Remove extraneous jetbrains annotation commit 83f1bc19b95935e60ca1f4027d4b60c7e738a84a Author: Kenneth Knowles <klk@google.com> Date: Sat Nov 12 14:16:09 2022 -0800 Fix arguments to checkState in BatchViewOverrides commit 1ad0cbc44594d8405bf4b07a126265238013a02a Author: Kenneth Knowles <klk@google.com> Date: Sat Nov 12 13:41:02 2022 -0800 Fix checkArgument format in GcsPath commit aa0a35dabf9c2a0d9822faff06d939d9a77a3ab6 Author: Kenn Knowles <kenn@apache.org> Date: Fri Nov 11 20:26:30 2022 -0800 Fix checkArgument format string in TestStream (#24134) commit a9da2abee6455bc2cf0f18ba5f6cd7bbaeae669f Author: Kenneth Knowles <klk@google.com> Date: Fri Nov 11 16:54:27 2022 -0800 Fix checkArgument format string in ExecutionStateTracker commit 369e2ba8622d3474c14c39b941b2c618842d1e47 Author: Ryan Thompson <ryanthompson591@gmail.com> Date: Fri Nov 11 19:46:07 2022 -0500 Add a ValidatesContainer integration test for use_sibling_sdk_workers (#24099) commit 137799672eb559a7586262e6a8a73d1ab3580e44 Author: Kenneth Knowles <klk@google.com> Date: Fri Nov 11 15:30:01 2022 -0800 Fix checkArgument format string in AvroIO commit 5d2dbf957e4e82fb3980726940df02ac67e563cd Author: Anand Inguva <34158215+AnandInguva@users.noreply.github.com> Date: Fri Nov 11 15:57:28 2022 -0500 Update staging of Python wheels (#24114) Fixes https://github.com/apache/beam/issues/24110 commit c2021bee1eba0322b43c90841397859048296b21 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Fri Nov 11 15:33:14 2022 -0500 Bump google.golang.org/api from 0.102.0 to 0.103.0 in /sdks (#24049) Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.102.0 to 0.103.0. - [Release notes](https://github.com/googleapis/google-api-go-client/releases) - [Changelog](https://github.com/googleapis/google-api-go-client/blob/main/CHANGES.md) - [Commits](https://github.com/googleapis/google-api-go-client/compare/v0.102.0...v0.103.0) --- updated-dependencies: - dependency-name: google.golang.org/api dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit 6557c91c79480b9d90573d52d257a11c2b160196 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Fri Nov 11 11:47:12 2022 -0800 Bump github.com/aws/aws-sdk-go-v2/service/s3 in /sdks (#24112) Bumps [github.com/aws/aws-sdk-go-v2/service/s3](https://github.com/aws/aws-sdk-go-v2) from 1.29.1 to 1.29.2. - [Release notes](https://github.com/aws/aws-sdk-go-v2/releases) - [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/CHANGELOG.md) - [Commits](https://github.com/aws/aws-sdk-go-v2/compare/service/s3/v1.29.1...service/s3/v1.29.2) --- updated-dependencies: - dependency-name: github.com/aws/aws-sdk-go-v2/service/s3 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit 96f9da1ab652156cd143d57e3aa3d94836338f2b Aut…
Fixes #24575 and implements a MongoDB IO connector for the Go SDK with the following transforms:
mongodbio.Read
reads documents from a MongoDB collection into a PCollectionmongodbio.Write
writes elements from a PCollection to a MongoDB collection. If the struct element has a field with the bson tag "_id", the value of the field is used as the document id. If not, a new id of typeprimitive.ObjectID
is generatedThe Read transform follows a similar pattern as the Java and Python SDKs, where the MongoDB internal splitVector command is used by default, but it is possible to configure the transform to use a bucketAuto aggregation instead to support reads from MongoDB Atlas.
Pipelines using the Read and Write transforms have been run successfully with DirectRunner and DataflowRunner.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.