-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
changefeedccl: support the updated and mvcc with parquet format #104407
Conversation
2854200
to
ac73fea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/ccl/changefeedccl/changefeedbase/options.go
line 446 at r3 (raw file):
// // TODO(#103129): add support for some of these var ParquetFormatUnsupportedOptions OptionsSet = makeStringSet(OptEndTime, OptDiff, OptKeyInValue, OptTopicInValue)
I realized topic in value is not explicitly supported or unsupported, so I made it explicitly unsupported for now.
aa1a486
to
1332113
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do not merge?
Should the release note be specified (for the second commit)?
Reviewed 2 of 8 files at r1, 7 of 8 files at r5, 3 of 3 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jayshrivastava and @shermanCRL)
pkg/ccl/changefeedccl/parquet.go
line 292 at r6 (raw file):
func init() { if buildutil.CrdbTestBuild { includeParquestTestMetadata = true
I would get rid of this local variable altogether... You can just inline buildutil.CrdbTestBuild...
Or... if you prefer, add a helper:
func includePartquetMetadataForTests() bool {
return buildutil.CrdbTestBuild
}
or just includeParquetMetadataForTests = buildutil.CrdbTestBuild
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the release note be specified (for the second commit)?
I spoke to @kathancox. It's easier just to write one release note at the end. I'll link the relevant PRs when I write that release note.
why do not merge?
A bit of a long story. If tests don't specify a format, defaulting to parquet is a bit confusing. Might as well use default for production changefeeds, which is JSON. Speaking from experience, it's confusing when tests do spooky things that differ to what happens in prod.
We also don't want to permanently alter the changefeed updated and mvcc timestamps tests to use parquet + cloud storage. We want to keep the randomized sinks. I thought about copying those two tests into parquet_test.go with parquet specified, but that's unnecessary code duplication IMO. We can continue to randomly use parquet in testing.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy and @shermanCRL)
pkg/ccl/changefeedccl/parquet.go
line 292 at r6 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
I would get rid of this local variable altogether... You can just inline buildutil.CrdbTestBuild...
Or... if you prefer, add a helper:func includePartquetMetadataForTests() bool { return buildutil.CrdbTestBuild } or just includeParquetMetadataForTests = buildutil.CrdbTestBuild
ACK. I'll make a change.
Parquet testing requires that extra metadata be written to parquet files so tests can create CDC rows from the raw data. Previously, the production parquet code relied on a testing knob to be passed to write this extra metadata. This is problematic as not all tests would pass the testing knob, making it so that we could not randomly use parquet in those tests for metamorphic testing purposes. With this change, the parquet production code uses `buildutil.CrdbTestBuild`, which is a global flag enabled in tests. Now, metamorphic parquet testing can be applied to more tests. Epic: None Release note: None
…rquet format Previously, using the parquet format in changefeeds did not support using the `mvcc` or `updated` options. This change adds support for using these options with parquet. Epic: None Release note: None
1332113
to
3045e41
Compare
bors r=miretskiy |
Build succeeded: |
changefeedccl: use buildutil.CrdbTestBuild for parquet testing
Parquet testing requires that extra metadata be written to parquet files
so tests can create CDC rows from the raw data. Previously, the production
parquet code relied on a testing knob to be passed to write this extra
metadata. This is problematic as not all tests would pass the testing knob,
making it so that we could not randomly use parquet in those tests for
metamorphic testing purposes. With this change, the parquet production code
uses
buildutil.CrdbTestBuild
, which is a global flag enabled in tests.Now, metamorphic parquet testing can be applied to more tests.
Epic: None
Release note: None
changefeedccl: support the updated and mvcc timestamp options with parquet format
Previously, using the parquet format in changefeeds did not support
using the
mvcc
orupdated
options. This change adds support for usingthese options with parquet.
Epic: None
Release note: None
do not merge: force parquet when possible during tests
This change will not be merged. When it is possible for tests
to use parquet, this change forces them to use parquet.
This change also forces tests for updated timestamps and mvcc
timestamps to use the cloudstorage sink + parquet format.
Release note: None
Epic: None