@wangyinsheng
Contributor

The metrics reporter is lost when a table is sent to other nodes in a cluster as a SerializableTable. This PR rebuilds the metrics reporter when the table is deserialized.

@wangyinsheng force-pushed the metrics_reporter_for_serializable_table branch 2 times, most recently from 882b279 to 6742851 on March 19, 2023 12:23
@wangyinsheng
Contributor Author

PTAL @nastra

}

@Override
public MetricsReporter metricsReporter() {
Contributor

@nastra, Mar 20, 2023

can we add a test that makes sure the metrics reporter is preserved with the right properties?

Contributor Author

PTAL

@wangyinsheng force-pushed the metrics_reporter_for_serializable_table branch from 6742851 to a08be15 on March 24, 2023 00:57
Assert.assertEquals("History must match", expected.history(), actual.history());
}

public static void assertSerializedMetricsReporter(
Contributor

no need to add this here, this can be just part of the test method itself

Contributor Author

done

Assert.assertEquals(
"metrics reporter from serializableTable should equals the one from origin table",
expected,
actual);
Contributor

Rather than testing actual equality on the objects, it should be enough to make sure that the properties are the same. That way TestMetricsReporter wouldn't need to implement hashCode/equals.

Contributor Author

done

import org.junit.After;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;
Contributor

please take a look at https://iceberg.apache.org/contribute/#testing. Basically new test classes should be JUnit5 rather than JUnit4

Contributor Author

PTAL

throw new CommitFailedException("Injected failure");
}
Integer version = VERSIONS.get(tableName);
String metadataLocation =
Contributor

why are these changes necessary?

Contributor Author

The lazyTable method in SerializableTable loads the metadata file, but TestTable does not write a metadata file.

Contributor

I still think those changes are not necessary. The test passed fine for me without having those changes

public static void assertSerializedMetricsReporter(
MetricsReporter expected, MetricsReporter actual) {
Assert.assertNotNull("metrics reporter from serializableTable should not be null", actual);
Assert.assertTrue(
Contributor

this can be simplified to Assertions.assertThat(actual).isNotNull().isInstanceOf(...) and for the properties: Assertions.assertThat(actual.properties).isEqualTo(...)

Contributor Author

done, thanks

@wangyinsheng force-pushed the metrics_reporter_for_serializable_table branch from a08be15 to 69356e7 on March 24, 2023 13:56
public void testSerializableTableWithMetricsReporter()
throws IOException, ClassNotFoundException {
Map<String, String> properties = Maps.newHashMap();
properties.put(
Contributor

nit: can be simplified to Map<String, String> properties = ImmutableMap.of(...)

Contributor Author

done


private static final SortOrder SORT_ORDER = SortOrder.builderFor(SCHEMA).asc("id").build();

@TempDir public File temp;
Contributor

I think this doesn't have to be public to work

Contributor Author

done

*/
default MetricsReporter metricsReporter() {
throw new UnsupportedOperationException(
"metricsReporter is not supported by " + getClass().getName());
Contributor

Suggested change
- "metricsReporter is not supported by " + getClass().getName());
+ "Accessing metrics reporter is not supported");

Contributor Author

done

}
}

public static class TestMetricsReporter implements MetricsReporter {
Contributor

can we also move this to the respective test class? I don't think it's necessary to put this into TestHelpers

Contributor Author

done

properties.put(
CatalogProperties.METRICS_REPORTER_IMPL, TestHelpers.TestMetricsReporter.class.getName());
MetricsReporter reporter = CatalogUtil.loadMetricsReporter(properties);
Table table = TestTables.create(temp, "tbl_A", SCHEMA, SPEC, SORT_ORDER, 2, reporter);
Contributor

I don't see where we're testing that the reporter itself has the correct properties.

Shouldn't this do

Map<String, String> reporterProperties = ImmutableMap.of("a", "1", "b", "2");
reporter.initialize(reporterProperties);

Contributor Author

reporter.initialize(reporterProperties) is called in the method CatalogUtil.loadMetricsReporter(properties)

if (lazyMetricsReporter == null) {
synchronized (this) {
if (lazyMetricsReporter == null) {
lazyMetricsReporter = CatalogUtil.loadMetricsReporter(this.metricsReporterProperties);
Contributor Author

The metrics reporter is configured by catalog properties, not table properties. I added the interface method MetricsReporter.properties() to return the properties that were used to load and initialize this MetricsReporter, so that SerializableTable can use these properties to load and initialize the MetricsReporter again.
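
The approach described in this comment (serialize only the reporter's properties, then rebuild the reporter lazily after deserialization) can be sketched roughly as below. This is a minimal, self-contained illustration and not the code in this PR: `Reporter`, `DemoReporter`, `SerializableHolder`, and `loadReporter` are hypothetical stand-ins for `MetricsReporter`, a concrete reporter, `SerializableTable`, and `CatalogUtil.loadMetricsReporter`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class LazyReporterSketch {

  /** Hypothetical stand-in for MetricsReporter, extended with properties(). */
  interface Reporter {
    void initialize(Map<String, String> properties);

    Map<String, String> properties();
  }

  /** Hypothetical reporter; note that it is NOT Serializable. */
  static class DemoReporter implements Reporter {
    private Map<String, String> properties = new HashMap<>();

    @Override
    public void initialize(Map<String, String> props) {
      this.properties = new HashMap<>(props);
    }

    @Override
    public Map<String, String> properties() {
      return properties;
    }
  }

  /** Hypothetical stand-in for CatalogUtil.loadMetricsReporter: create, then initialize. */
  static Reporter loadReporter(Map<String, String> properties) {
    Reporter reporter = new DemoReporter();
    reporter.initialize(properties);
    return reporter;
  }

  /** Hypothetical stand-in for SerializableTable: only the properties are serialized. */
  static class SerializableHolder implements Serializable {
    private static final long serialVersionUID = 1L;
    private final HashMap<String, String> reporterProperties;
    // transient: skipped by Java serialization, so it is null after deserialization
    private transient volatile Reporter lazyReporter;

    SerializableHolder(Reporter reporter) {
      this.reporterProperties = new HashMap<>(reporter.properties());
    }

    Reporter reporter() {
      // double-checked locking, mirroring the snippet under review
      if (lazyReporter == null) {
        synchronized (this) {
          if (lazyReporter == null) {
            lazyReporter = loadReporter(reporterProperties);
          }
        }
      }
      return lazyReporter;
    }
  }

  /** Serializes and deserializes a value in memory, as if it were sent to another node. */
  static <T extends Serializable> T roundTrip(T value) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      @SuppressWarnings("unchecked")
      T copy = (T) in.readObject();
      return copy;
    }
  }

  public static void main(String[] args) throws Exception {
    Reporter original = loadReporter(Map.of("a", "1", "b", "2"));
    SerializableHolder copy = roundTrip(new SerializableHolder(original));
    // The reporter is rebuilt on first access from the carried-over properties.
    System.out.println(copy.reporter().properties());
  }
}
```

The trade-off in this sketch is that the holder has to carry the reporter's configuration along with it, which is the point the later discussion in this thread turns on.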

@Test
public void testSerializableTableWithMetricsReporter()
throws IOException, ClassNotFoundException {
Map<String, String> properties = Maps.newHashMap();
Contributor

those are the properties of the table, so you'd have to pass them to the table. The metrics reporter then has separate properties

Contributor Author

The metrics reporter is configured by catalog properties, not table properties. Here, these properties are treated as the catalog properties.

@wangyinsheng force-pushed the metrics_reporter_for_serializable_table branch from 69356e7 to 9838834 on March 25, 2023 03:41
@wangyinsheng
Contributor Author

PTAL @nastra

Contributor

@nastra left a comment

I did take another look and I don't think we need the changes in TestTables, because the tests will pass without those.

@wangyinsheng force-pushed the metrics_reporter_for_serializable_table branch from 9838834 to 2ac204b on March 28, 2023 13:57
@wangyinsheng
Contributor Author

> I did take another look and I don't think we need the changes in TestTables, because the tests will pass without those.

You're right, it's not necessary. I previously compared the metadata between the original table and the serializable table, which is why the changes were needed. I have removed them, PTAL @nastra

Contributor

@nastra left a comment

LGTM, thanks @wangyinsheng.

@aokolnychyi could you also take a look please?

@wangyinsheng
Contributor Author

PTAL @aokolnychyi

@wangyinsheng
Contributor Author

Could you please take a look at this PR? It has been open for weeks. @aokolnychyi

@wangyinsheng
Contributor Author

@nastra Could you please approve the CI run? And is there someone else who can review this PR?

@wangyinsheng
Contributor Author

@aokolnychyi @danielcweeks could you take a look, please? Thanks.

@nastra
Contributor

nastra commented Apr 12, 2023

@wangyinsheng to fix CI failures I think you need to add the below code to TransactionTable:

    @Override
    public MetricsReporter metricsReporter() {
      return BaseTransaction.this.reporter;
    }

and then also this to BaseMetadataTable:

  @Override
  public MetricsReporter metricsReporter() {
    return table().metricsReporter();
  }

@wangyinsheng force-pushed the metrics_reporter_for_serializable_table branch 2 times, most recently from 08afaff to 4ea3cc0 on April 12, 2023 14:41
@wangyinsheng force-pushed the metrics_reporter_for_serializable_table branch from 4ea3cc0 to f831280 on April 12, 2023 15:03
this.metricsReporterProperties =
table instanceof BaseTable
? SerializableMap.copyOf(table.metricsReporter().properties())
: null;
Contributor

we should rather do #7144 (comment) than setting this to null here.

Contributor Author

@wangyinsheng, Apr 13, 2023

Sorry, I missed that comment when I was trying to fix the CI failures. I will change it later.

@nastra
Contributor

nastra commented Apr 13, 2023

@wangyinsheng I talked to @rdblue yesterday and we concluded that we don't want to carry over the catalog properties just for the metrics reporter. The better approach would be to make MetricsReporter extend Serializable. I opened wangyinsheng#1 against your branch to do that. Feel free to squash it into your changes

@wangyinsheng
Contributor Author

@nastra I am confused: why shouldn't we carry over the catalog properties for the metrics reporter? We already support configuring the metrics reporter through catalog properties. A metrics reporter may also contain non-serializable objects, a KafkaProducer for example, so we need the properties to recreate those objects after the table has been deserialized.

@wangyinsheng
Contributor Author

please take a look @nastra @rdblue @aokolnychyi @danielcweeks

@nastra
Contributor

nastra commented Apr 24, 2023

@wangyinsheng as mentioned previously, we still would like to make the MetricsReporter Serializable. For things that aren't Serializable in a metrics reporter, we would make them transient and then lazily initialize them
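
A rough sketch of the alternative described in this comment, assuming a reporter that is itself `Serializable` and keeps its non-serializable part in a `transient` field that is rebuilt on first use. Everything here is hypothetical and only illustrates the pattern: `FakeClient` stands in for something like a Kafka producer, and `SerializableReporter` is not the Iceberg `MetricsReporter` interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class TransientClientSketch {

  /** Hypothetical non-serializable resource, e.g. a network client. */
  static class FakeClient {
    final String endpoint;
    int sent = 0;

    FakeClient(String endpoint) {
      this.endpoint = endpoint;
    }

    void send(String message) {
      sent++;
    }
  }

  /** Hypothetical reporter: serializable config, transient lazily created client. */
  static class SerializableReporter implements Serializable {
    private static final long serialVersionUID = 1L;
    private final HashMap<String, String> config;
    // transient: not written during serialization, null again after deserialization
    private transient volatile FakeClient client;

    SerializableReporter(Map<String, String> config) {
      this.config = new HashMap<>(config);
    }

    private FakeClient client() {
      FakeClient result = client;
      if (result == null) {
        synchronized (this) {
          result = client;
          if (result == null) {
            // rebuild the client from the serialized configuration
            result = new FakeClient(config.get("endpoint"));
            client = result;
          }
        }
      }
      return result;
    }

    void report(String metric) {
      client().send(metric);
    }

    int sentCount() {
      return client().sent;
    }

    String endpoint() {
      return client().endpoint;
    }
  }

  /** Serializes and deserializes a value in memory. */
  static <T extends Serializable> T roundTrip(T value) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      @SuppressWarnings("unchecked")
      T copy = (T) in.readObject();
      return copy;
    }
  }

  public static void main(String[] args) throws Exception {
    SerializableReporter reporter = new SerializableReporter(Map.of("endpoint", "localhost:9092"));
    reporter.report("scan");
    SerializableReporter copy = roundTrip(reporter);
    copy.report("scan"); // the copy creates its own client on first use
    System.out.println(copy.endpoint() + " " + copy.sentCount());
  }
}
```

In this pattern each reporter implementation owns its own lazy initialization, which is the cost raised as an objection later in the thread.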

@rdblue
Contributor

rdblue commented Jun 4, 2023

@nastra, where are we with the fix for serialization here?

@nastra
Contributor

nastra commented Jun 6, 2023

> @nastra, where are we with the fix for serialization here?

@rdblue I haven't heard back from @wangyinsheng after my comment above, so I've opened #7370 a while ago for initial experimentation and then updated it to fix the serialization issue.

@wangyinsheng
Contributor Author

> > @nastra, where are we with the fix for serialization here?
>
> @rdblue I haven't heard back from @wangyinsheng after my comment above, so I've opened #7370 a while ago for initial experimentation and then updated it to fix the serialization issue.

@nastra @rdblue Sorry, I have been working on other things for a while. I still hold my opinion, though: I think SerializableTable is the best place to handle lazy initialization of the MetricsReporter. We should not just make MetricsReporter serializable, because that pushes the lazy-initialization work onto every implementation class of MetricsReporter.

@github-actions

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions bot added the stale label Aug 28, 2024
@github-actions

github-actions bot commented Sep 5, 2024

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions bot closed this Sep 5, 2024