[improve][offload] Use filesystemURI as the storage path #23591

Merged

Conversation

zymap
Member

@zymap zymap commented Nov 12, 2024


Motivation

We provide the fileSystemUri in the offload policy as part of the filesystem offload configuration, and it overrides fs.defaultFS. The offloader should therefore use fileSystemUri as the storage path, rather than hadoop.tmp.dir.
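
To make the intended precedence concrete, here is a minimal sketch. It is not the code from this PR: the class `StoragePathSketch` and the helper `resolveStorageBasePath` are hypothetical names, the Hadoop configuration is modeled as a plain map, and the fallback to hadoop.tmp.dir when no URI is configured is an assumption made only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class StoragePathSketch {

    // Hypothetical helper, not Pulsar's API: choose the base storage path
    // for offloaded data from the offload policy and the Hadoop settings.
    public static String resolveStorageBasePath(String fileSystemUri,
                                                Map<String, String> hadoopConf) {
        if (fileSystemUri != null && !fileSystemUri.isEmpty()) {
            // The URI from the offload policy already overrides fs.defaultFS,
            // so it is also used as the storage path.
            return fileSystemUri;
        }
        // Assumption: keep the previous behavior when no URI is configured.
        return hadoopConf.getOrDefault("hadoop.tmp.dir", "/tmp");
    }

    public static void main(String[] args) {
        Map<String, String> hadoopConf = new HashMap<>();
        hadoopConf.put("hadoop.tmp.dir", "/tmp/hadoop");

        // Prints "file:///pulsar/data": the policy URI wins.
        System.out.println(resolveStorageBasePath("file:///pulsar/data", hadoopConf));
        // Prints "/tmp/hadoop": no URI configured, so the assumed fallback applies.
        System.out.println(resolveStorageBasePath(null, hadoopConf));
    }
}
```

The point of the sketch is only the ordering: a configured fileSystemUri takes priority over hadoop.tmp.dir when deciding where offloaded data is written.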

Modifications

Verifying this change

  • Make sure that the change passes the CI checks.

@zymap zymap added area/tieredstorage doc-required Your PR changes impact docs and you will update later. labels Nov 12, 2024
@zymap zymap added this to the 4.1.0 milestone Nov 12, 2024
@zymap zymap self-assigned this Nov 12, 2024
@zymap Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@github-actions github-actions bot removed the doc-required Your PR changes impact docs and you will update later. label Nov 12, 2024
@github-actions github-actions bot added doc-required Your PR changes impact docs and you will update later. and removed doc-label-missing labels Nov 12, 2024
@zymap zymap force-pushed the yong/support-using-filesystemUri-for-hdfs branch from c5fd8f0 to 4b5b803 Compare November 13, 2024 11:21
Member

@horizonzy horizonzy left a comment

LGTM.

@zymap zymap merged commit b915f6e into apache:master Nov 18, 2024
49 of 52 checks passed
lhotari pushed a commit that referenced this pull request Nov 18, 2024
lhotari pushed a commit that referenced this pull request Nov 18, 2024
lhotari pushed a commit that referenced this pull request Nov 18, 2024
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 20, 2024
(cherry picked from commit b915f6e)
(cherry picked from commit c1cc2d6)
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 21, 2024
(cherry picked from commit b915f6e)
(cherry picked from commit c1cc2d6)
@lhotari
Member

lhotari commented Nov 22, 2024

This change is related to #23411. I'll revert the change in branch-3.0 since the TestFileSystemOffload test fails and blocks the 3.0.8 release. branch-3.0 doesn't include #23411.

lhotari added a commit that referenced this pull request Nov 22, 2024
)"

This reverts commit c1cc2d6.

This change is reverted in branch-3.0 since #23591 was needed due to #23411 which
is not included in branch-3.0.
@zymap
Member Author

zymap commented Nov 27, 2024

@lhotari I don't understand why a configuration change would be related to a dependency change. Shouldn't we check why the test failed?

@zymap
Member Author

zymap commented Nov 27, 2024

@lhotari I fixed the issue and pushed a commit to run the Pulsar tests here. The root cause is that, after this fix, the fileSystemURI actually takes effect, and the integration tests then fail with a permission issue because they use the root path. After changing the path to /pulsar/data, the tests run successfully in my local environment.

@lhotari
Member

lhotari commented Nov 27, 2024

> @lhotari I don't understand why a configuration change would be related to a dependency change. Shouldn't we check why the test failed?

@zymap The difference in branch-3.0 is that #23411 isn't included. That's why the test failure in branch-3.0 was most likely related to that change. Something simply behaves differently.

@lhotari
Member

lhotari commented Nov 27, 2024

> @lhotari I fixed the issue and pushed a commit to run the Pulsar tests here. The root cause is that, after this fix, the fileSystemURI actually takes effect, and the integration tests then fail with a permission issue because they use the root path. After changing the path to /pulsar/data, the tests run successfully in my local environment.

@zymap thanks! Wouldn't it be useful to submit similar test improvements to the master branch?

@lhotari
Member

lhotari commented Nov 27, 2024

@zymap The comment referenced the test failure, https://github.com/apache/pulsar/actions/runs/11977750391/job/33404574103#step:12:11625

  Error:  Tests run: 6, Failures: 3, Errors: 0, Skipped: 3, Time elapsed: 541.109 s <<< FAILURE! - in TestSuite
  Error:  org.apache.pulsar.tests.integration.offload.TestFileSystemOffload.testPublishOffloadAndConsumeDeletionLag[org.apache.pulsar.tests.integration.topologies.PulsarClusterTestBase$$Lambda$539/0x00007f269061e4b0@7e31d53b, org.apache.pulsar.tests.integration.topologies.PulsarClusterTestBase$$Lambda$540/0x00007f269061e6d8@68d8eb4f](4)  Time elapsed: 45.88 s  <<< FAILURE!
  java.lang.AssertionError: expected [true] but found [false]
  	at org.testng.Assert.fail(Assert.java:110)
  	at org.testng.Assert.failNotEquals(Assert.java:1577)
  	at org.testng.Assert.assertTrue(Assert.java:56)
  	at org.testng.Assert.assertTrue(Assert.java:66)
  	at org.apache.pulsar.tests.integration.offload.TestBaseOffload.writeAndWaitForOffload(TestBaseOffload.java:267)
  	at org.apache.pulsar.tests.integration.offload.TestBaseOffload.writeAndWaitForOffload(TestBaseOffload.java:224)
  	at org.apache.pulsar.tests.integration.offload.TestBaseOffload.testPublishOffloadAndConsumeDeletionLag(TestBaseOffload.java:307)
  	at org.apache.pulsar.tests.integration.offload.TestFileSystemOffload.testPublishOffloadAndConsumeDeletionLag(TestFileSystemOffload.java:43)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
  	at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:139)
  	at org.testng.internal.invokers.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:47)
  	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:76)
  	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
  	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  	at java.base/java.lang.Thread.run(Thread.java:840)
  
  Error:  org.apache.pulsar.tests.integration.offload.TestFileSystemOffload.testPublishOffloadAndConsumeViaCLI[org.apache.pulsar.tests.integration.topologies.PulsarClusterTestBase$$Lambda$539/0x00007f269061e4b0@18483b8b, org.apache.pulsar.tests.integration.topologies.PulsarClusterTestBase$$Lambda$540/0x00007f269061e6d8@24fc2c80](4)  Time elapsed: 20.055 s  <<< FAILURE!
  org.apache.pulsar.tests.integration.docker.ContainerExecException: /pulsar/bin/pulsar-admin topics offload-status -w persistent://offload-test-cli-egxb/ns1/topic1 failed on 565f687e87d19742184c52a01ec7db077dd5c4d044f3dcb06d57d19fe4185098 with error code 1
  	at org.apache.pulsar.tests.integration.utils.DockerUtils$2.onComplete(DockerUtils.java:284)
  	at org.testcontainers.shaded.com.github.dockerjava.core.exec.AbstrAsyncDockerCmdExec$1.onComplete(AbstrAsyncDockerCmdExec.java:51)
  	at org.testcontainers.shaded.com.github.dockerjava.core.DefaultInvocationBuilder.lambda$executeAndStream$1(DefaultInvocationBuilder.java:276)
  	at java.base/java.lang.Thread.run(Thread.java:840)
  
  Error:  org.apache.pulsar.tests.integration.offload.TestFileSystemOffload.testPublishOffloadAndConsumeViaThreshold[org.apache.pulsar.tests.integration.topologies.PulsarClusterTestBase$$Lambda$539/0x00007f269061e4b0@20a47036, org.apache.pulsar.tests.integration.topologies.PulsarClusterTestBase$$Lambda$540/0x00007f269061e6d8@70c205bf](4)  Time elapsed: 43.208 s  <<< FAILURE!
  java.lang.AssertionError: expected [true] but found [false]
  	at org.testng.Assert.fail(Assert.java:110)
  	at org.testng.Assert.failNotEquals(Assert.java:1577)
  	at org.testng.Assert.assertTrue(Assert.java:56)
  	at org.testng.Assert.assertTrue(Assert.java:66)
  	at org.apache.pulsar.tests.integration.offload.TestBaseOffload.testPublishOffloadAndConsumeViaThreshold(TestBaseOffload.java:182)
  	at org.apache.pulsar.tests.integration.offload.TestFileSystemOffload.testPublishOffloadAndConsumeViaThreshold(TestFileSystemOffload.java:38)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
  	at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:139)
  	at org.testng.internal.invokers.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:47)
  	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:76)
  	at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
  	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  	at java.base/java.lang.Thread.run(Thread.java:840)
  
  [INFO] 
  [INFO] Results:
  [INFO] 
  Error:  Failures: 
  Error:    TestFileSystemOffload.testPublishOffloadAndConsumeDeletionLag:43->TestBaseOffload.testPublishOffloadAndConsumeDeletionLag:307->TestBaseOffload.writeAndWaitForOffload:224->TestBaseOffload.writeAndWaitForOffload:267 expected [true] but found [false]
  Error:    TestFileSystemOffload.testPublishOffloadAndConsumeViaCLI » ContainerExec /pulsar/bin/pulsar-admin topics offload-status -w persistent://offload-test-cli-egxb/ns1/topic1 failed on 565f687e87d19742184c52a01ec7db077dd5c4d044f3dcb06d57d19fe4185098 with error code 1
  Error:    TestFileSystemOffload.testPublishOffloadAndConsumeViaThreshold:38->TestBaseOffload.testPublishOffloadAndConsumeViaThreshold:182 expected [true] but found [false]

That problem got resolved by reverting. @zymap How did you fix that problem?

@lhotari
Member

lhotari commented Nov 27, 2024

@zymap I'm curious to know why the test fails in branch-3.0 and not in master. I don't see any explanation for that other than #23411.

nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 28, 2024
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 28, 2024