
[fix][io] Fix Alluxio sink to respect the alluxioMasterHost property #19172

Conversation

sekikn
Contributor

@sekikn sekikn commented Jan 10, 2023

Motivation

Currently, the Alluxio sink always connects to localhost, regardless of the alluxioMasterHost property.

Suppose we have a local Pulsar cluster (192.168.2.2) and a remote Alluxio cluster (192.168.33.10), as follows:

$ ip address show

...

2: enp1s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether e0:4f:43:e8:4c:86 brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.2/24 brd 192.168.2.255 scope global noprefixroute enp1s0f1
       valid_lft forever preferred_lft forever

...

$ curl -sL http://192.168.33.10:19999/api/v1/master/info | jq .version
"2.7.3"

Build the Alluxio sink on the master branch and deploy it to the local Pulsar:

$ mvn clean install -DskipTests

...

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  14:00 min
[INFO] Finished at: 2023-01-10T16:49:55+09:00
[INFO] ------------------------------------------------------------------------
$ mkdir connectors
$ cp pulsar-io/alluxio/target/pulsar-io-alluxio-2.12.0-SNAPSHOT.nar connectors
$ bin/pulsar standalone

Enable the Alluxio sink and ingest some messages into the topic in question:

$ bin/pulsar-admin sink available-sinks
alluxio
Writes data into Alluxio
----------------------------------------
$ cat /tmp/alluxio-sink.yml
configs:
    alluxioMasterHost: "192.168.33.10"
    alluxioMasterPort: "19998"
    alluxioDir: "pulsar"
    filePrefix: "TopicA"
    fileExtension: ".txt"
    lineSeparator: "\n"
    rotationRecords: 10
    rotationInterval: "-1"
$ bin/pulsar-admin sinks create \
    --tenant public \
    --namespace default \
    --name alluxio-sink \
    --sink-type alluxio \
    --sink-config-file /tmp/alluxio-sink.yml \
    --inputs TopicA
Created successfully
$ for i in $(seq 0 9); do bin/pulsar-client produce -m "key-$i" -n 1 TopicA; done

...

$ cat logs/functions/public/default/alluxio-sink/alluxio-sink-0.log

...

2023-01-10T17:02:23,453+0900 [public/default/alluxio-sink-0] ERROR org.apache.pulsar.functions.instance.JavaInstanceRunnable - [public/default/alluxio-sink:0] Uncaught exception in Java Instance
alluxio.exception.status.UnavailableException: Failed to connect to master (192.168.2.2/<unresolved>:19998) after 44 attempts.Please check if Alluxio master is currently running on "192.168.2.2/<unresolved>:19998". Service="FileSystemMasterClient"
	at alluxio.AbstractClient.connect(AbstractClient.java:279) ~[?:?]
	at alluxio.client.file.BaseFileSystem.rpc(BaseFileSystem.java:572) ~[?:?]
	at alluxio.client.file.BaseFileSystem.exists(BaseFileSystem.java:202) ~[?:?]
	at alluxio.client.file.FileSystem.exists(FileSystem.java:275) ~[?:?]
	at org.apache.pulsar.io.alluxio.sink.AlluxioSink.open(AlluxioSink.java:108) ~[?:?]
	at org.apache.pulsar.functions.instance.JavaInstanceRunnable.setupOutput(JavaInstanceRunnable.java:935) ~[pulsar-functions-instance.jar:2.12.0-SNAPSHOT]
	at org.apache.pulsar.functions.instance.JavaInstanceRunnable.setup(JavaInstanceRunnable.java:251) ~[pulsar-functions-instance.jar:2.12.0-SNAPSHOT]
	at org.apache.pulsar.functions.instance.JavaInstanceRunnable.run(JavaInstanceRunnable.java:290) ~[pulsar-functions-instance.jar:2.12.0-SNAPSHOT]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]
Caused by: alluxio.exception.status.UnavailableException: Failed to handshake with master 192.168.2.2/<unresolved>:19998 to load cluster default configuration values: UNAVAILABLE: io exception
	at alluxio.util.ConfigurationUtils.loadConfiguration(ConfigurationUtils.java:521) ~[?:?]
	at alluxio.ClientContext.loadConf(ClientContext.java:134) ~[?:?]
	at alluxio.ClientContext.loadConfIfNotLoaded(ClientContext.java:158) ~[?:?]
	at alluxio.AbstractClient.beforeConnect(AbstractClient.java:176) ~[?:?]
	at alluxio.AbstractClient.connect(AbstractClient.java:224) ~[?:?]
	... 8 more

As the error messages indicate, the Alluxio sink tried to connect to the local address (192.168.2.2) instead of the configured alluxioMasterHost (192.168.33.10).

Modifications

The reason for this behavior is the following line:
https://github.com/apache/pulsar/blob/master/pulsar-io/alluxio/src/main/java/org/apache/pulsar/io/alluxio/sink/AlluxioSink.java#L95

That line sets the value of alluxioMasterHost on the default configuration, but the configuration object actually used to create the FileSystem has already been instantiated by that point, so the assignment has no effect:
https://github.com/apache/pulsar/blob/master/pulsar-io/alluxio/src/main/java/org/apache/pulsar/io/alluxio/sink/AlluxioSink.java#L100

This PR fixes that ordering.
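The ordering problem can be sketched in plain Java. This is a minimal illustration only: java.util.Properties stands in for Alluxio's configuration classes, and all names here (globalDefaults, snapshotClientConf) are hypothetical, not the actual AlluxioSink code.

```java
import java.util.Properties;

public class ConfigOrderingSketch {
    // Stand-in for Alluxio's global defaults.
    static Properties globalDefaults = new Properties();

    // Stand-in for taking a per-client configuration snapshot from the
    // defaults (it copies whatever the defaults hold at call time).
    static Properties snapshotClientConf() {
        Properties copy = new Properties();
        copy.putAll(globalDefaults);
        return copy;
    }

    public static void main(String[] args) {
        globalDefaults.setProperty("alluxio.master.hostname", "localhost");

        // Buggy order: the client conf is snapshotted first...
        Properties clientConf = snapshotClientConf();
        // ...so mutating the defaults afterwards does not affect it.
        globalDefaults.setProperty("alluxio.master.hostname", "192.168.33.10");
        System.out.println(clientConf.getProperty("alluxio.master.hostname"));

        // Fixed order: set the host before creating the client conf.
        Properties fixedConf = snapshotClientConf();
        System.out.println(fixedConf.getProperty("alluxio.master.hostname"));
    }
}
```

The first print shows the stale "localhost" value, the second shows the configured host, which mirrors why moving the property assignment before the FileSystem is created resolves the bug.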

Verifying this change

  • Make sure that the change passes the CI checks.

I manually ensured that the ingested messages were output as an Alluxio file with this PR, as follows:

Before ingesting messages:

$ bin/alluxio fs ls /pulsar
(the directory is empty)

During ingestion:

$ bin/alluxio fs ls -R /pulsar
-rw-r--r--  sekikn         sekikn                       0   NOT_PERSISTED 01-10-2023 13:24:52:718 100% /pulsar/tmp/380f0218-3eff-497e-bd47-b4092879a0a9_tmp.txt
drwxr-xr-x  sekikn         sekikn                       1   NOT_PERSISTED 01-10-2023 13:24:52:718  DIR /pulsar/tmp

After ingestion:

$ bin/alluxio fs ls -R /pulsar
-rw-r--r--  sekikn         sekikn                      60   NOT_PERSISTED 01-10-2023 13:25:22:821 100% /pulsar/TopicA-1673357122834.txt
drwxr-xr-x  sekikn         sekikn                       0   NOT_PERSISTED 01-10-2023 13:25:22:839  DIR /pulsar/tmp
$ bin/alluxio fs cat /pulsar/TopicA-1673357122834.txt
key-0
key-1
key-2
key-3
key-4
key-5
key-6
key-7
key-8
key-9

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: sekikn#5

A few tests failed there, but all of them appear to be unrelated to this fix.

Contributor

@eolivelli eolivelli left a comment


+1

thanks

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jan 10, 2023
@nicoloboschi
Contributor

/pulsarbot rerun-failure-checks

Member

@tisonkun tisonkun left a comment


Thanks for your contribution! +1 to merge

@codecov-commenter

codecov-commenter commented Jan 10, 2023

Codecov Report

Merging #19172 (dd43f24) into master (9ef54fd) will increase coverage by 0.17%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #19172      +/-   ##
============================================
+ Coverage     47.22%   47.40%   +0.17%     
+ Complexity    10713     9503    -1210     
============================================
  Files           713      633      -80     
  Lines         69697    59892    -9805     
  Branches       7485     6239    -1246     
============================================
- Hits          32914    28391    -4523     
+ Misses        33096    28383    -4713     
+ Partials       3687     3118     -569     
Flag Coverage Δ
unittests 47.40% <ø> (+0.17%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
.../pulsar/broker/service/SharedConsumerAssignor.java 3.70% <0.00%> (-74.08%) ⬇️
...ion/buffer/metadata/TransactionBufferSnapshot.java 42.85% <0.00%> (-42.86%) ⬇️
...java/org/apache/pulsar/proxy/stats/TopicStats.java 58.82% <0.00%> (-41.18%) ⬇️
...apache/pulsar/broker/service/EntryAndMetadata.java 0.00% <0.00%> (-40.75%) ⬇️
...saction/timeout/TransactionTimeoutTrackerImpl.java 50.87% <0.00%> (-36.85%) ⬇️
...ar/broker/loadbalance/impl/BundleSplitterTask.java 54.00% <0.00%> (-28.00%) ⬇️
...er/impl/SingleSnapshotAbortedTxnProcessorImpl.java 46.73% <0.00%> (-23.92%) ⬇️
...nsaction/pendingack/impl/PendingAckHandleImpl.java 36.54% <0.00%> (-14.62%) ⬇️
...ransaction/buffer/impl/TopicTransactionBuffer.java 44.18% <0.00%> (-14.54%) ⬇️
...ar/broker/transaction/util/LogIndexLagBackoff.java 42.85% <0.00%> (-14.29%) ⬇️
... and 127 more

@tisonkun
Member

tisonkun commented Jan 10, 2023

Interesting. I'm trying to execute the test locally but always get:

2023-01-11T00:09:01,298+0800 WARN  [WorkerThread-1291485735] a.r.RetryUtils@43 - Failed to create worker id (attempt 1): alluxio.exception.status.UnavailableException: Failed to connect to master (198.18.0.10/<unresolved>:19998) after 7 attempts.Please check if Alluxio master is currently running on "198.18.0.10/<unresolved>:19998". Service="BlockMasterWorker" {}
2023-01-11T00:09:03,440+0800 WARN  [WorkerThread-1291485735] a.r.RetryUtils@43 - Failed to create worker id (attempt 2): alluxio.exception.status.UnavailableException: Failed to connect to master (198.18.0.10/<unresolved>:19998) after 9 attempts.Please check if Alluxio master is currently running on "198.18.0.10/<unresolved>:19998". Service="BlockMasterWorker" {}
2023-01-11T00:09:05,714+0800 WARN  [WorkerThread-1291485735] a.r.RetryUtils@43 - Failed to create worker id (attempt 3): alluxio.exception.status.UnavailableException: Failed to connect to master (198.18.0.10/<unresolved>:19998) after 9 attempts.Please check if Alluxio master is currently running on "198.18.0.10/<unresolved>:19998". Service="BlockMasterWorker" {}
2023-01-11T00:09:08,194+0800 WARN  [WorkerThread-1291485735] a.r.RetryUtils@43 - Failed to create worker id (attempt 4): alluxio.exception.status.UnavailableException: Failed to connect to master (198.18.0.10/<unresolved>:19998) after 9 attempts.Please check if Alluxio master is currently running on "198.18.0.10/<unresolved>:19998". Service="BlockMasterWorker" {}
2023-01-11T00:09:11,134+0800 WARN  [WorkerThread-1291485735] a.r.RetryUtils@43 - Failed to create worker id (attempt 5): alluxio.exception.status.UnavailableException: Failed to connect to master (198.18.0.10/<unresolved>:19998) after 9 attempts.Please check if Alluxio master is currently running on "198.18.0.10/<unresolved>:19998". Service="BlockMasterWorker" {}
2023-01-11T00:09:14,835+0800 WARN  [WorkerThread-1291485735] a.r.RetryUtils@43 - Failed to create worker id (attempt 6): alluxio.exception.status.UnavailableException: Failed to connect to master (198.18.0.10/<unresolved>:19998) after 9 attempts.Please check if Alluxio master is currently running on "198.18.0.10/<unresolved>:19998". Service="BlockMasterWorker" {}
2023-01-11T00:09:20,190+0800 WARN  [WorkerThread-1291485735] a.r.RetryUtils@43 - Failed to create worker id (attempt 7): alluxio.exception.status.UnavailableException: Failed to connect to master (198.18.0.10/<unresolved>:19998) after 9 attempts.Please check if Alluxio master is currently running on "198.18.0.10/<unresolved>:19998". Service="BlockMasterWorker" {}
2023-01-11T00:09:27,748+0800 WARN  [WorkerThread-1291485735] a.r.RetryUtils@43 - Failed to create worker id (attempt 8): alluxio.exception.status.UnavailableException: Failed to connect to master (198.18.0.10/<unresolved>:19998) after 9 attempts.Please check if Alluxio master is currently running on "198.18.0.10/<unresolved>:19998". Service="BlockMasterWorker" {}

It seems Alluxio 2.7.3 doesn't support the Apple M1 chip, which would explain the failure.

@tisonkun
Member

Merging...

@tisonkun tisonkun merged commit dda1437 into apache:master Jan 11, 2023
Labels
doc-not-needed Your PR changes do not impact docs ready-to-test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants