Support for S3A Connector #14312

Open
chrajeshbabu opened this issue Oct 26, 2024 · 4 comments

chrajeshbabu commented Oct 26, 2024

Currently the controller and servers are able to start with an s3a path, but creating segments during ingestion fails with the following error. The cause is that, while preparing file names, we prefix them with the s3 scheme instead of s3a.

Fixing this will make it possible to use S3-compatible storage as a deep store.

Working on it.

Caused by: java.lang.IllegalStateException: Unable to extract out the relative path for input file 's3://testhadoop/pinot/batch/airlineStats/rawdata/2014/01/28/airlineStats_data_2014-01-28.avro', based on base input path: s3a://testhadoop/pinot/batch/airlineStats/rawdata/
at org.apache.pinot.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:515) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-cc33ac502a02e2fe830fe21e556234ee99351a7a]
at org.apache.pinot.common.segment.generation.SegmentGenerationUtils.getRelativeOutputPath(SegmentGenerationUtils.java:162) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-cc33ac502a02e2fe830fe21e556234ee99351a7a]
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:278) ~[pinot-batch-ingestion-standalone-1.2.0-shaded.jar:1.2.0-cc33ac502a02e2fe830fe21e556234ee99351a7a]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.base/java.lang.Thread.run(Thread.java:840) ~[?:?]
java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:152)
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:125)
at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:132)
at org.apache.pinot.tools.Command.call(Command.java:33)
at org.apache.pinot.tools.Command.call(Command.java:29)
at picocli.CommandLine.executeUserObject(CommandLine.java:2045)
at picocli.CommandLine.access$1500(CommandLine.java:148)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
at picocli.CommandLine.execute(CommandLine.java:2174)
at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:173)
at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:204)
Caused by: java.lang.RuntimeException: Failed to generate Pinot segment for file - s3://testhadoop/pinot/batch/airlineStats/rawdata/2014/01/28/airlineStats_data_2014-01-28.avro
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:287)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.IllegalStateException: Unable to extract out the relative path for input file 's3://testhadoop/pinot/batch/airlineStats/rawdata/2014/01/28/airlineStats_data_2014-01-28.avro', based on base input path: s3a://testhadoop/pinot/batch/airlineStats/rawdata/
at org.apache.pinot.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:515)
at org.apache.pinot.common.segment.generation.SegmentGenerationUtils.getRelativeOutputPath(SegmentGenerationUtils.java:162)
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:278)
... 5 more
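
For readers following along, here is a minimal sketch of why the precondition in the trace above trips. It uses only `java.net.URI` from the JDK and the two paths from the error message. It is not Pinot's actual code or the eventual fix: the class `SchemeMismatchSketch` and the helpers `isRelativizable` and `withSchemeOf` are hypothetical names made up for illustration. `URI.relativize` returns its argument unchanged when the schemes differ, so an `s3://` file can never be made relative to an `s3a://` base directory.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SchemeMismatchSketch {

  // Hypothetical check: true when `file` can be expressed relative to `base`.
  // URI.relativize hands back `file` untouched if scheme, authority, or path prefix differ.
  public static boolean isRelativizable(URI base, URI file) {
    URI relative = base.relativize(file);
    return !relative.equals(file) && !relative.getPath().isEmpty();
  }

  // Hypothetical helper: rebuilds `file` with the scheme of `base`,
  // keeping authority (bucket), path, query, and fragment as they were.
  public static URI withSchemeOf(URI base, URI file) {
    try {
      return new URI(base.getScheme(), file.getAuthority(), file.getPath(),
          file.getQuery(), file.getFragment());
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Cannot rewrite scheme of " + file, e);
    }
  }

  public static void main(String[] args) {
    // Both values are copied from the error message in this issue.
    URI base = URI.create("s3a://testhadoop/pinot/batch/airlineStats/rawdata/");
    URI listed = URI.create(
        "s3://testhadoop/pinot/batch/airlineStats/rawdata/2014/01/28/airlineStats_data_2014-01-28.avro");

    // Schemes differ (s3 vs s3a), so no relative path can be extracted.
    System.out.println(isRelativizable(base, listed));

    // Once the listed file carries the same scheme as the base, relativize works.
    URI fixed = withSchemeOf(base, listed);
    System.out.println(base.relativize(fixed));
  }
}
```

Whether the right place to correct the scheme is the file listing or the path comparison is a design decision for the actual patch; the sketch only shows that the two URIs have to agree on the scheme before a relative path can be computed.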

alguiguilo098 commented Oct 27, 2024

@chrajeshbabu Hello! I’m currently studying Computer Science, and I’m very interested in contributing to open-source projects. If there are any tasks I could get started on, please let me know. Thank you.

@chrajeshbabu

Hi @alguiguilo098
@Jackie-Jiang @mayankshriv are the right people to guide you and help you contribute meaningful work to this community.
Thanks

@alguiguilo098

@chrajeshbabu Thanks

@alguiguilo098

@Jackie-Jiang @mayankshriv help me contribute to this community

chrajeshbabu added a commit to chrajeshbabu/pinot that referenced this issue Nov 16, 2024
xiangfu0 pushed a commit that referenced this issue Nov 21, 2024