chore(spark): update hadoop dependencies #1297
Conversation
Thanks for the bump, LGTM!
Any particular reason you removed the links, such as https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws/3.4.1?
I find it complicated enough to find the correct versions by browsing Maven. Yes, the links are a maintenance burden, but I think they are very much worth it, as I don't know the dependency structure by heart.
What do you need the links for? Small update: the exact dependency versions must be obtained from the Hadoop image.
Well... To determine the versions 😅
Ahh, I see the different ways now. I use the dependency versions (as e.g. Maven does); you look in the file system of the Docker image. I slightly prefer explicit versions as a double safety measure, as we messed up the AWS bundle version in the past. But back then we didn't copy it from the Hadoop image.
Yes, because that is the source location for the Spark image and takes any Stackable patches into account.
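Reading the versions off the image file system can be partly scripted. A small sketch, assuming the jars follow the usual `artifact-version.jar` naming (the `jar_version` helper and the example filenames are illustrative, not project tooling):

```shell
# Hypothetical helper: given a jar filename as listed inside the Hadoop image
# (e.g. via `docker run --rm <hadoop-image> ls <tools/lib dir>`), strip the
# artifact name and the .jar suffix to recover the version.
jar_version() {
  basename "$1" .jar | sed -E 's/^[a-zA-Z0-9-]+-([0-9].*)$/\1/'
}

jar_version hadoop-aws-3.4.1.jar      # → 3.4.1
jar_version woodstox-core-6.5.1.jar   # → 6.5.1
```

The regex relies on version strings starting with a digit, which holds for the artifacts discussed here.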
WDYT of getting rid of all these variables then?
I would prefer that too, but it is not safe: the Docker COPY directive will not fail if the source file doesn't exist.
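One way to make that failure mode loud, sketched here with illustrative stage names and paths (not the project's actual Dockerfile), is to assert the jar's presence right after the COPY:

```dockerfile
# Illustrative fragment: stage name and paths are assumptions.
COPY --from=hadoop-builder /stackable/hadoop/share/hadoop/tools/lib/ /stackable/spark/jars/

# A COPY of a directory (or a glob) can succeed even when the jar we actually
# need is missing, so verify it explicitly; the build fails here if it's absent.
RUN test -f /stackable/spark/jars/hadoop-aws-*.jar
```

If the glob matches nothing, the shell passes the literal pattern to `test -f`, which fails and aborts the build.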
At least for Hive we don't use BUT I noticed the spark Dockerfile curls jackson-dataformat-xml, stax2-api and woodstox-core! I have seen too many mistakes of this kind; we should IMHO just stick to schema F and put the links in there. Too easy to get it wrong otherwise.
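For the curled artifacts, the download URL can be derived mechanically from explicit Maven coordinates, which keeps the pinned versions reviewable. A sketch, assuming Maven Central's standard repository layout (`mvn_url` is a hypothetical helper and the version shown is only an example):

```shell
# Hypothetical helper: build a Maven Central jar URL from explicit coordinates,
# so pinned versions stay visible at the point of use.
mvn_url() {
  group=$1; artifact=$2; version=$3
  # Maven Central maps dots in the group id to path segments.
  group_path=$(printf '%s' "$group" | tr '.' '/')
  printf 'https://repo1.maven.org/maven2/%s/%s/%s/%s-%s.jar\n' \
    "$group_path" "$artifact" "$version" "$artifact" "$version"
}

# Example coordinates; the version is illustrative, not the one pinned here.
mvn_url com.fasterxml.jackson.dataformat jackson-dataformat-xml 2.17.1
# → https://repo1.maven.org/maven2/com/fasterxml/jackson/dataformat/jackson-dataformat-xml/2.17.1/jackson-dataformat-xml-2.17.1.jar
```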
Good point! In Hive we copy from the hadoop builder to the hive builder. But doesn't matter right now. Future optimization 😅 |
Thanks for the links, only one small typo
Co-authored-by: Sebastian Bernauer <sebastian.bernauer@stackable.de>
Description
See #1273 (comment)
Definition of Done Checklist
Note
Not all of these items are applicable to all PRs; the author should update this template to leave in only the boxes that are relevant.
Please make sure all these things are done and tick the boxes
TIP: Running integration tests with a new product image
The image can be built and uploaded to the kind cluster with the following commands:
See the output of `boil` to retrieve the image manifest URI for `<MANIFEST_URI>`.