1732 Spline 0.6 integration #1739
Conversation
#1110: Make Standardization and Conformance use Spline 0.4.x
* Change Spline version
* Make **Standardization** and **Conformance** depend on and initialize the new Spline module instead of the old ones
* Replace Spline properties in templates
* Use a different JSON library (instead of the one transitively ingested from the old Spline)
* Application properties templates adjusted
* Helper scripts updated
* README.md updated (addressing also changes in Menas and Oozie from other PR)
* Missing copyright headers added

Co-authored-by: Georgi Chochov <g.chochov@gmail.com>
* Server-side API endpoints to provide the new configuration
* Server-side controller to serve the Spline WebJar
* Client-side changes to gather the lineage ID and display the new lineage UI
* Added dependency to include the WebJar
* Header to allow iframe display: `<frame-options policy="SAMEORIGIN"/>`
* Fixed `style.css`
* Rebasing to current `Develop`
* Unused import removed
…ncies * Version names sorted alphabetically
# Conflicts (resolved):
#   pom.xml
#   scripts/bash/run_enceladus.sh
#   spark-jobs/pom.xml
% Conflicts:
%   dao/pom.xml
%   data-model/pom.xml
%   examples/pom.xml
%   menas/pom.xml
%   menas/src/main/resources/application.properties.template
%   migrations-cli/pom.xml
%   migrations/pom.xml
%   plugins-api/pom.xml
%   plugins-builtin/pom.xml
%   pom.xml
%   spark-jobs/pom.xml
%   utils/pom.xml
…2-snapshot AWS PoC update from develop 2.12 snapshot
- all directly HDFS-touching functionality disabled (Atum, performance measurements, info files, output path checking)

```bash
# Add menasfargate into hosts
sudo nano /etc/hosts
# paste
20.0.63.69 menasfargate
# save & exit (Ctrl+O, Ctrl+X)

# Running standardization works via:
spark-submit --class za.co.absa.enceladus.standardization.StandardizationJob \
  --conf "spark.driver.extraJavaOptions=-Dmenas.rest.uri=http://menasfargate:8080 -Dstandardized.hdfs.path=s3://euw1-ctodatadev-dev-bigdatarnd-s3-poc/enceladusPoc/ao-hdfs-data/stdOutput/standardized-{0}-{1}-{2}-{3}" \
  ~/enceladusPoc/spark-jobs-2.11.0-SNAPSHOT.jar \
  --menas-credentials-file ~/enceladusPoc/menas-credentials.properties \
  --dataset-name dk_test1_emr285 --raw-format json \
  --dataset-version 1 --report-date 2019-11-27 --report-version 1 \
  2> ~/enceladusPoc/stderr.txt
```
…ut, s3 conf output)
- all directly HDFS-touching functionality disabled (Atum, performance measurements, info files, output path checking)

```bash
# Add menasfargate into hosts
sudo nano /etc/hosts
# paste
20.0.63.69 menasfargate
# save & exit (Ctrl+O, Ctrl+X)

# Running conformance works via:
spark-submit --class za.co.absa.enceladus.conformance.DynamicConformanceJob \
  --conf "spark.driver.extraJavaOptions=-Dmenas.rest.uri=http://menasfargate:8080 -Dstandardized.hdfs.path=s3://euw1-ctodatadev-dev-bigdatarnd-s3-poc/enceladusPoc/ao-hdfs-data/stdOutput/standardized-{0}-{1}-{2}-{3}" \
  ~/enceladusPoc/spark-jobs-2.11.0-SNAPSHOT.jar \
  --menas-credentials-file ~/enceladusPoc/menas-credentials.properties \
  --dataset-name dk_test1_emr285 \
  --dataset-version 1 --report-date 2019-11-27 --report-version 1 \
  2> ~/enceladusPoc/conf-log.txt
```
% Conflicts:
%   pom.xml
Feature/1416 S3 input/output for Enceladus on EMR (S2.4, H2.8.5)
This is a temporary solution. We are currently experimenting with many forms of URLs, and having a regex there right now would slow us down.
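To sketch the trade-off behind dropping the regex (hypothetical illustration, not the project's actual code; `isPlausibleUrl` is an invented name): parser-based validation with `java.net.URI` tolerates the many URL forms being experimented with (`s3://`, `hdfs://`, `http://`, …) without maintaining a pattern for each.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UrlCheckDemo {
    // Hypothetical illustration: accept any string that parses as a URI and
    // carries a scheme, instead of enumerating allowed forms in a regex.
    static boolean isPlausibleUrl(String s) {
        try {
            URI uri = new URI(s);
            return uri.getScheme() != null; // require at least a scheme
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isPlausibleUrl("s3://bucket/path"));   // true
        System.out.println(isPlausibleUrl("hdfs://nn:8020/out")); // true
        System.out.println(isPlausibleUrl("::not a url::"));      // false
    }
}
```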
…32-spline-06-integration
# Conflicts:
#   menas-web/Dockerfile
#   menas-web/nginx.conf
#   menas-web/start_menas_web.sh
#   menas-web/ui/index.html
#   menas-web/ui/package.json
#   menas-web/ui/service/RestDAO.js
#   menas-web/ui/service/RunRestDAO.js
#   menas-web/ui/ui5.yaml
#   menas/src/main/resources/start_menas.sh
#   menas/src/main/scala/za/co/absa/enceladus/menas/WebSecurityConfig.scala
Just curious why you included the Swagger in this PR? 😉
I considered it quite useful to add after separating Menas. If you think it's better, I will put the changes in a separate PR.
It is definitely useful, no doubt. But I see no crossover with the Spline integration, so having it in one PR, particularly when there already was a PR (maybe not on the separated Menas), seems counterproductive; it just increases the size of this PR.
Furthermore, Spline is dragging behind a little, while the Swagger changes could already be merged. Perhaps it would be worth separating them into their own PR. 🤔
README.md
Outdated
```diff
@@ -119,7 +119,7 @@ password=changeme
     --deploy-mode <client/cluster> \
     --driver-cores <num> \
     --driver-memory <num>G \
-    --conf "spark.driver.extraJavaOptions=-Dmenas.rest.uri=<menas_api_uri:port> -Dstandardized.hdfs.path=<path_for_standardized_output>-{0}-{1}-{2}-{3} -Dspline.mongodb.url=<mongo_url_for_spline> -Dspline.mongodb.name=<spline_database_name> -Dhdp.version=<hadoop_version>" \
+    --conf "spark.driver.extraJavaOptions=-Dmenas.rest.uri=<menas_api_uri:port> -Dstandardized.hdfs.path=<path_for_standardized_output>-{0}-{1}-{2}-{3} -Dspline.producer.url=<url_for_spline_consumer> -Dhdp.version=<hadoop_version>" \
```
In an effort to decouple Enceladus and Spline, I wouldn't even mention this option here. If it is in application.conf
that is good enough, and it can be overridden like any other parameter.
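As a minimal sketch of that override mechanism (illustrative only; `resolveProducerUrl` and the URLs are hypothetical): a default shipped in the bundled configuration can be overridden per job with `-Dspline.producer.url=…` in `spark.driver.extraJavaOptions`, because JVM system properties take precedence.

```java
import java.util.Properties;

public class SplineConfigDemo {
    // Hypothetical illustration: a value baked into the bundled config can be
    // overridden with -Dspline.producer.url=... on the command line, because
    // system properties are consulted before the bundled defaults.
    static String resolveProducerUrl(Properties bundledDefaults) {
        return System.getProperty("spline.producer.url",
                bundledDefaults.getProperty("spline.producer.url"));
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("spline.producer.url", "http://localhost:8080/producer");

        System.out.println(resolveProducerUrl(defaults)); // bundled default

        // Simulates passing -Dspline.producer.url=... to the JVM.
        System.setProperty("spline.producer.url", "http://spline-gateway:9090/producer");
        System.out.println(resolveProducerUrl(defaults)); // overridden value
    }
}
```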
```diff
@@ -28,7 +28,7 @@ import org.springframework.context.annotation.{Bean, Configuration}
 class HDFSConfig @Autowired() (spark: SparkSession) {
   private val logger = LoggerFactory.getLogger(this.getClass)
 
-  @Value("${menas.hadoop.auth.method}")
+  @Value("${menas.hadoop.auth.method:}")
```
Why the colon?
It used to fail before because of this, but we solved it another way. Anyway, it seems to be the only one without a colon here
Let's not forget what the colon means. In general, it changes the meaning of the property placeholder from `${my.property.key}`
(no default value, so failing to find the key in the config causes a failure) to `${my.property.key:defaultValueForThatKey}`
(where failing to find the key in the config just falls back to the default; nothing after the colon means the default is the empty string, but a default is provided nonetheless).
Most probably you all knew this already; I'm just stating it so we don't "just add" colons because they are present elsewhere, but consider the true meaning. Here the question is whether it is reasonable to default to an empty string when the key is not found in the properties.
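The colon semantics described above can be sketched outside Spring (a minimal, hypothetical re-implementation of the placeholder rules for illustration, not Spring's actual resolver):

```java
import java.util.Map;

public class PlaceholderDemo {
    // Minimal sketch of Spring-style "${key}" vs "${key:default}" resolution.
    // Illustration only, not Spring's real PropertyPlaceholderHelper.
    static String resolve(String placeholder, Map<String, String> props) {
        String inner = placeholder.substring(2, placeholder.length() - 1); // strip "${" and "}"
        int colon = inner.indexOf(':');
        String key = colon == -1 ? inner : inner.substring(0, colon);
        String def = colon == -1 ? null : inner.substring(colon + 1);
        String value = props.get(key);
        if (value != null) return value;
        if (def != null) return def; // "${key:}" yields the empty string
        throw new IllegalArgumentException("Could not resolve placeholder '" + key + "'");
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of("menas.hadoop.auth.method", "krb");
        System.out.println(resolve("${menas.hadoop.auth.method}", props));            // krb
        System.out.println(resolve("${menas.hadoop.auth.method:}", Map.of()));        // "" (empty default)
        System.out.println(resolve("${menas.hadoop.auth.method:default}", Map.of())); // default
    }
}
```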
```diff
@@ -130,8 +126,11 @@ menas.oozie.mavenSparkJobsJarLocation=/za/co/absa/enceladus/spark-jobs/@project.
 #Menas URL for submitted std and conf jobs
 menas.oozie.menasApiURL=http://menasHostname:8080/menas/api
 
-#Mongo address for spline for the submitted jobs
-menas.oozie.splineMongoURL=mongodb://localhost:27017
+#The URL address of Spline for the submitted jobs
```
I would get rid of the Spline reference in the Oozie code completely. Let's take it from `application.conf`.
I also thought of this, but wouldn't it break Oozie integration?
menas/src/main/scala/za/co/absa/enceladus/menas/repositories/OozieRepository.scala
…on' into feature/1732-spline-06-integration
…32-spline-06-integration
# Conflicts:
#   README.md
#   rest-api/src/main/scala/za/co/absa/enceladus/rest_api/WebSecurityConfig.scala
…32-spline-06-integration
# Conflicts:
#   pom.xml
#   spark-jobs/pom.xml
#   spark-jobs/src/main/resources/spline.properties.template
…32-spline-06-integration
…on' into feature/1732-spline-06-integration
# Conflicts:
#   rest-api/src/main/scala/za/co/absa/enceladus/rest_api/WebSecurityConfig.scala
Kudos, SonarCloud Quality Gate passed! 0 Bugs. No Coverage information.
LGTM (read the code and test-ran on EKS/Fargate with @AdrianOlosutean )
```diff
@@ -28,7 +28,7 @@ import org.springframework.context.annotation.{Bean, Configuration}
 class HDFSConfig @Autowired() (spark: SparkSession) {
   private val logger = LoggerFactory.getLogger(this.getClass)
 
-  @Value("${menas.hadoop.auth.method}")
+  @Value("${menas.hadoop.auth.method:}")
```
Why is the empty string a desired default value of this field, when all properties files in the project use `menas.hadoop.auth.method=default`? If we need a default value here, shouldn't it rather be `default`, then?

```suggestion
@Value("${menas.hadoop.auth.method:default}")
```
Closes #1732