
1732 Spline 0.6 integration #1739

Merged: 118 commits, Jun 2, 2021

Conversation

@AdrianOlosutean (Contributor) commented Mar 22, 2021

Closes #1732

GeorgiChochov and others added 30 commits March 19, 2020 23:02
* #1126: Spline 0.4.x in Oozie
* new setting for Spline used by Oozie
* changed Oozie template
* optional spline.mode setting support
* applied some code hints
#1110: Make Standardization and Conformance use Spline 0.4.x
* Change Spline version
* Make **Standardization** and **Conformance** depend on and initialize the new Spline module instead of the old ones
* Replacing Spline properties in templates
* Using a different JSON library (instead of the one transitively pulled in from the old Spline)
* Application properties templates adjusted
* Helper scripts updated
* README.md updated (addressing also changes in Menas and Oozie from other PR)
* Missing copyright headers
Co-authored-by: Georgi Chochov <g.chochov@gmail.com>
* Server-side api endpoints to provide new configuration
* Server-side controller to serve the Spline WebJar
* Client-side changes to gather the lineage id and display the lineage new UI
* Added dependency to include the webjar
* Header to allow iframe display `<frame-options policy="SAMEORIGIN"/>`
* fixed `style.css`
* Rebasing to current `Develop`
* Unused import removed
…ncies

* Version names sorted alphabetically
# Conflicts (resolved):
#	pom.xml
#	scripts/bash/run_enceladus.sh
#	spark-jobs/pom.xml
# Conflicts:
#	dao/pom.xml
#	data-model/pom.xml
#	examples/pom.xml
#	menas/pom.xml
#	menas/src/main/resources/application.properties.template
#	migrations-cli/pom.xml
#	migrations/pom.xml
#	plugins-api/pom.xml
#	plugins-builtin/pom.xml
#	pom.xml
#	spark-jobs/pom.xml
#	utils/pom.xml
…2-snapshot

AWS PoC update from develop 2.12 snapshot
 - all directly HDFS-touching functionality disabled (Atum, performance measurements, info files, output path checking)

# Add menasfargate into hosts
sudo nano /etc/hosts
# paste
20.0.63.69 menasfargate
# save & exit (ctrl+O, ctrl+X)

# Running standardization works via:
spark-submit --class za.co.absa.enceladus.standardization.StandardizationJob --conf "spark.driver.extraJavaOptions=-Dmenas.rest.uri=http://menasfargate:8080 -Dstandardized.hdfs.path=s3://euw1-ctodatadev-dev-bigdatarnd-s3-poc/enceladusPoc/ao-hdfs-data/stdOutput/standardized-{0}-{1}-{2}-{3}" ~/enceladusPoc/spark-jobs-2.11.0-SNAPSHOT.jar --menas-credentials-file ~/enceladusPoc/menas-credentials.properties --dataset-name dk_test1_emr285 --raw-format json --dataset-version 1 --report-date 2019-11-27 --report-version 1 2> ~/enceladusPoc/stderr.txt
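The `{0}-{1}-{2}-{3}` tokens in `-Dstandardized.hdfs.path` look like MessageFormat-style positional placeholders. A minimal sketch of that substitution, assuming (based only on the CLI arguments shown above, not verified against the Enceladus source) that they are filled with the dataset name, dataset version, report date, and report version:

```scala
import java.text.MessageFormat

// Hypothetical illustration: the token order (dataset name, dataset version,
// report date, report version) is inferred from the spark-submit args above.
object PathTemplate {
  val template = "standardized-{0}-{1}-{2}-{3}"

  // Fill the positional {n} tokens with the run's identifying values.
  def fill(datasetName: String, datasetVersion: String,
           reportDate: String, reportVersion: String): String =
    MessageFormat.format(template, datasetName, datasetVersion, reportDate, reportVersion)
}
```

For example, `PathTemplate.fill("dk_test1_emr285", "1", "2019-11-27", "1")` yields `standardized-dk_test1_emr285-1-2019-11-27-1`.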
…ut, s3 conf output)

 - all directly HDFS-touching functionality disabled (Atum, performance measurements, info files, output path checking)

# Add menasfargate into hosts
sudo nano /etc/hosts
# paste
20.0.63.69 menasfargate
# save & exit (ctrl+O, ctrl+X)

# Running conformance works via:
spark-submit --class za.co.absa.enceladus.conformance.DynamicConformanceJob --conf "spark.driver.extraJavaOptions=-Dmenas.rest.uri=http://menasfargate:8080 -Dstandardized.hdfs.path=s3://euw1-ctodatadev-dev-bigdatarnd-s3-poc/enceladusPoc/ao-hdfs-data/stdOutput/standardized-{0}-{1}-{2}-{3}" ~/enceladusPoc/spark-jobs-2.11.0-SNAPSHOT.jar --menas-credentials-file ~/enceladusPoc/menas-credentials.properties --dataset-name dk_test1_emr285 --dataset-version 1 --report-date 2019-11-27 --report-version 1 2> ~/enceladusPoc/conf-log.txt
Feature/1416 S3 input/output for Enceladus on EMR (S2.4, H2.8.5)
This is a temporary solution. We are currently experimenting with many forms of URLs, and having a regex there now slows us down.
Base automatically changed from feature/1612-separate-menas-into-2-tomcat-apps to develop-ver-3.0 April 12, 2021 19:13
…32-spline-06-integration

# Conflicts:
#	menas-web/Dockerfile
#	menas-web/nginx.conf
#	menas-web/start_menas_web.sh
#	menas-web/ui/index.html
#	menas-web/ui/package.json
#	menas-web/ui/service/RestDAO.js
#	menas-web/ui/service/RunRestDAO.js
#	menas-web/ui/ui5.yaml
#	menas/src/main/resources/start_menas.sh
#	menas/src/main/scala/za/co/absa/enceladus/menas/WebSecurityConfig.scala
@benedeki (Collaborator)
Just curious why you included the Swagger in this PR? 😉

@AdrianOlosutean (Contributor, Author)
Just curious why you included the Swagger in this PR? 😉

I considered it quite useful to add after separating Menas. If you think it's better, I will put the changes in a separate PR.

@benedeki (Collaborator) left a comment

Just curious why you included the Swagger in this PR? 😉

I considered it's quite useful to add it after separating Menas. If you think it's better, I will put the changes in a separate PR

It is definitely useful, no doubt. But I see no crossover with the Spline integration. So having it in one PR, particularly when there already was a PR (maybe not on the separated Menas), seems counterproductive; it just increases the size of the PR.
Furthermore, Spline is dragging behind a little, while Swagger could already be merged. Perhaps it would be worth separating it into its own PR. 🤔

README.md Outdated
@@ -119,7 +119,7 @@ password=changeme
--deploy-mode <client/cluster> \
--driver-cores <num> \
--driver-memory <num>G \
- --conf "spark.driver.extraJavaOptions=-Dmenas.rest.uri=<menas_api_uri:port> -Dstandardized.hdfs.path=<path_for_standardized_output>-{0}-{1}-{2}-{3} -Dspline.mongodb.url=<mongo_url_for_spline> -Dspline.mongodb.name=<spline_database_name> -Dhdp.version=<hadoop_version>" \
+ --conf "spark.driver.extraJavaOptions=-Dmenas.rest.uri=<menas_api_uri:port> -Dstandardized.hdfs.path=<path_for_standardized_output>-{0}-{1}-{2}-{3} -Dspline.producer.url=<url_for_spline_consumer> -Dhdp.version=<hadoop_version>" \
Collaborator

In an effort to decouple Enceladus and Spline, I wouldn't even mention this option here. If it's in application.conf, that's good enough, and it can be overridden like any other parameter.
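The reviewer's point can be sketched as follows. This is a hypothetical helper, not Enceladus code: a value kept in the packaged config with a `-D` JVM system property (e.g. `-Dspline.producer.url=...`) taking precedence over it.

```scala
// Hypothetical sketch (not Enceladus code): a packaged default that a
// -Dspline.producer.url=... JVM flag (surfaced via sys.props) overrides.
object ConfigLookup {
  // Stand-in for entries shipped in application.conf.
  private val defaults = Map("spline.producer.url" -> "http://localhost:8080/producer")

  // A system property, if set, wins over the packaged default.
  def get(key: String): Option[String] =
    sys.props.get(key).orElse(defaults.get(key))
}
```

With no `-D` flag the packaged default is used; launching with `-Dspline.producer.url=http://spline-host:8080/producer` makes the lookup return the override, which is the "overridden as any other parameter" behaviour described above.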

spark-jobs/src/main/resources/spline.properties.template (outdated, resolved)
menas-web/ui/index.html (outdated, resolved)
menas-web/ui/package.json (outdated, resolved)
menas-web/ui/service/RestDAO.js (outdated, resolved)
@@ -28,7 +28,7 @@ import org.springframework.context.annotation.{Bean, Configuration}
class HDFSConfig @Autowired() (spark: SparkSession) {
private val logger = LoggerFactory.getLogger(this.getClass)

- @Value("${menas.hadoop.auth.method}")
+ @Value("${menas.hadoop.auth.method:}")
Collaborator

Why the colon?

Contributor Author

It used to fail before because of this, but we solved it in another way. Anyway, it seems to be the only one here without a colon.

@dk1844 (Contributor) commented Apr 26, 2021

Let's not forget what the colon means. In general, it changes the meaning of the property: `${my.property.key}` (without a default value, so failing to find the key in the config causes a failure) versus `${my.property.key:defaultValueForThatKey}` (where failing to find the key in the config just falls back to the default; nothing after the colon means the default is the empty string, but a default is provided nonetheless).

Most probably you all knew this already; I am just stating it so we don't "just add" colons because they are present elsewhere, but consider the true meaning. Here that means deciding whether it is reasonable to fall back to an empty string when the key is not found in the properties.
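The colon semantics described above can be sketched in plain Scala. This is a toy resolver mimicking Spring's `@Value` placeholder rules, not Spring itself:

```scala
// Toy resolver mimicking Spring's ${key} / ${key:default} rules (not Spring itself).
object PlaceholderResolver {
  def resolve(placeholder: String, props: Map[String, String]): String =
    placeholder.split(":", 2) match {
      // Bare key, no default: a missing key is an error, as in Spring.
      case Array(key) =>
        props.getOrElse(key, throw new IllegalArgumentException(
          s"Could not resolve placeholder '$key'"))
      // Key with a default (possibly empty, as in "menas.hadoop.auth.method:").
      case Array(key, default) =>
        props.getOrElse(key, default)
    }
}
```

So `resolve("menas.hadoop.auth.method", Map.empty)` fails at startup, while `resolve("menas.hadoop.auth.method:", Map.empty)` silently yields `""`, which is exactly the trade-off being debated in this thread.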

@@ -130,8 +126,11 @@ menas.oozie.mavenSparkJobsJarLocation=/za/co/absa/enceladus/spark-jobs/@project.
#Menas URL for submitted std and conf jobs
menas.oozie.menasApiURL=http://menasHostname:8080/menas/api

- #Mongo address for spline for the submitted jobs
- menas.oozie.splineMongoURL=mongodb://localhost:27017
+ #The URL address of Spline for the submitted jobs
Collaborator

I would get rid of the Spline reference from the Oozie code completely. Let's take it from application.conf.

Contributor Author

I also thought of this, but wouldn't it break the Oozie integration?

scripts/bash/enceladus_env.template.sh (outdated, resolved)
scripts/bash/run_enceladus.sh (outdated, resolved)
…32-spline-06-integration

# Conflicts:
#	README.md
#	rest-api/src/main/scala/za/co/absa/enceladus/rest_api/WebSecurityConfig.scala
…32-spline-06-integration

# Conflicts:
#	pom.xml
#	spark-jobs/pom.xml
#	spark-jobs/src/main/resources/spline.properties.template
…on' into feature/1732-spline-06-integration

# Conflicts:
#	rest-api/src/main/scala/za/co/absa/enceladus/rest_api/WebSecurityConfig.scala
@AdrianOlosutean AdrianOlosutean marked this pull request as ready for review May 27, 2021 08:06
@dk1844 dk1844 added the PR:tested Only for PR - PR was tested by a tester (person) label Jun 1, 2021

sonarcloud bot commented Jun 1, 2021

Kudos, SonarCloud Quality Gate passed!

Bugs: A (0)
Vulnerabilities: A (0)
Security Hotspots: A (0)
Code Smells: A (0)

No Coverage information
Duplication: 0.0%

@dk1844 (Contributor) left a comment

LGTM (read the code and test-ran on EKS/Fargate with @AdrianOlosutean )

@@ -28,7 +28,7 @@ import org.springframework.context.annotation.{Bean, Configuration}
class HDFSConfig @Autowired() (spark: SparkSession) {
private val logger = LoggerFactory.getLogger(this.getClass)

- @Value("${menas.hadoop.auth.method}")
+ @Value("${menas.hadoop.auth.method:}")
Contributor

Why is the empty string the desired default value of this field, when all properties files in the project use `menas.hadoop.auth.method=default`? If we need a default value here, shouldn't it rather be `default`, then?

Suggested change
- @Value("${menas.hadoop.auth.method:}")
+ @Value("${menas.hadoop.auth.method:default}")


@AdrianOlosutean AdrianOlosutean merged commit 7f2d6e5 into develop-ver-3.0 Jun 2, 2021
@AdrianOlosutean AdrianOlosutean deleted the feature/1732-spline-06-integration branch June 2, 2021 10:03
Labels
PR:tested Only for PR - PR was tested by a tester (person)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants