Aws-POC merge to develop #1632
Conversation
Conflicts:
- dao/pom.xml
- data-model/pom.xml
- examples/pom.xml
- menas/pom.xml
- menas/src/main/resources/application.properties.template
- migrations-cli/pom.xml
- migrations/pom.xml
- plugins-api/pom.xml
- plugins-builtin/pom.xml
- pom.xml
- spark-jobs/pom.xml
- utils/pom.xml
…2-snapshot Aws poc update from develop 2.12 snapshot
- all directly-hdfs touching stuff disabled (atum, performance measurements, info files, output path checking)

```bash
# Add menasfargate into hosts
sudo nano /etc/hosts
# paste
20.0.63.69 menasfargate
# save & exit (ctrl+O, ctrl+X)

# Running standardization works via:
spark-submit --class za.co.absa.enceladus.standardization.StandardizationJob \
  --conf "spark.driver.extraJavaOptions=-Dmenas.rest.uri=http://menasfargate:8080 -Dstandardized.hdfs.path=s3://euw1-ctodatadev-dev-bigdatarnd-s3-poc/enceladusPoc/ao-hdfs-data/stdOutput/standardized-{0}-{1}-{2}-{3}" \
  ~/enceladusPoc/spark-jobs-2.11.0-SNAPSHOT.jar \
  --menas-credentials-file ~/enceladusPoc/menas-credentials.properties \
  --dataset-name dk_test1_emr285 \
  --raw-format json \
  --dataset-version 1 \
  --report-date 2019-11-27 \
  --report-version 1 \
  2> ~/enceladusPoc/stderr.txt
```
…ut, s3 conf output)
- all directly-hdfs touching stuff disabled (atum, performance measurements, info files, output path checking)

```bash
# Add menasfargate into hosts
sudo nano /etc/hosts
# paste
20.0.63.69 menasfargate
# save & exit (ctrl+O, ctrl+X)

# Running conformance works via:
spark-submit --class za.co.absa.enceladus.conformance.DynamicConformanceJob \
  --conf "spark.driver.extraJavaOptions=-Dmenas.rest.uri=http://menasfargate:8080 -Dstandardized.hdfs.path=s3://euw1-ctodatadev-dev-bigdatarnd-s3-poc/enceladusPoc/ao-hdfs-data/stdOutput/standardized-{0}-{1}-{2}-{3}" \
  ~/enceladusPoc/spark-jobs-2.11.0-SNAPSHOT.jar \
  --menas-credentials-file ~/enceladusPoc/menas-credentials.properties \
  --dataset-name dk_test1_emr285 \
  --dataset-version 1 \
  --report-date 2019-11-27 \
  --report-version 1 \
  2> ~/enceladusPoc/conf-log.txt
```
Conflicts:
- pom.xml
Feature/1416 S3 input/output for Enceladus on EMR (S2.4, H2.8.5)
This is a temporary solution. We are currently experimenting with many forms of URLs, and having a regex there now slows us down.
#1526
* FsUtils divided into LocalFsUtils & HdfsUtils
* PathConfigSuite update
* S3FsUtils with tail-recursive pagination accumulation - now generic with optional short-circuit breakOut
* TestRunnerJob updated to manually cover the cases - should serve as a basis for tests
* HdfsUtils replaced by trait DistributedFsUtils (except for MenasCredentials loading & nonSplittable splitting) - see the sketch after this list
* using final version of s3-powered Atum (3.0.0)
* mockito version update, scalatest version update
* S3FsUtilsSuite: exists, read, sizeDir (hidden, non-hidden, recursive), non-splittable (simple, recursive with breakOut), delete (recursive), version find (simple - empty, recursive)
* explicit stubbing fix for hyperdrive
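For orientation, a minimal sketch of what such a shared abstraction could look like. The method names below are assumptions inferred from the S3FsUtilsSuite cases listed above, not necessarily the PR's exact API:

```scala
// Hypothetical sketch of a DistributedFsUtils-style trait that both
// HdfsUtils and S3FsUtils could implement. Method names are inferred from
// the test cases above and are NOT guaranteed to match the PR's code.
trait DistributedFsUtils {
  def exists(path: String): Boolean
  def read(path: String): String
  def getDirectorySize(path: String): Long
  def isNonSplittable(path: String): Boolean
  def deleteRecursively(path: String): Unit
  def getLatestVersion(publishPath: String, reportDate: String): Int
}
```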
* s3 using hadoop fs api
* s3 sdk usage removed (pom, classes, tests)
* atum final version 3.1.0 used
* readStandardizationInputData(... path: String)(implicit ... fs: FileSystem) -> readStandardizationInputData(input: PathWithFs) - see the sketch below
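A minimal sketch of the PathWithFs idea referenced above. The field and helper names are illustrative assumptions, not necessarily the PR's exact definitions:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical sketch: bundle a path together with the FileSystem it lives
// on, so callers pass one value instead of a path plus an implicit FileSystem.
final case class PathWithFs(path: String, fileSystem: FileSystem)

object PathWithFs {
  // Resolves the matching FileSystem (HDFS, S3A, local, ...) from the path's scheme.
  def fromPath(path: String, hadoopConf: Configuration): PathWithFs = {
    val fs = new Path(path).getFileSystem(hadoopConf)
    PathWithFs(path, fs)
  }
}
```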
1499 Make Spline in Menas work for AWS
…e commons library version. Test-ran on EMR. (#1619)
* Tests pass
* Depends on ATUM 3.1.1-SNAPSHOT (the bugfix for AbsaOSS/atum#48)
* put back Oozie
* downgraded Spline
Conflicts addressed:
- spark-jobs/src/test/scala/za/co/absa/enceladus/conformance/datasource/DatasourceSuite.scala
- spark-jobs/src/test/scala/za/co/absa/enceladus/conformance/interpreter/rules/RuleOptimizationSuite.scala
- spark-jobs/src/test/scala/za/co/absa/enceladus/conformance/interpreter/rules/RulesSuite.scala
- spark-jobs/src/test/scala/za/co/absa/enceladus/conformance/interpreter/rules/mapping/JoinMappingRuleInterpreterSuite.scala
```scala
jobConfig.reportVersion match {
  case Some(version) => version
  case None =>

    // publishFs for this specific feature (needed for missing reportVersion until reusable
```
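The match above is cut off in the diff. Purely for illustration, here is a self-contained sketch of the kind of fallback it hints at; the helper name and the "+1" rule are assumptions, not the PR's exact logic:

```scala
// Hypothetical helper: use the explicit report version if given, otherwise
// derive the next one from versions already present in the publish folder.
def resolveReportVersion(explicit: Option[Int], existingVersions: Seq[Int]): Int =
  explicit match {
    case Some(version) => version
    case None          => existingVersions.sorted.lastOption.getOrElse(0) + 1
  }

// resolveReportVersion(Some(5), Seq(1, 2)) == 5
// resolveReportVersion(None, Seq(1, 2))    == 3
```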
Is this `// publishFs` comment in the diff above needed anymore?
I am not even sure what the comment means. 🤔 @dk1844 ?
I believe the comment is needed (or at least that it is not meaningless). Maybe it should be reworded, though, if it is unclear. The gist of the meaning is this:
Everywhere else, one would want to reuse the existing "publish FS" that will reside in `pathCfg.publish.fileSystem` in this situation (to get an instance of `HadoopFsUtils` bound to this FS). However, here you cannot do that, because the `pathCfg` object is not yet available at this point. So the would-be publishFs is obtained here temporarily from `dataset.hdfsPublishPath`.
That's all. The comment is just meant to explain why the call to `dataset.hdfsPublishPath` is seemingly pointlessly repeated here and again later in the factory method of `PathConfig`. It just could not be reused reasonably.
My suggestion for the comment here is:
// Since "pathConfig.publish.fileSystem" is not available at this point yet, a temporary publish-FS is used here instead
This should explain the intention quite a bit and not be too complicated.
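To make the discussion concrete, here is a minimal sketch of the workaround being described: resolving a temporary publish FileSystem straight from the dataset's publish path before `pathCfg` exists. The function name is illustrative, not the PR's exact code:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical sketch: since pathCfg.publish.fileSystem is not available yet
// at this point, resolve a temporary "publish FS" from the publish path itself.
def temporaryPublishFs(hdfsPublishPath: String, hadoopConf: Configuration): FileSystem =
  new Path(hdfsPublishPath).getFileSystem(hadoopConf)

// e.g. temporaryPublishFs(dataset.hdfsPublishPath, spark.sparkContext.hadoopConfiguration)
```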
```scala
val in = fs.open(new Path(hdfsPath))
val content = Array.fill(in.available())(0.toByte)
in.readFully(content)
val tmpFile = File.createTempFile("enceladusFSUtils", "hdfsFileToLocalTemp")
tmpFile.deleteOnExit()
FileUtils.writeByteArrayToFile(tmpFile, content)
tmpFile.getAbsolutePath

// why not use
```
Can this `// why not use` comment be removed?
Seems so. @dk1844 can you confirm, please?
The comment still makes sense to me. Why do we not use the simpler call outlined in the comment in this case:
fs.copyToLocalFile(false, new Path(hdfsPath), new Path("someLocalName"), true)
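For comparison, a sketch of how the snippet above could look using that call. The method name mirrors the diff, but treat this as an illustration of the reviewer's suggestion rather than the final implementation:

```scala
import java.io.File
import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch of the simpler variant suggested in the review: let Hadoop copy the
// file to local disk instead of buffering the whole content in memory first.
def hdfsFileToLocalTempFile(fs: FileSystem, hdfsPath: String): String = {
  val tmpFile = File.createTempFile("enceladusFSUtils", "hdfsFileToLocalTemp")
  tmpFile.deleteOnExit()
  // delSrc = false keeps the remote source; useRawLocalFileSystem = true
  // avoids writing .crc checksum files next to the local copy.
  fs.copyToLocalFile(false, new Path(hdfsPath), new Path(tmpFile.getAbsolutePath), true)
  tmpFile.getAbsolutePath
}
```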
pom.xml
```xml
<scala.version>2.11.12</scala.version>
<scalatest.maven.version>2.0.0</scalatest.maven.version>
<scalatest.version>3.2.2</scalatest.version>
<scopt.version>4.0.0-RC2</scopt.version>
```
Can scopt be updated to the 4.0.0 release?
(Just read through; only small, non-blocking comments.)
menas/ui/css/style.css
```css
margin-top: 1rem !important;
}

h5[id$=datasetDetailView--scheduleTimingTitle] {
```
Possible, but quite ugly. Couldn't the element just have a custom CSS class instead of relying on the generated id?
I agree it's not nice, but I don't want to add improvements (not required by the merge) into this PR. Please consider fixing it in a dedicated PR.
* Addressed issues identified by reviewers
Sorry for this PR being so big. 😊
Conflicts addressed:
- pom.xml
- spark-jobs/src/test/scala/za/co/absa/enceladus/conformance/interpreter/InterpreterSuite.scala
- utils/src/test/scala/za/co/absa/enceladus/utils/config/ConfigReaderSuite.scala
I did some tests:

Spark jobs
- complex standardization
- conformance with almost all of the rules
- standardization & conformance with mapping table rule

UI
- create and update schema
- create and update mapping table
- create and update dataset and conformance rules

Comparison with data on the current develop branch.
Code reviewed
No functional changes after Veronika's approval.
…eladus into aws-merge-to-develop
Kudos, SonarCloud Quality Gate passed!
Code reviewed
Closes #1622