
Feature/1371 Combined Standardization Conformance Job #1392

Merged

28 commits merged into develop from feature/1371-combined-std-conf-job on Jul 30, 2020

Conversation

AdrianOlosutean
Contributor

@AdrianOlosutean commented Jun 12, 2020

Closes #1371
Closes #1443

Proposal for main function

…dardization-conformance' into feature/1371-combined-std-conf-job

# Conflicts:
#	examples/src/main/scala/za/co/absa/enceladus/examples/CustomRuleSample1.scala
#	examples/src/main/scala/za/co/absa/enceladus/examples/CustomRuleSample2.scala
#	examples/src/main/scala/za/co/absa/enceladus/examples/CustomRuleSample3.scala
#	examples/src/main/scala/za/co/absa/enceladus/examples/CustomRuleSample4.scala
#	examples/src/main/scala/za/co/absa/enceladus/examples/interpreter/rules/custom/UppercaseCustomConformanceRule.scala
#	examples/src/main/scala/za/co/absa/enceladus/examples/interpreter/rules/custom/XPadCustomConformanceRule.scala
#	examples/src/test/scala/za/co/absa/enceladus/examples/interpreter/rules/custom/UppercaseCustomConformanceRuleSuite.scala
#	examples/src/test/scala/za/co/absa/enceladus/examples/interpreter/rules/custom/XPadCustomConformanceRuleSuite.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/common/CommonJobExecution.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/ConformanceExecution.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/ConformanceReader.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/DynamicConformanceJob.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/HyperConformance.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/DynamicInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/InterpreterContext.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/ArrayCollapseInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/ArrayExplodeInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/CastingRuleInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/ConcatenationRuleInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/DropRuleInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/LiteralRuleInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/MappingRuleInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/MappingRuleInterpreterBroadcast.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/MappingRuleInterpreterGroupExplode.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/NegationRuleInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/RuleInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/SingleColumnRuleInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/SparkSessionConfRuleInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/UppercaseRuleInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/standardization/StandardizationExecution.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/standardization/StandardizationJob.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/standardization/StandardizationReader.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/conformance/config/ConformanceConfigSuite.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/conformance/interpreter/ArrayConformanceSuite.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/conformance/interpreter/ChorusMockSuite.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/conformance/interpreter/InterpreterSuite.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/conformance/interpreter/LiteralJoinMappingRuleTest.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/conformance/interpreter/fixtures/NestedStructsFixture.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/conformance/interpreter/fixtures/StreamingFixture.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/conformance/interpreter/rules/CastingRuleSuite.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/conformance/interpreter/rules/NegationRuleSuite.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/conformance/interpreter/rules/RulesSuite.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/conformance/interpreter/rules/TestRuleBehaviors.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/conformance/interpreter/rules/custom/CustomRuleSuite.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/conformance/interpreter/rules/testcasefactories/NestedTestCaseFactory.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/conformance/interpreter/rules/testcasefactories/SimpleTestCaseFactory.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/standardization/StandardizationCobolAsciiSuite.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/standardization/StandardizationCobolEbcdicSuite.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/standardization/StandardizationJsonSuite.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/standardization/StandardizationParquetSuite.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/standardization/StandardizationRerunSuite.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/standardization/config/StandardizationConfigSuite.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/standardization/fixtures/CsvFileFixture.scala
@AdrianOlosutean added the "work in progress" label (Work on this item is not yet finished; mainly intended for PRs) Jun 25, 2020
…' into feature/1371-combined-std-conf-job

# Conflicts:
#	examples/src/main/scala/za/co/absa/enceladus/examples/interpreter/rules/custom/UppercaseCustomConformanceRule.scala
#	examples/src/main/scala/za/co/absa/enceladus/examples/interpreter/rules/custom/XPadCustomConformanceRule.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/common/CommonJobExecution.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/ConformanceExecution.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/PropertiesProvider.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/config/ConformanceConfigInstance.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/DynamicInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/InterpreterContext.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/ArrayCollapseInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/ArrayExplodeInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/CastingRuleInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/ConcatenationRuleInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/DropRuleInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/LiteralRuleInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/MappingRuleInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/MappingRuleInterpreterBroadcast.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/MappingRuleInterpreterGroupExplode.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/NegationRuleInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/RuleInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/SingleColumnRuleInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/SparkSessionConfRuleInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/UppercaseRuleInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/standardization/PropertiesProvider.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/standardization/StandardizationExecution.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/standardization/config/StandardizationConfigInstance.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/conformance/interpreter/rules/RulesSuite.scala
#	spark-jobs/src/test/scala/za/co/absa/enceladus/conformance/interpreter/rules/custom/CustomRuleSuite.scala
Base automatically changed from feature/1015-extract-common-standardization-conformance to develop July 14, 2020 15:58
…ned-std-conf-job

# Conflicts:
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/ConformanceExecution.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/ConformancePropertiesProvider.scala
@AdrianOlosutean removed the "work in progress" label Jul 14, 2020
@AdrianOlosutean marked this pull request as ready for review July 14, 2020 19:08
@AdrianOlosutean
Contributor Author

I was able to run the combined StandardizationConformanceJob locally, as well as the DynamicConformance one. Can someone confirm that the e2e tests pass too?

)

case class InterpreterContextArgs(datasetName: String,
reportDate: String = "",
Contributor

misaligned

import za.co.absa.enceladus.standardization.config.StandardizationParser
import za.co.absa.enceladus.standardization_conformance.config.StdConformanceConfig

trait StdConformanceExecution extends StandardizationExecution with ConformanceExecution {
Collaborator

Small (and just a suggestion): What about naming it StdAndConfExecution?

import scala.util.Try


case class StdConformanceConfig(datasetName: String = "",
Collaborator

Same as above - what about StdAndConfConfig?

@@ -76,19 +75,21 @@ trait StandardizationExecution extends CommonJobExecution {
// Add the raw format of the input file(s) to Atum's metadata
Atum.setAdditionalInfo("raw_format" -> cmd.rawFormat)

PerformanceMetricTools.addJobInfoToAtumMetadata("std", preparationResult.pathCfg.inputPath, preparationResult.pathCfg.outputPath,
// OutputPath is standardizationPath on the Standardization phase of the combined job
val outputPath = preparationResult.pathCfg.standardizationPath.getOrElse(preparationResult.pathCfg.outputPath)
Collaborator

For any reader not fully familiar with the structure of the code, this would be rather confusing... (similar to ConformanceExecution)

persistStorageLevel: Option[StorageLevel] = None)

object InterpreterContextArgs {
def fromConformanceConfig[T](conformanceConfig: ConformanceParser[T]): InterpreterContextArgs = {
Collaborator

I like that. If considered too long, I don't have a problem with ConfConfigParser.
Btw, same with StandardizationParser

fsUtils: FileSystemVersionUtils): Unit = {
val cmdLineArgs: String = args.mkString(" ")

// StandardizationPath is the input on the Conformance phase of the combined job
val standardizationPath = preparationResult.pathCfg.standardizationPath.getOrElse(preparationResult.pathCfg.inputPath)
Collaborator

For any reader not fully familiar with the structure of the code, this would be rather confusing... (similar to StandardizationExecution)
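The getOrElse resolution quoted in the two diffs above (standardizationPath acting as the Standardization output and as the Conformance input in the combined job) can be sketched as follows. This is a simplified stand-in for illustration only: the PathConfig here and its field names are assumed from the diff, not the PR's exact class.

```scala
// Simplified stand-in for the PR's PathConfig; field names assumed from the diff.
case class PathConfig(inputPath: String,
                      outputPath: String,
                      standardizationPath: Option[String])

object PathResolution {
  // Standardization phase: in the combined job, write to the intermediate
  // standardization path instead of the final output path.
  def stdOutputPath(cfg: PathConfig): String =
    cfg.standardizationPath.getOrElse(cfg.outputPath)

  // Conformance phase: in the combined job, read from the intermediate
  // standardization path instead of the original input path.
  def confInputPath(cfg: PathConfig): String =
    cfg.standardizationPath.getOrElse(cfg.inputPath)
}
```

In a standalone run standardizationPath is None and each phase falls back to its own path, which is exactly why a reader unfamiliar with the combined job may find these two lines confusing.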

@DzMakatun
Contributor

What will be the application name for the combined job?
E.g. standalone Standardization uses the pattern `.appName(s"Standardisation $enceladusVersion ${cmd.datasetName} ${cmd.datasetVersion} ${cmd.reportDate} $reportVersion")`
while standalone Conformance uses
`.appName(s"Dynamic Conformance $enceladusVersion ${cmd.datasetName} ${cmd.datasetVersion} ${cmd.reportDate} $reportVersion")`.
The combined app needs something similar, so that we can identify it by parsing the name string in monitoring.
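The naming pattern under discussion can be captured by one small helper. This is only an illustrative sketch; the AppNames object and its parameters are mine, not the PR's actual code.

```scala
// Illustrative helper, not the PR's actual code: build a Spark appName whose
// leading job-name token(s) let monitoring identify the job type by parsing.
object AppNames {
  def appName(jobName: String,
              enceladusVersion: String,
              datasetName: String,
              datasetVersion: Int,
              reportDate: String,
              reportVersion: Int): String =
    s"$jobName $enceladusVersion $datasetName $datasetVersion $reportDate $reportVersion"
}
```

For example, `AppNames.appName("Dynamic Conformance", "2.0.0", "myDataset", 1, "2020-07-30", 1)` produces "Dynamic Conformance 2.0.0 myDataset 1 2020-07-30 1"; only the jobName prefix differs between the three jobs.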

@AdrianOlosutean
Contributor Author

AdrianOlosutean commented Jul 23, 2020

Good point @DzMakatun, I forgot about that one. I would call them Standardisation, Dynamic Conformance, and Standardization Conformance. I can also keep it consistent and change the one for Standardisation to Standardization.

@@ -164,8 +163,7 @@ trait StandardizationExecution extends CommonJobExecution {
handleEmptyOutput(sourceId)
}

// OutputPath is standardizationPath on the Standardization phase of the combined job
val outputPath = preparationResult.pathCfg.standardizationPath.getOrElse(preparationResult.pathCfg.outputPath)
val outputPath = preparationResult.pathCfg.standardizationPath
Collaborator

Variable not needed. The expression can easily be used directly on line 175.

import za.co.absa.enceladus.utils.fs.FileSystemVersionUtils
import za.co.absa.enceladus.utils.modules.SourcePhase
import za.co.absa.enceladus.utils.udf.UDFLibrary

object StandardizationConformanceJob extends StdConformanceExecution {
object StandardizationAndConformanceJob extends StandardizationExecution with ConformanceExecution {
private val jobName = "Standardization Conformance"
Collaborator

What about?

Suggested change
private val jobName = "Standardization Conformance"
private val jobName = "Standardization and Conformance"

or

Suggested change
private val jobName = "Standardization Conformance"
private val jobName = "Standardization & Conformance"

// die if the output path exists
validateForExistingOutputPath(fsUtils, pathCfg)
validateForExistingOutputPath(fsUtils, outputPath)
Collaborator

I would make this abstract here and implement it in the successors. (The combined job could call both ancestors' versions.)

Contributor Author

Good point
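The suggested pattern (validateOutputPath abstract in the common trait, each job validating its own path, the combined job calling both ancestors' versions) might look roughly like this. Everything here is a simplified stand-in: a Map replaces the real PathConfig/FileSystemVersionUtils, and a recording buffer replaces the real existence check.

```scala
import scala.collection.mutable.ListBuffer

// Illustrative sketch only, not the PR's exact signatures.
trait CommonJobExecution {
  val checked = ListBuffer.empty[String]      // stand-in for the fsUtils check
  def validateOutputPath(paths: Map[String, String]): Unit
  protected def validateIfPathAlreadyExists(path: String): Unit =
    checked += path                           // real code dies if the path exists
}

trait StandardizationExecution extends CommonJobExecution {
  override def validateOutputPath(paths: Map[String, String]): Unit =
    validateIfPathAlreadyExists(paths("standardizationPath"))
}

trait ConformanceExecution extends CommonJobExecution {
  override def validateOutputPath(paths: Map[String, String]): Unit =
    validateIfPathAlreadyExists(paths("publishPath"))
}

object StandardizationAndConformanceJob
    extends StandardizationExecution with ConformanceExecution {
  // Trait linearization would keep only the last override,
  // so the combined job calls both ancestors' versions explicitly:
  override def validateOutputPath(paths: Map[String, String]): Unit = {
    super[StandardizationExecution].validateOutputPath(paths)
    super[ConformanceExecution].validateOutputPath(paths)
  }
}
```

The qualified `super[Trait]` calls are what makes "combined job could call both ancestors' versions" work in Scala.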

Comment on lines 93 to 94
log.info(s"input path: $inputPath")
log.info(s"output path: $outputPath")
Collaborator

I would move this logging to prepareStandardization/prepareConformance and make it specific: "Standardization input path: ..." etc.

Comment on lines 171 to 172
rawPath = buildRawPath(cmd.asInstanceOf[StandardizationConfigParser[StandardizationConformanceConfig]], dataset, reportVersion),
publishPath = buildPublishPath(cmd.asInstanceOf[ConformanceConfigParser[StandardizationConformanceConfig]], dataset, reportVersion),
Collaborator

Nice solution 👍
I would not cast the cmd here, though. Let the function within CommonJobExecution return only the default values, and in the overrides check the class, cast it, and eventually apply the configuration override.


val performance = initPerformanceMeasurer(pathCfg.inputPath)
val performance = initPerformanceMeasurer(inputPath)
Collaborator

Still thinking how to do this... 🤔
The point is that CommonJobExecution should not know about any Conformance, Standardization or their specific configs.

Collaborator

Would also put this line into `prepareStandardization`/`prepareConformance` 🤔

@DzMakatun
Contributor

DzMakatun commented Jul 24, 2020

> Good point @DzMakatun , I forgot about that one. I would call them Standardisation, Dynamic Conformance and Standardization Conformance. I can also keep it consistent and change the one for Standardisation to Standardization

Please keep in mind: the easier the jobs are to parse and distinguish, the better. It's not ideal when the names have a different number of words and the first word is the same in two different jobs. I would go for something like:

  1. Enceladus Standardization ...
  2. Enceladus Conformance ...
  3. Enceladus Combined ...
    3a. Or something like Enceladus Standardization_and_Conformance ...

What do you think?

@benedeki
Collaborator

> Good point @DzMakatun , I forgot about that one. I would call them Standardisation, Dynamic Conformance and Standardization Conformance. I can also keep it consistent and change the one for Standardisation to Standardization
>
> Please, keep in mind, the easier to parse and distinguish the jobs the better. It's not ideal when the names have different number of words and the first word is the same in two different jobs. I would go for something like:
>
> 1. Enceladus Standardization ...
> 2. Enceladus Conformance ...
> 3. Enceladus Combined ...
>    3a. Or something like Enceladus Standardization_and_Conformance ...
>
> What do you think?


I like the suggestion, even though it goes a little bit against your own words: the first word is always Enceladus. It's good that it's unified for all three.
But the underscores are weird. What about Enceladus Standardization&Conformance? (I admit I suggested it already in another comment.)

@DzMakatun
Contributor

> Good point @DzMakatun , I forgot about that one. I would call them Standardisation, Dynamic Conformance and Standardization Conformance. I can also keep it consistent and change the one for Standardisation to Standardization
>
> Please, keep in mind, the easier to parse and distinguish the jobs the better. It's not ideal when the names have different number of words and the first word is the same in two different jobs. I would go for something like:
>
> 1. Enceladus Standardization ...
> 2. Enceladus Conformance ...
> 3. Enceladus Combined ...
>    3a. Or something like Enceladus Standardization_and_Conformance ...
>
> What do you think?


> I like the suggestion, despite it goes little bit against your own words - first words is always Enceladus. It's good that it's unified, for all 3,

There are also other jobs in Spark History, so having the first word Enceladus helps to get the first level of categorization.

> But the underscores are weird. What about Enceladus Standardization&Conformance (I admit suggested it already in other comment)

I see, it could be with '&', or maybe camel notation, like StandardizationConformance or StandardizationAndConformance?

@@ -166,44 +161,37 @@ trait CommonJobExecution {
}
}

protected def getPathConfig[T](cmd: JobConfigParser[T], dataset: Dataset, reportVersion: Int): PathConfig = {
protected def getDefaultPathConfig[T](cmd: JobConfigParser[T], dataset: Dataset, reportVersion: Int): PathConfig = {
Collaborator

Why not make this the implementation of the now-abstract getPathConfig?

@@ -77,6 +81,21 @@ trait ConformanceExecution extends CommonJobExecution {
preparationResult.reportVersion)
}

override def getPathConfig[T](cmd: JobConfigParser[T], dataset: Dataset, reportVersion: Int): PathConfig = {
val pathOverride = cmd.asInstanceOf[ConformanceConfig].publishPathOverride
Collaborator

Swap it with the next line and you don't even need the val. Also less "context switching".
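The default-plus-override shape being discussed (shared defaults in CommonJobExecution, each job's override applying only its own path override, so the common trait never needs to know about Standardization or Conformance configs) might look roughly like this. A Map stands in for the real PathConfig, and the path layouts are invented for illustration.

```scala
// Simplified sketch of the default-plus-override pattern; not the PR's code.
trait CommonJobExecution {
  protected def getDefaultPathConfig(datasetName: String, reportVersion: Int): Map[String, String] =
    Map(
      "rawPath"     -> s"/raw/$datasetName/v$reportVersion",
      "publishPath" -> s"/publish/$datasetName/v$reportVersion"
    )

  // Jobs without overrides just use the defaults.
  def getPathConfig(datasetName: String, reportVersion: Int): Map[String, String] =
    getDefaultPathConfig(datasetName, reportVersion)
}

trait ConformanceExecution extends CommonJobExecution {
  def publishPathOverride: Option[String] = None

  // The override starts from the defaults and only then applies its own path,
  // so no cast of a generic cmd is needed in the common trait.
  override def getPathConfig(datasetName: String, reportVersion: Int): Map[String, String] = {
    val defaults = getDefaultPathConfig(datasetName, reportVersion)
    publishPathOverride.fold(defaults)(p => defaults + ("publishPath" -> p))
  }
}
```

With `Option.fold`, the "swap and drop the val" suggestion collapses to a single expression per override.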

@@ -33,7 +33,8 @@ case class CoalesceRuleInterpreter(rule: CoalesceConformanceRule) extends RuleIn
override def conformanceRule: Option[ConformanceRule] = Some(rule)

def conform(df: Dataset[Row])
(implicit spark: SparkSession, explosionState: ExplosionState, dao: MenasDAO, progArgs: ConformanceConfig): Dataset[Row] = {
(implicit spark: SparkSession, explosionState: ExplosionState, dao: MenasDAO,
Collaborator

If parameters span multiple lines, they should be one per line.

@@ -83,6 +86,21 @@ trait StandardizationExecution extends CommonJobExecution {
dao.getSchema(preparationResult.dataset.schemaName, preparationResult.dataset.schemaVersion)
}

override def getPathConfig[T](cmd: JobConfigParser[T], dataset: Dataset, reportVersion: Int): PathConfig = {
val pathOverride = cmd.asInstanceOf[StandardizationConfig].rawPathOverride
Collaborator

Same as above. Swap the lines, get rid of the val.

}

override def validateOutputPath(fsUtils: FileSystemVersionUtils, pathConfig: PathConfig): Unit = {
validateIfPathAlreadyExists(fsUtils, pathConfig.standardizationPath)
Collaborator

Nice 👍

val jobCmd = cmd.asInstanceOf[StandardizationConformanceConfig]
val rawPathOverride = jobCmd.rawPathOverride
val publishPathOverride = jobCmd.publishPathOverride
val defaultConfig = getDefaultPathConfig(cmd, dataset, reportVersion)
Collaborator

Again, I would make this the first step of the function. Here it's more for consistency with the other versions of the function.

…ned-std-conf-job

# Conflicts:
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/FillNullsRuleInterpreter.scala
#	spark-jobs/src/main/scala/za/co/absa/enceladus/conformance/interpreter/rules/RuleInterpreter.scala
benedeki previously approved these changes Jul 30, 2020
Collaborator

@benedeki left a comment

  • code reviewed
  • pulled
  • built
  • run

The few remarks are really just about code style.

import za.co.absa.enceladus.standardization.interpreter.StandardizationInterpreter
import za.co.absa.enceladus.standardization.interpreter.stages.PlainSchemaGenerator
import za.co.absa.enceladus.utils.fs.FileSystemVersionUtils
import za.co.absa.enceladus.utils.modules.SourcePhase
import za.co.absa.enceladus.utils.performance.PerformanceMetricTools
import za.co.absa.enceladus.utils.performance.{PerformanceMeasurer, PerformanceMetricTools}
Collaborator

Suggested change
import za.co.absa.enceladus.utils.performance.{PerformanceMeasurer, PerformanceMetricTools}
import za.co.absa.enceladus.utils.performance.PerformanceMetricTools

Unused import

if (cmd.rawFormat.equalsIgnoreCase("fixed-width")) {
HashMap("trimValues" -> cmd.fixedWidthTrimValues.map(BooleanParameter))
} else {
HashMap()
}
}

private def getCobolOptions(cmd: StandardizationConfig, dataset: Dataset)(implicit dao: MenasDAO): HashMap[String, Option[RawFormatParameter]] = {
private def getCobolOptions[T](cmd: StandardizationConfigParser[T], dataset: Dataset)(implicit dao: MenasDAO): HashMap[String, Option[RawFormatParameter]] = {
Collaborator

Tiny: Line too long

if (cmd.rawFormat.equalsIgnoreCase("xml")) {
HashMap("rowtag" -> cmd.rowTag.map(StringParameter))
} else {
HashMap()
}
}

private def getCsvOptions(cmd: StandardizationConfig, numberOfColumns: Int = 0): HashMap[String, Option[RawFormatParameter]] = {
private def getCsvOptions[T](cmd: StandardizationConfigParser[T], numberOfColumns: Int = 0): HashMap[String, Option[RawFormatParameter]] = {
Collaborator

Tiny: Line too long

failOnInputNotPerSchema: Boolean = false,

credsFile: Option[String] = None,
keytabFile: Option[String] = None) extends StandardizationConfigParser[StandardizationConformanceConfig]
Collaborator

Tiny: Line too long

Comment on lines 61 to 62
override def withIsCatalystWorkaroundEnabled(value: Option[Boolean]): StandardizationConformanceConfig = copy(isCatalystWorkaroundEnabled = value)
override def withAutocleanStandardizedFolder(value: Option[Boolean]): StandardizationConformanceConfig = copy(autocleanStandardizedFolder = value)
Collaborator

These lines are also too long; here it's perhaps worth marking them as an exception with // scalastyle:ignore

@sonarcloud

sonarcloud bot commented Jul 30, 2020

Kudos, SonarCloud Quality Gate passed!

  • Bugs: A, 0 bugs
  • Vulnerabilities: A, 0 vulnerabilities (0 security hotspots to review)
  • Code smells: A, 1 code smell
  • Coverage: no coverage information
  • Duplication: 0.0%

Contributor

@HuvarVer left a comment

  • review
  • pull
  • build
  • run

I like the speed of the new job.
Tested the separate jobs with Hermes too, and ran the new job. Looks good.

@AdrianOlosutean merged commit 946ccec into develop Jul 30, 2020
@AdrianOlosutean deleted the feature/1371-combined-std-conf-job branch July 30, 2020 12:44