-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NIFI-12674 Modified ValidateCSV to make the schema optional if a head… #8362
base: main
Are you sure you want to change the base?
Conversation
Any reason for not using ValidateRecord processor if the only requirement is to confirm that the data is valid CSV without any specific constraint? |
@pvillard31 The ValidateRecord processor splits the input FlowFile into 2 Flowfiles, one for valid and another for invalid records. With the change to ValidateCSV, the whole file will be routed to either valid or invalid. |
.../nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java
Outdated
Show resolved
Hide resolved
.../nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java
Outdated
Show resolved
Hide resolved
.../nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java
Outdated
Show resolved
Hide resolved
@exceptionfactory Can you please restart the failed job? It does not seem related to the changes. Thanks! |
Any updates on reviewing this change? |
There are merge conflicts that need to be resolved |
@mattyb149 I've rebased against main. Thank You |
.../nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java
Show resolved
Hide resolved
.../nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java
Outdated
Show resolved
Hide resolved
.../nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ValidateCsv.java
Outdated
Show resolved
Hide resolved
…er is provided. Added validate on attribute option.
…roperty to match ValidateXML.
|
||
InputStream stream; | ||
if (context.getProperty(CSV_SOURCE_ATTRIBUTE).isSet()) { | ||
stream = new ByteArrayInputStream(flowFile.getAttribute(context.getProperty(CSV_SOURCE_ATTRIBUTE).getValue()).getBytes(StandardCharsets.UTF_8)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Freedom9339 Thanks for making that change. It turned out well except you need to call .evaluateAttributeExpressions() without passing a flowfile after the .getProperty(CSV_SOURCE_ATTRIBUTE) call. After that, the change looks complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Freedom9339 Thanks for your contribution.
@@ -120,11 +120,20 @@ public class ValidateCsv extends AbstractProcessor { | |||
.description("The schema to be used for validation. Is expected a comma-delimited string representing the cell " | |||
+ "processors to apply. The following cell processors are allowed in the schema definition: " | |||
+ allowedOperators.toString() + ". Note: cell processors cannot be nested except with Optional.") | |||
.required(true) | |||
.required(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have a dependsOn(HEADER, "false")
, unless you can provide the explicit schema yet there be a header line that should be ignored. If that's the case maybe update the documentation for Header to reflect the current behavior based on the different combinations of settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the "unless" part is true here. But now there are merge conflicts, sorry I lost track of this
…er is provided. Added validate on attribute option.
Summary
NIFI-12674
Made the schema optional for the ValidateCSV processor if a header is provided. In this case, only the structure of the CSV will validated, using the header to determine how many fields each line should have.
Additionally, a new validation strategy was implemented, Validate on Attribute. This works similar to the way ValidateXML works in which the value of a given attribute of a FlowFile will be treated as the contents of a CSV file. Validation will be done on that attribute and not on the content of the FlowFile.
I would also like to note that I made the stream a variable that can be assigned to either the FlowFile content or the value of the attribute. In doing this, I removed the need for an inner method, and thus all of the variables that previously needed to be Atomic References could now be regular variables.
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-00000
NIFI-00000
Pull Request Formatting
main
branchVerification
Please indicate the verification steps performed prior to pull request creation.
Build
mvn clean install -P contrib-check
Licensing
LICENSE
andNOTICE
filesDocumentation