AVRO-3704: name validator interface #2053
pom.xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-resources-plugin</artifactId>
  <version>${maven-resources-plugin.version}</version>
</plugin>
I don't understand this addition: why was the maven-resources-plugin
added? Was it to be able to specify a recent version number?
The CI failed at the "maven4" step (see the last comment on AVRO-3701), so I bumped this plugin from 1.7.0 to 3.0.0.
#2075 - I've extracted just this change, so that all other PRs won't break because of it.
It still fails so I created https://issues.apache.org/jira/browse/MRRESOURCES-124
Looks good.
One question (but it's also a matter of taste): should we nest everything in the Schema class? Or is name validation something you do before creating a schema/field with that name?
In the near future, I hope my PR to extract a separate SchemaParser will be merged (the last in the series #1588, #1589, #1954). This will (partially) extract parsing from the Schema class. Would that still be clean when the NameValidator interface is nested in the Schema class?
NameValidator has no direct link with either Schema or SchemaParser, so it could easily be embedded in SchemaParser or put in a separate Java file with your PR (I haven't figured out yet which would be best).
{"name": "User", "type": "record", "fields": [{"name": "current_status", "type": "Status"}]}
{"name": "Status", "type": "record", "fields": [{"name": "author", "type": "User"}]}
I will adapt it if yours is merged first :)
This naming issue was also a problem when we used the Avro library in combination with Parquet. It is a nice adapter, but the name validation meant that we couldn't write Parquet files with "all" names, and reading Parquet files with that library also caused exceptions. Take a look here: https://stackoverflow.com/a/39734610/2182302
try (ParquetWriter<GenericData.Record> writer = AvroParquetWriter
    .<GenericData.Record>builder(fileToWrite)
    .withSchema(schema)
    .withConf(new Configuration())
    .withCompressionCodec(CompressionCodecName.SNAPPY)
    .build()) {
  for (GenericData.Record record : recordsToWrite) {
    writer.write(record);
  }
}
I think a lot of projects use that AvroParquetWriter. Maybe we should deactivate that naming-validation in that
Actually, we used that library by patching the Avro Schema Java file with some Maven magic:
private static String validateName(String name) {
  // if (!validateNames.get())
  //   return name; // not validating names
  // if (name == null)
  //   throw new SchemaParseException("Null name");
  // int length = name.length();
  // if (length == 0)
  //   throw new SchemaParseException("Empty name");
  // char first = name.charAt(0);
  // if (!(Character.isLetter(first) || first == '_'))
  //   throw new SchemaParseException("Illegal initial character: " + name);
  // for (int i = 1; i < length; i++) {
  //   char c = name.charAt(i);
  //   if (!(Character.isLetterOrDigit(c) || c == '_'))
  //     throw new SchemaParseException("Illegal character in: " + name);
  // }
  return name;
}
Could this pull request solve that, so we don't have to resort to tricks like that any more?!
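For readers following along, here is a minimal, self-contained sketch of why the check quoted above is looser than the Avro specification's name grammar ([A-Za-z_] followed by [A-Za-z0-9_]). Character.isLetter and Character.isLetterOrDigit accept any Unicode letter, including accented ones. The class and method names (NameCheckDemo, unicodeLetterCheck, specCheck) are hypothetical and are not part of Avro.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Avro.
final class NameCheckDemo {
  // Name grammar from the Avro specification: ASCII letters, digits, underscore.
  private static final Pattern SPEC_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

  /** Same logic as the commented-out check above, returning a boolean instead of throwing. */
  static boolean unicodeLetterCheck(String name) {
    if (name == null || name.isEmpty()) {
      return false;
    }
    char first = name.charAt(0);
    if (!(Character.isLetter(first) || first == '_')) {
      return false;
    }
    for (int i = 1; i < name.length(); i++) {
      char c = name.charAt(i);
      if (!(Character.isLetterOrDigit(c) || c == '_')) {
        return false;
      }
    }
    return true;
  }

  /** ASCII-only check following the specification's grammar. */
  static boolean specCheck(String name) {
    return name != null && SPEC_NAME.matcher(name).matches();
  }

  public static void main(String[] args) {
    // "\u00e9t\u00e9" is the accented word "été", written with escapes for portability.
    for (String name : new String[] {"user_id", "\u00e9t\u00e9", "my-field", "9lives"}) {
      System.out.println(name + " unicode=" + unicodeLetterCheck(name) + " spec=" + specCheck(name));
    }
  }
}
```

An accented name passes the Unicode-letter check but fails the ASCII-only one, while a name containing a dash fails both. That is why neither built-in option helps with arbitrary Parquet column names.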
How was it possible to choose no validation at all, if I use
As NameValidator is an interface, you can in fact also plug in your own custom name validator.
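To illustrate the idea of plugging in a custom validator, here is a minimal sketch. It does not use the actual interface from this PR, whose signature may differ: SketchNameValidator, SketchValidators, NO_VALIDATION and DASH_TOLERANT are hypothetical names introduced only for this example.

```java
import java.util.regex.Pattern;

/** Hypothetical stand-in for the interface discussed here; the real signature may differ. */
interface SketchNameValidator {
  /** Returns the name unchanged if accepted, otherwise throws IllegalArgumentException. */
  String validate(String name);
}

// Hypothetical holder for two example implementations.
final class SketchValidators {
  // Like the specification's grammar, but additionally tolerating '-' after the first character.
  private static final Pattern DASHED = Pattern.compile("[A-Za-z_][A-Za-z0-9_-]*");

  /** Accepts every name, including null and empty strings. */
  static final SketchNameValidator NO_VALIDATION = name -> name;

  /** Custom rule: accepts names such as "my-field" that a strict validator would reject. */
  static final SketchNameValidator DASH_TOLERANT = name -> {
    if (name == null || !DASHED.matcher(name).matches()) {
      throw new IllegalArgumentException("Illegal name: " + name);
    }
    return name;
  };

  public static void main(String[] args) {
    System.out.println(DASH_TOLERANT.validate("my-field"));
    System.out.println(NO_VALIDATION.validate("anything goes"));
  }
}
```

Because the interface has a single method, a lambda is enough to define a project-specific rule. Code that creates names would only need to call validate on whichever instance is configured.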
Yes, you are right, it takes a normal AvroSchema directly.
Indeed, I had not taken SchemaBuilder into account (only the Schema parser).
Yeah, your last commit, "AVRO-3704: add setter to static name validator", should do the trick. But I think we must wait until this PR is merged? When do you think that will happen? And you are right, injecting the validation rule into the builder directly would be nicer, but doing it through that static method is also OK. May I ask one other question: why do you store the validation rule in a ThreadLocal and not just in a (volatile or normal) variable? For performance reasons?
It is stored in a static so that it is accessible from anywhere without modifying code.
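As background to the ThreadLocal question, here is a small sketch of the behavioural difference between a static ThreadLocal and a static volatile field. It makes no claim about which one this PR should use. ValidatorHolderSketch and its members are hypothetical, and a String stands in for the validator to keep the example short.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder; a String stands in for the configured validator.
final class ValidatorHolderSketch {
  // Per-thread setting: each thread sees its own value, starting from the default.
  static final ThreadLocal<String> PER_THREAD = ThreadLocal.withInitial(() -> "default");

  // Process-wide setting: a write is visible to every thread.
  static volatile String shared = "default";

  /** Reads both settings from a fresh thread and returns {perThreadValue, sharedValue}. */
  static String[] readFromOtherThread() {
    AtomicReference<String[]> seen = new AtomicReference<>();
    Thread reader = new Thread(() -> seen.set(new String[] {PER_THREAD.get(), shared}));
    reader.start();
    try {
      reader.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for reader thread", e);
    }
    return seen.get();
  }

  public static void main(String[] args) {
    PER_THREAD.set("strict");
    shared = "strict";
    String[] seen = readFromOtherThread();
    System.out.println("other thread sees per-thread=" + seen[0] + ", shared=" + seen[1]);
  }
}
```

Both variants are static and therefore reachable from anywhere. The difference is scope: a value set through the ThreadLocal affects only the calling thread, whereas the volatile field changes the setting for the whole process.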
Are there any updates on when this will land in master? ^^
No, this adds possibilities but is fully compatible with current master.
I meant that, since the new validator behaves by default like the current validation, merging this PR into master won't modify current behavior. setNameValidator is indeed a new method in this PR; it allows you to change the validation method.
@RyanSkraba : WDYT about this PR for Avro naming?
Hey -- this is on my radar! I should have a bit more time once the release 1.11.2 candidate is (finally) done. Thanks for your patience...
* AVRO-3704: name validator interface
What is the purpose of the change
As explained in AVRO-3704, this change makes it possible, in Java, to choose the type of name validation. Currently, the only choice is between no validation at all and a validation that accepts accented characters, which the official documentation does not allow.
Here, we allow users to inject their own implementation of the name validation interface and, at the same time, propose 3 kinds of implementation:
Verifying this change
This change added tests and can be verified as follows:
Unit test SchemaNameValidatorTest is added and can be run locally (like it is run in CI)
Documentation