Map AvroSchema to Arrow (#4886) #5009
@@ -27,6 +27,8 @@ mod schema;

mod compression;

mod codec;
This is all kept crate-private so that we can revisit the design as things evolve.
    #[serde(borrow)]
    Union(Vec<Schema<'a>>),
This was a mistake; unions are encoded at a higher level.
/// To accommodate this we special case two-variant unions where one of the
/// variants is the null type, and use this to derive arrow's notion of nullability
#[derive(Debug, Copy, Clone)]
enum Nulls {
This whole setup is rather annoying, but it is necessary to provide a reasonable API.
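For readers following along, here is a minimal sketch of the special case described in the doc comment above: a two-variant union with exactly one `null` member is collapsed into "the other type, nullable", and the position of the null is remembered. The names `NullPosition` and `nullable_union` are hypothetical and variants are represented as plain type-name strings; this is not the crate's actual `Nulls` type or API. Recording the position is plausibly useful because Avro's binary encoding writes the zero-based branch index before a union value, so a decoder needs to know which index means null.

```rust
/// Where the `null` variant sits in a two-variant Avro union.
/// Hypothetical type for illustration only.
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
enum NullPosition {
    /// `["null", T]`: branch index 0 is null.
    First,
    /// `[T, "null"]`: branch index 1 is null.
    Second,
}

/// If `variants` is a two-variant union with exactly one `"null"` member,
/// return the non-null type name together with the null's position.
/// Any other shape is not treated as "nullable T" and yields `None`.
fn nullable_union<'a>(variants: &[&'a str]) -> Option<(&'a str, NullPosition)> {
    match variants {
        // Avro does not allow the same type twice in a union.
        ["null", "null"] => None,
        ["null", other] => Some((*other, NullPosition::First)),
        [other, "null"] => Some((*other, NullPosition::Second)),
        _ => None,
    }
}
```

Under these assumptions, `nullable_union(&["null", "string"])` yields a nullable `string` with the null in position 0, while a union such as `["int", "string"]` is left for general union handling.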
arrow-avro/src/codec.rs (outdated)
// https://avro.apache.org/docs/1.11.1/specification/#logical-types
match (t.attributes.logical_type, &mut meta.codec) {
    (Some("decimal"), c @ Codec::Fixed(_)) => {
        return Err(ArrowError::NotYetImplemented(format!(
I don't intend to support decimals in the first version of this crate.
arrow-avro/src/codec.rs (outdated)
#[derive(Debug, Default)]
struct Resolver<'a> {
    map: HashMap<(&'a str, &'a str), AvroField>,
}

impl<'a> Resolver<'a> {
    fn register(&mut self, name: &'a str, namespace: Option<&'a str>, schema: AvroField) {
        self.map.insert((name, namespace.unwrap_or("")), schema);
    }

    fn resolve(&self, name: &str, namespace: Option<&'a str>) -> Result<AvroField, ArrowError> {
        let (namespace, name) = name
            .rsplit_once('.')
            .unwrap_or_else(|| (namespace.unwrap_or(""), name));

        self.map
            .get(&(namespace, name))
            .ok_or_else(|| ArrowError::ParseError(format!("Failed to resolve {namespace}.{name}")))
            .cloned()
    }
}
This could use some docstrings. For one: what's a namespace, and why is the top-level namespace item ?
Looks good to me overall. Btw, does it even matter which position the null is in within the union (0 or 1)? I guess you're keeping it to map correctly to the input.
Which issue does this PR close?
Part of #4886
Rationale for this change
The logic to map an Avro schema to Arrow is not entirely straightforward; this PR adds the logic necessary to perform this transformation.
What changes are included in this PR?
Are there any user-facing changes?