[SchemaRegistry] investigate Apache Avro dependency #20708
Labels
blocking-release
Blocks release
Client
This issue points to a problem in the data-plane of the library.
Messaging
Messaging crew
Schema Registry
Milestone
Comments
swathipil added the Client, Schema Registry, and Messaging labels on Sep 15, 2021
This was referenced Sep 15, 2021
concerns about fastavro:
other notes about fastavro:
other libraries that use Apache Avro (either in the library or in examples):
article comparing avro packages:
SERIALIZE WITH FASTAVRO

    from io import BytesIO
    from fastavro import parse_schema, schemaless_writer

    def serialize_fastavro(schema, value):
        # type: (Union[bytes, str], dict) -> bytes
        # schemas passed in to fastavro must be converted to dict first
        # (to_dict is a helper that is not shown in this comment)
        json_schema = to_dict(schema)
        parsed_schema = parse_schema(json_schema, _write_hint=False)
        stream = BytesIO()
        with stream:
            schemaless_writer(stream, parsed_schema, value)
            # read the buffer before the with block closes the stream
            encoded_data = stream.getvalue()
        return encoded_data

SERIALIZE WITH AVRO

    from io import BytesIO
    import avro.schema
    from avro.io import BinaryEncoder, DatumWriter

    avro_schema_cache = {}

    def serialize_avro(schema, value):
        if not isinstance(schema, avro.schema.Schema):
            schema = avro.schema.parse(schema)
        # reuse one DatumWriter per schema, keyed by the schema's string form
        try:
            writer = avro_schema_cache[str(schema)]
        except KeyError:
            writer = DatumWriter(schema)
            avro_schema_cache[str(schema)] = writer
        stream = BytesIO()
        with stream:
            writer.write(value, BinaryEncoder(stream))
            encoded_data = stream.getvalue()
        return encoded_data

DESERIALIZE WITH FASTAVRO

    from io import BytesIO
    from fastavro import parse_schema, schemaless_reader

    def deserialize_fastavro(schema, data):
        # type: (Union[bytes, str], bytes) -> dict
        # schemas passed in to fastavro must be converted to dict first
        json_schema = to_dict(schema)
        parsed_schema = parse_schema(json_schema, _write_hint=False)
        stream = BytesIO(data)
        with stream:
            decoded_data = schemaless_reader(stream, parsed_schema)
        return decoded_data

DESERIALIZE WITH AVRO

    from io import BytesIO
    import avro.schema
    from avro.io import BinaryDecoder, DatumReader

    avro_reader_cache = {}

    def deserialize_avro(schema, data):
        # accept either raw bytes or an already-open binary stream
        if not hasattr(data, 'read'):
            data = BytesIO(data)
        if not isinstance(schema, avro.schema.Schema):
            schema = avro.schema.parse(schema)
        # reuse one DatumReader per schema, keyed by the schema's string form
        try:
            reader = avro_reader_cache[str(schema)]
        except KeyError:
            reader = DatumReader(writers_schema=schema)
            avro_reader_cache[str(schema)] = reader
        with data:
            bin_decoder = BinaryDecoder(data)
            decoded_data = reader.read(bin_decoder)
        return decoded_data
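The fastavro samples call a `to_dict` helper that is not shown in this issue. A minimal sketch of what such a helper might look like, assuming the schema arrives as a JSON string, as UTF-8 bytes, or as an already-parsed dict (the function body and the sample `User` schema are illustrative assumptions, not the library's code):

```python
import json


def to_dict(schema):
    # type: (Union[bytes, str, dict]) -> dict
    """Hypothetical helper: convert a JSON-encoded Avro schema to a dict."""
    if isinstance(schema, dict):
        # already parsed, nothing to do
        return schema
    if isinstance(schema, bytes):
        # assumption: schema bytes are UTF-8 encoded JSON
        schema = schema.decode("utf-8")
    return json.loads(schema)


# Illustrative record schema, not taken from the issue
schema_str = (
    '{"type": "record", "name": "User", '
    '"fields": [{"name": "name", "type": "string"}]}'
)
schema_dict = to_dict(schema_str)
```

Note that `json.loads` returns a `str` or `list` for primitive or union schemas, so the name `to_dict` is only accurate for record-style schemas like the one above.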
sample loading into pandas dataframe:
ACTION ITEMS:
This was referenced Sep 24, 2021
fastavro==0.24.2 for Python 2.7 has a dependency on pytz
Issues tab in GH
fastavro is expected to be much faster
fastavro==0.24.2, used for Python 2.7, installs the pytz dependency
fastavro does not yet support Python 3.10
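One way to check the expectation that fastavro is faster would be to time both serializers on the same schema and value. The sketch below uses only the standard library; `time_serializer` and the JSON stand-in `serialize_json` are hypothetical names introduced here so the example runs without either Avro package installed. In practice the stand-in would be swapped for `serialize_fastavro` and `serialize_avro`.

```python
import json
import timeit


def time_serializer(serialize, schema, value, number=1000):
    """Return the average seconds per call of serialize(schema, value)."""
    total = timeit.timeit(lambda: serialize(schema, value), number=number)
    return total / number


def serialize_json(schema, value):
    # Stand-in serializer (assumption): ignores the schema and emits JSON bytes
    return json.dumps(value).encode("utf-8")


avg_seconds = time_serializer(serialize_json, None, {"name": "Ben"}, number=100)
print("average seconds per call: {:.9f}".format(avg_seconds))
```

Results from a harness like this depend on the payload shape and on whether schema parsing is cached, so both serializers should be given identical inputs.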