Working example of list_flights with ObjectStore #5116
Comments
So the issue here is tonic is imposing a 'static lifetime bound on the returned stream, which is likely a historical artifact from when GATs were not supported. You have a couple of options here:
Otherwise you will need to use something like https://docs.rs/ouroboros/latest/ouroboros/index.html to construct a self-referential stream.
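To make the `'static` point concrete, here is a minimal std-only sketch. It is an analogue, not tonic code: a boxed iterator stands in for the boxed stream, and `Service`, `ListStream`, and `list` are hypothetical names invented for illustration. The idea it shows is that a `'static` bound is satisfied by moving an owned handle (a cloned `Arc`) into the returned value instead of borrowing from `&self`.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a handler's return type: like a stream with a
// 'static bound, the boxed iterator may not borrow from shorter-lived data.
type ListStream = Box<dyn Iterator<Item = String> + Send + 'static>;

// Hypothetical service that owns a shared store (here just a list of paths).
struct Service {
    store: Arc<Vec<String>>,
}

impl Service {
    // Borrowing `self.store` inside the returned iterator would not compile,
    // because that borrow cannot outlive `&self`. Cloning the Arc and moving
    // the clone into the closure gives the iterator its own owned handle.
    fn list(&self) -> ListStream {
        let store = Arc::clone(&self.store);
        Box::new((0..store.len()).map(move |i| format!("flight:{}", store[i])))
    }
}
```

Because the iterator owns its `Arc`, it stays valid even after the `Service` that produced it has been dropped. The same reasoning should carry over to a stream that owns an `Arc<dyn ObjectStore>`, though that is an assumption about the async case rather than something this sketch demonstrates.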
@tustvold, thank you very much for taking the time to respond. After quite a bit more experimentation, I was able to get the It's a little weird since I need the
Imports
use arrow::ipc::writer::IpcWriteOptions;
use arrow_flight::IpcMessage;
use arrow_flight::{
flight_descriptor::DescriptorType, flight_service_server::FlightService,
flight_service_server::FlightServiceServer, Action, ActionType, Criteria, Empty, FlightData,
FlightDescriptor, FlightInfo, HandshakeRequest, HandshakeResponse, PutResult, SchemaAsIpc,
SchemaResult, Ticket,
};
use base64::prelude::BASE64_STANDARD;
use base64::Engine;
use bytes::Bytes;
use futures::channel::mpsc;
use futures::stream::{BoxStream, StreamExt};
use log::{debug, error, info};
use object_store::ObjectMeta;
use object_store::{local::LocalFileSystem, ObjectStore};
use parquet::arrow::async_reader::{AsyncFileReader, ParquetObjectReader};
use parquet::arrow::parquet_to_arrow_schema;
use rand::distributions::{Alphanumeric, DistString};
use std::collections::HashMap;
use std::path::Path;
use std::sync::{Arc, Mutex};
use tonic::transport::Server;
use tonic::{Request, Response, Status, Streaming};

async fn list_flights(
    &self,
    request: Request<Criteria>,
) -> Result<Response<Self::ListFlightsStream>, Status> {
    let context = self.check_session_token(&request)?;
    let (mut tx, rx) = mpsc::channel::<(ObjectMeta, ParquetObjectReader)>(2);
    let store = context.object_store;
    // List objects on a separate task; the returned stream only owns `rx`.
    tokio::spawn(async move {
        let prefix = None;
        let mut objects = store.list(prefix);
        while let Some(md) = objects.next().await.transpose().unwrap() {
            let reader = ParquetObjectReader::new(store.clone(), md.clone());
            // Note: try_send also returns an error when the buffer is full,
            // not only when the receiver has been dropped.
            if let Err(_) = tx.try_send((md, reader)) {
                debug!("rx channel dropped");
                break;
            }
        }
        tx.close_channel();
    });
    // Turn each (object metadata, parquet reader) pair into a FlightInfo,
    // skipping objects whose metadata or schema cannot be read.
    let result = rx.filter_map(|(object_md, mut pqt_reader)| async move {
        let Ok(pqt_md) = pqt_reader.get_metadata().await else {
            error!("Failed to get parquet metadata from {}", object_md.location);
            return None;
        };
        let file_md = pqt_md.file_metadata();
        // Convert file's schema to arrow format and serialize as IPC message
        let Ok(arrow_schema) = parquet_to_arrow_schema(file_md.schema_descr(), None) else {
            error!("Failed to convert schema for {}", object_md.location);
            return None;
        };
        let Ok(IpcMessage(schema)) =
            SchemaAsIpc::new(&arrow_schema, &IpcWriteOptions::default()).try_into()
        else {
            error!("Failed to serialize schema for {}", object_md.location);
            return None;
        };
        let flight_descriptor = Some(FlightDescriptor {
            r#type: DescriptorType::Path.into(),
            cmd: Bytes::new(),
            path: vec![object_md.location.to_string()],
        });
        return Some(Ok(FlightInfo {
            flight_descriptor,
            endpoint: vec![],
            total_records: file_md.num_rows(),
            total_bytes: object_md.size as i64,
            ordered: false,
            schema: schema.into(),
        }));
    });
    Ok(Response::new(Box::pin(result) as Self::ListFlightsStream))
}
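One detail of the channel hand-off above is worth illustrating: on a bounded channel, a non-blocking send can fail because the buffer is full, not only because the receiver is gone. The following std-only sketch uses `std::sync::mpsc::sync_channel` and threads as a stand-in for `futures::channel::mpsc` and tasks (an assumption made so the example runs without external crates); the function names are hypothetical.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;

// Counts how many of `n` items a bounded channel of capacity `cap` accepts
// via try_send while the receiver is alive but never drained.
fn accepted_by_try_send(n: usize, cap: usize) -> usize {
    let (tx, _rx) = sync_channel::<usize>(cap);
    let mut accepted = 0;
    for i in 0..n {
        match tx.try_send(i) {
            Ok(()) => accepted += 1,
            // Buffer full: the receiver still exists, it is just behind.
            Err(TrySendError::Full(_)) => break,
            // Receiver dropped: nothing will ever read the item.
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}

// Counts how many of `n` items arrive when the producer uses the blocking
// `send`, which waits for free capacity instead of failing.
fn delivered_by_send(n: usize, cap: usize) -> usize {
    let (tx, rx) = sync_channel::<usize>(cap);
    let producer = thread::spawn(move || {
        for i in 0..n {
            if tx.send(i).is_err() {
                break; // receiver dropped
            }
        }
        // `tx` is dropped here, which ends the receiver's iteration.
    });
    let count = rx.iter().count();
    producer.join().unwrap();
    count
}
```

In the async version, the rough equivalent of the blocking `send` would be awaiting `futures::SinkExt::send` on the sender, which waits for capacity rather than returning an error when the buffer is full. Whether that matters in practice depends on how many objects the store lists relative to the channel size.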
Which part is this question about
https://github.com/apache/arrow-rs/blob/master/arrow-flight/examples/server.rs#L48
Describe your question
I've been struggling to understand how to implement something like list_flights in a non-trivial way (i.e., with an ObjectStore), but the example is unimplemented and the example for ObjectStore.list also doesn't give too many clues on how to use it in something like list_flights.
The issue I'm running into is that I initialize a connection to an ObjectStore and store it in a session context, but I cannot understand how to convert a stream of ObjectMeta from the object store into a stream of FlightInfo as a response to list_flights, since the ListFlightsStream has a static lifetime.
But I'm getting the error:
Additional context